Introduction
The recent surge in the availability of online videos has changed the way people acquire information and knowledge. Many people turn to instructional videos to teach or learn how to accomplish a particular task effectively and efficiently through a series of step-by-step procedures. This need is not limited to general audiences. In professional settings such as healthcare, instructional videos are widely used by physicians and other professionals to learn, review, and standardize procedural workflows. In addition, consumers increasingly seek step-by-step visual explanations to better understand medical procedures and clinical practices. With the advancement of generative models, the medical domain has also seen progress in medical video understanding, clinical decision support, and related applications. Toward this end, the MedGenVidQA shared task focuses on developing systems that utilize generative models to retrieve relevant multimodal (textual and visual) sources and to localize visual answers within medical videos in response to medical queries from consumers and healthcare professionals. Additionally, resource creation in the medical domain is both costly and time-consuming, as it often requires the involvement of medical experts. In this context, we also aim to assess the capability of generative models to create question–answer pairs from medical videos. Following earlier editions of medical question answering tasks (MedVidQA 2023, MedVidQA 2024, BioGen 2024, and BioGen 2025), this shared task expands medical video question answering for both professionals and consumers, with a focus on generative approaches to solving these tasks.
News
- January 16, 2026: Training and Validation datasets released.
- January 16, 2026: Introducing the MedGenVidQA 2026 challenge.
Important Dates
| | Corpus Release | Training/Val Set Release | Test Set Release | Submission Deadline | Official Results | Paper Submission Deadline |
|---|---|---|---|---|---|---|
| Task A | January 30 | January 16 | February 16 | March 15 | April 10 | April 24 |
| Task B | January 30 | January 16 | February 16 | March 31 | April 10 | April 24 |
| Task C | January 30 | January 16 | February 16 | March 31 | April 10 | April 24 |
Join our Google Group for important updates! If you have any questions, ask in our Google Group or email us.
Registration and Submission
- Registration and Submission will be done via CodaBench (Link will be added soon)
- Participants should submit system outputs through the corresponding CodaBench competition for each subtask. Teams may participate in any subset of the subtasks.
- Each team is allowed up to ten successful submissions per subtask on CodaBench.
- Teams should designate their best-performing submission for each subtask by pushing it to the leaderboard.
Paper Submission
- All shared task participants are invited to submit a paper describing their systems to the Proceedings of BioNLP 2026 at ACL 2026.
- Papers must follow the submission instructions of the BioNLP 2026 workshop.
Tasks
Task A: Multimodal Retrieval (MMR)
Given a medical query and a collection of multimodal sources (textual and video), the task aims to retrieve, from the video and PubMed collections, the relevant videos and PubMed articles that contain the answer to the medical query.
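For illustration only, the sketch below ranks candidates against a query with a generic dense bi-encoder, assuming each video or PubMed article is represented by text (e.g., title plus transcript or abstract). The encoder name, candidate ids, and texts are placeholders, not part of the official task or baseline.

```python
# Illustrative dense-retrieval sketch (not an official baseline).
# Assumes each candidate (video or PubMed article) has a textual
# representation, e.g., title + transcript or title + abstract.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder

def rank_candidates(query, candidates):
    """Return (candidate_id, score) pairs sorted by cosine similarity.

    candidates: dict mapping candidate_id -> text representation.
    """
    ids = list(candidates)
    cand_emb = model.encode([candidates[i] for i in ids], convert_to_tensor=True)
    query_emb = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, cand_emb)[0]
    return sorted(zip(ids, scores.tolist()), key=lambda x: x[1], reverse=True)

# Example with made-up ids and texts
print(rank_candidates(
    "How do I apply a shoulder compression bandage?",
    {
        "video_001": "Step-by-step shoulder bandaging demonstration ...",
        "pmid_12345": "Randomized trial of compression therapy ...",
    },
))
```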
Datasets
Training and Validation Datasets:
MedVidQA collections [1] consisting of 3,010 human-annotated instructional questions and visual answers from 900 health-related videos.
Download Dataset
MedAESQA collections [2] consisting of 8,427 human-annotated PubMed documents that support answers to consumer health questions.
Download Dataset
Test Dataset:
Will be released via CodaBench.
Evaluations
We will evaluate the performance of the video and text retrieval systems in terms of Mean Average Precision (MAP), Recall@k, Precision@k, and nDCG with k={5, 10}. We will use the trec_eval evaluation library.
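For reference, a minimal sketch of writing ranked results in the standard TREC run format that trec_eval consumes is shown below; the query ids, document ids, file names, and run tag are placeholders, and the official submission format will be specified on CodaBench.

```python
# Minimal sketch: write ranked results in the standard TREC run format
# ("qid Q0 docid rank score runtag"), which trec_eval can score against
# a qrels file, e.g.:  trec_eval -m map -m ndcg qrels.txt my_run.txt
def write_trec_run(rankings, path, run_tag="my_system"):
    """rankings: dict mapping query_id -> list of (doc_id, score),
    already sorted from most to least relevant."""
    with open(path, "w") as f:
        for qid, ranked in rankings.items():
            for rank, (doc_id, score) in enumerate(ranked, start=1):
                f.write(f"{qid} Q0 {doc_id} {rank} {score:.4f} {run_tag}\n")

# Example with placeholder ids
write_trec_run({"q1": [("video_001", 0.91), ("pmid_12345", 0.73)]}, "my_run.txt")
```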
Run Submission
CodaBench (TBA)
Task B: Visual Answer Localization (VAL)
Given a medical query and a video, the task aims to locate the temporal segment (start and end timestamps) in the video where the answer to the medical query is shown or explained.
Datasets
Training and Validation Datasets:
MedVidQA collections [1] consisting of 3,010 human-annotated instructional questions and visual answers from 900 health-related videos.
Download Dataset
HealthVidQA collections [3] consisting of 76K automatically generated instructional questions and visual answers from 16K health-related videos.
Download Datasets
Test Dataset:
Will be released via CodaBench.
Evaluations
Following MedVidQA [1], we will use Mean Intersection over Union (mIoU) and IoU at thresholds 0.3, 0.5, and 0.7 as the evaluation metrics.
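For clarity, the sketch below illustrates how temporal IoU, mIoU, and IoU@threshold can be computed for predicted versus reference segments given as (start, end) pairs in seconds; the official evaluation script will be provided with the CodaBench competition, so this is illustrative only.

```python
# Illustrative computation of temporal IoU, mIoU, and IoU@threshold
# for predicted vs. reference (gold) answer segments.
def temporal_iou(pred, gold):
    """pred, gold: (start_seconds, end_seconds) tuples."""
    inter = max(0.0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    union = (pred[1] - pred[0]) + (gold[1] - gold[0]) - inter
    return inter / union if union > 0 else 0.0

def evaluate(preds, golds, thresholds=(0.3, 0.5, 0.7)):
    """preds, golds: lists of (start, end) segments, aligned by question."""
    ious = [temporal_iou(p, g) for p, g in zip(preds, golds)]
    miou = sum(ious) / len(ious)
    # IoU@t: fraction of questions whose predicted segment overlaps
    # the gold segment with IoU >= t
    at_t = {t: sum(iou >= t for iou in ious) / len(ious) for t in thresholds}
    return miou, at_t

# Example with made-up segments (in seconds)
print(evaluate([(10.0, 35.0)], [(12.0, 40.0)]))
```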
Run Submission
CodaBench (TBA)
Task C: Question-Answer Generation (QAG)
Given a medical video, the task aims to generate all instructional questions and their corresponding visual answers, each given as start and end timestamps in the video.
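As an illustration only, one possible way to represent generated question–answer pairs is a list of question strings with start/end timestamps, as sketched below; the field names and file layout are assumptions, and the official output format will be announced with the CodaBench competition.

```python
# Illustrative (unofficial) structure for generated question-answer pairs:
# each entry pairs an instructional question with the video segment
# (start/end timestamps, in seconds) that visually answers it.
import json

qa_pairs = [
    {"question": "How do you apply kinesiology tape to the knee?",
     "answer_start": 62.0, "answer_end": 118.5},
    {"question": "How should the knee be positioned before taping?",
     "answer_start": 30.0, "answer_end": 61.0},
]

with open("video_001_qa.json", "w") as f:  # placeholder file name and video id
    json.dump({"video_id": "video_001", "qa_pairs": qa_pairs}, f, indent=2)
```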
Datasets
Training and Validation Datasets:
HealthVidQA collections [3] consisting of 76K automatically generated instructional questions and visual answers from 16K health-related videos.
Download Datasets
Test Dataset:
Will be released via CodaBench.
Evaluations
TBA
Run Submission
CodaBench (TBA)
Organizers
References
- [1] Deepak Gupta, Kush Attal, and Dina Demner-Fushman. A Dataset for Medical Instructional Video Classification and Question Answering. Sci Data 10, 158 (2023).
- [2] Deepak Gupta, Davis Bartels, and Dina Demner-Fushman. A Dataset of Medical Questions Paired with Automatically Generated Answers and Evidence-Supported References. Sci Data 12, 1035 (2025).
- [3] Deepak Gupta, Kush Attal, and Dina Demner-Fushman. Towards Answering Health-Related Questions from Medical Videos: Datasets and Approaches. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 16399–16411, 2024.