Medical Generative Video Question Answering

(MedGenVidQA 2026)

A shared task at BioNLP 2026

Introduction

The recent surge in the availability of online videos has changed the way people acquire information and knowledge. Many people prefer instructional videos, which teach how to accomplish a particular task effectively and efficiently through a series of step-by-step procedures. This need is not limited to general audiences: in professional settings such as healthcare, instructional videos are widely used by physicians and other professionals to learn, review, and standardize procedural workflows. Consumers, too, increasingly seek step-by-step visual explanations to better understand medical procedures and clinical practices.

With the advancement of generative models, the medical domain has also seen progress in areas such as medical video understanding and clinical decision support. The MedGenVidQA shared task therefore focuses on developing systems that use generative models to retrieve relevant multimodal (textual and visual) sources and to localize visual answers within medical videos, in response to medical queries from both consumers and healthcare professionals. In addition, because resource creation in the medical domain is costly and time-consuming and often requires the involvement of medical experts, we also aim to assess the capability of generative models to create question–answer pairs from medical videos.

Following earlier editions of medical question answering tasks (MedVidQA 2023, MedVidQA 2024, BioGen 2024, and BioGen 2025), this shared task expands medical video question answering for both professionals and consumers, with a focus on generative approaches to solving these tasks.

News

Important Dates (Tentative)

All three tasks (A, B, and C) follow the same schedule:

  • Corpus Release: January 30
  • Training/Validation Set Release: January 16
  • Test Set Release: February 16
  • Submission Deadline: March 31
  • Official Results: April 20
  • Paper Submission Deadline: April 30

Join our Google Group for important updates! If you have any questions, ask in our Google Group or email us.

Registration and Submission

  • Registration and Submission will be done via CodaBench
  • Participants should submit system outputs through the corresponding CodaBench competition for each subtask. Teams may participate in any subset of the subtasks.
  • Each team is allowed up to ten successful submissions per subtask on CodaBench.
  • Teams should designate their best-performing submission for each subtask by pushing it to the leaderboard.

Paper Submission

  • All shared task participants are invited to submit a paper describing their systems to the Proceedings of the BioNLP 2026 workshop at ACL 2026.
  • Papers must follow the submission instructions of the BioNLP 2026 workshop.

Starter kit

The Starter kit provides a complete pipeline, including data download, preprocessing, indexing, sample submissions, and a baseline model implementation. It also produces submission-ready outputs to streamline experimentation and benchmarking.

Tasks

Task A: Multimodal Retrieval (MMR)

Given a medical query and a collection of multimodal sources (textual and video), the task is to retrieve, from the video and PubMed collections, the relevant videos and PubMed articles that contain the answer to the query.

Datasets

Corpus:

PubMed 2026 baseline, consisting of the most recently released PubMed articles. Download PubMed Corpus

Video Corpus: consisting of a collection of professional and consumer-friendly videos. Download Video Corpus

Training and Validation Datasets:

MedVidQA collections [1] consisting of 3,010 human-annotated instructional questions and visual answers from 900 health-related videos. Download Dataset
MedAESQA collections [2], consisting of 8,427 PubMed documents human-annotated against answers to consumer health questions. Download Dataset

Test Dataset:

Can be downloaded from CodaBench.

Evaluations

We will evaluate the performance of the video and text retrieval systems using Mean Average Precision (MAP), Recall@k, Precision@k, and nDCG, with k = {5, 10}, computed with the trec_eval evaluation library.
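
Official scoring will use trec_eval; purely as an informal sanity check, the per-query retrieval metrics can be sketched in a few lines of Python (the document IDs and relevance judgments below are illustrative, not from the task data):

```python
from typing import List, Set

def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved items that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top-k results."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)

def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision values at each rank where a relevant item appears,
    normalized by the total number of relevant items."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# Illustrative ranked run for one query and its relevance judgments.
ranked = ["PMID:101", "vid_07", "PMID:205", "vid_12", "PMID:330"]
relevant = {"PMID:101", "vid_12", "PMID:999"}
print(round(precision_at_k(ranked, relevant, 5), 3))   # 0.4
print(round(recall_at_k(ranked, relevant, 5), 3))      # 0.667
print(round(average_precision(ranked, relevant), 3))   # 0.5
```

MAP is then the mean of `average_precision` over all queries; for the official numbers, run trec_eval on your run file directly.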

Run Submission

CodaBench https://www.codabench.org/competitions/13989/

Task B: Multimodal Answer Generation (MAG)

Given a medical query and a collection of multimodal sources (text and video), the task aims to generate an answer that includes attributions (cited references from the PubMed, YouTube Video, or OpenIVideo corpora) for each answer sentence.
Participants may use any of the sources (PubMed, YouTube video, or OpenIVideo) provided in the released corpus to support the answer sentences generated by their models.

The generated answer must meet the following requirements:

  • The total length of the generated answer must not exceed 250 words.
  • Each answer sentence may cite no more than three PMID and/or video sources.
  • PMIDs must be selected only from the PubMed corpus released with the dataset.
  • Video sources must be selected only from the Video Corpus released with the dataset.
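
The exact submission format is specified on CodaBench; as an illustration only, a minimal pre-submission check of the constraints above might look like the following sketch, which assumes an answer is represented as a list of (sentence, citations) pairs — a representation chosen here for illustration, not the official format:

```python
MAX_WORDS = 250                 # total-answer word limit from the task rules
MAX_SOURCES_PER_SENTENCE = 3    # per-sentence citation limit from the task rules

def check_answer(sentences, valid_pmids, valid_video_ids):
    """Check one generated answer, given as a list of (sentence, citations)
    pairs, against the task constraints. Returns a list of violation messages
    (empty when the answer passes)."""
    problems = []
    total_words = sum(len(text.split()) for text, _ in sentences)
    if total_words > MAX_WORDS:
        problems.append(f"answer has {total_words} words (limit {MAX_WORDS})")
    for i, (_, cites) in enumerate(sentences):
        if len(cites) > MAX_SOURCES_PER_SENTENCE:
            problems.append(
                f"sentence {i} cites {len(cites)} sources "
                f"(limit {MAX_SOURCES_PER_SENTENCE})")
        for cite in cites:
            if cite not in valid_pmids and cite not in valid_video_ids:
                problems.append(f"sentence {i}: unknown source {cite!r}")
    return problems

# Illustrative answer: two sentences, each with valid citations.
answer = [("Ibuprofen is an NSAID.", ["PMID:12345"]),
          ("It reduces inflammation.", ["PMID:67890", "vid_001"])]
print(check_answer(answer, {"PMID:12345", "PMID:67890"}, {"vid_001"}))  # []
```

The PMID and video IDs above are placeholders; in practice the valid-ID sets would be loaded from the released PubMed and video corpora.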

Corpus:

PubMed 2026 baseline, consisting of the most recently released PubMed articles. Download PubMed Corpus

Video Corpus: consisting of a collection of professional and consumer-friendly videos. Download Video Corpus

Training and Validation Datasets:

MedAESQA collections [2], consisting of 8,427 PubMed documents human-annotated against answers to consumer health questions. Download Dataset

Test Dataset:

Can be downloaded from CodaBench.

Evaluations

Following MedVidQA [1], we will use Mean Intersection over Union (mIoU) and IoU at thresholds 0.3, 0.5, and 0.7 as the evaluation metrics.

Run Submission

CodaBench https://www.codabench.org/competitions/14014/

Task C: Visual Answer Localization (VAL)

Given a medical query and a video, the task aims to locate the temporal segment (start and end timestamps) in the video where the answer to the query is shown or its explanation is illustrated.

Datasets

Training and Validation Datasets:

MedVidQA collections [1] consisting of 3,010 human-annotated instructional questions and visual answers from 900 health-related videos. Download Dataset
HealthVidQA collections [3] consisting of 76K automatically generated instructional questions and visual answers from 16K health-related videos. Download Datasets

Test Dataset:

Can be downloaded from CodaBench.

Evaluations

Following MedVidQA [1], we will use Mean Intersection over Union (mIoU) and IoU at thresholds 0.3, 0.5, and 0.7 as the evaluation metrics.
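
As an informal reference, the temporal IoU underlying these metrics can be sketched as follows (this assumes one predicted segment per query; the segments below are illustrative, not from the task data):

```python
def temporal_iou(pred, gold):
    """IoU between a predicted and a gold (start, end) segment, in seconds."""
    (ps, pe), (gs, ge) = pred, gold
    inter = max(0.0, min(pe, ge) - max(ps, gs))          # overlap length
    union = (pe - ps) + (ge - gs) - inter                # combined length
    return inter / union if union > 0 else 0.0

def evaluate(predictions, gold_segments, thresholds=(0.3, 0.5, 0.7)):
    """Return mIoU and, for each IoU threshold, the fraction of predictions
    whose IoU with the gold segment clears that threshold."""
    ious = [temporal_iou(p, g) for p, g in zip(predictions, gold_segments)]
    miou = sum(ious) / len(ious)
    at_thresh = {t: sum(iou >= t for iou in ious) / len(ious)
                 for t in thresholds}
    return miou, at_thresh

# Illustrative predicted vs. gold answer spans (seconds) for two queries.
preds = [(10.0, 30.0), (5.0, 15.0)]
golds = [(12.0, 28.0), (20.0, 40.0)]
miou, at_thresh = evaluate(preds, golds)
print(round(miou, 3))  # first IoU = 0.8, second = 0.0, so mIoU = 0.4
```

The official evaluation script released with the task should be used for reported numbers; this sketch only conveys how the metric behaves.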

Run Submission

CodaBench https://www.codabench.org/competitions/14015/

Organizers

References

  1. Deepak Gupta, Kush Attal, and Dina Demner-Fushman. A Dataset for Medical Instructional Video Classification and Question Answering, Sci Data 10, 158 (2023).
  2. Deepak Gupta, Davis Bartels, and Dina Demner-Fushman. A Dataset of Medical Questions Paired with Automatically Generated Answers and Evidence-Supported References. Sci Data 12, 1035 (2025).
  3. Deepak Gupta, Kush Attal, and Dina Demner-Fushman. Towards answering health-related questions from medical videos: Datasets and approaches. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 16399-16411, 2024.