Medical Generative Video Question Answering

(MedGenVidQA 2026)

A shared task at BioNLP 2026

Introduction

The recent surge in the availability of online videos has changed the way people acquire information and knowledge. Many people prefer instructional videos, which teach how to accomplish a particular task effectively and efficiently through a series of step-by-step procedures. This need is not limited to general audiences: in professional settings such as healthcare, instructional videos are widely used by physicians and other professionals to learn, review, and standardize procedural workflows. Consumers, too, increasingly seek step-by-step visual explanations to better understand medical procedures and clinical practices.

With the advancement of generative models, the medical domain has also seen progress in areas such as medical video understanding and clinical decision support. The MedGenVidQA shared task therefore focuses on developing systems that use generative models to retrieve relevant multimodal (textual and visual) sources and to localize visual answers within medical videos, in response to medical queries from both consumers and healthcare professionals. In addition, because resource creation in the medical domain is costly and time-consuming and often requires the involvement of medical experts, we also aim to assess the capability of generative models to create question–answer pairs from medical videos.

Following earlier editions of medical question answering tasks (MedVidQA 2023, MedVidQA 2024, BioGen 2024, and BioGen 2025), this shared task expands medical video question answering for both professionals and consumers, with a focus on generative approaches to solving these tasks.

News

Important Dates (Tentative)

All three tasks (A, B, and C) follow the same schedule:

  • Corpus Release: January 30
  • Training/Validation Set Release: January 16
  • Test Set Release: February 16
  • Submission Deadline: March 31
  • Official Results: April 20
  • Paper Submission Deadline: April 30

Join our Google Group for important updates! If you have any questions, ask in our Google Group or email us.

Registration and Submission

  • Registration and Submission will be done via CodaBench
  • Participants should submit system outputs through the corresponding CodaBench competition for each subtask. Teams may participate in any subset of the subtasks.
  • Each team is allowed up to ten successful submissions per subtask on CodaBench.
  • Teams should designate their best-performing submission for each subtask by pushing it to the leaderboard.

Paper Submission

  • All shared task participants are invited to submit a paper describing their systems to the Proceedings of the BioNLP 2026 workshop at ACL 2026.
  • Papers must follow the submission instructions of the BioNLP 2026 workshop.

Starter kit

The Starter kit provides a complete pipeline, including data download, preprocessing, indexing, sample submissions, and a baseline model implementation. It also produces submission-ready outputs to streamline experimentation and benchmarking.

Tasks

Task A: Multimodal Retrieval (MMR)

Given a medical query and a collection of multimodal sources (textual and video), the task is to retrieve, from the video and PubMed collections, the relevant videos and PubMed articles that contain the answer to the query.

Datasets

Corpus:

PubMed 2026 baseline, consisting of the most recently released PubMed articles. Download PubMed Corpus

Video Corpus: consisting of a collection of professional and consumer-friendly videos. Download Video Corpus

Training and Validation Datasets:

MedVidQA collections [1] consisting of 3,010 human-annotated instructional questions and visual answers from 900 health-related videos. Download Dataset
MedAESQA collections [2], consisting of 8,427 PubMed documents human-annotated against answers to consumer health questions. Download Dataset

Test Dataset:

Can be downloaded from CodaBench.

Evaluations

We will evaluate the performance of the video and text retrieval systems using Mean Average Precision (MAP), Recall@k, Precision@k, and nDCG, with k = {5, 10}, computed with the trec_eval evaluation library.
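
Official scoring will use trec_eval; purely as an informal sanity check, the per-query retrieval metrics can be sketched in a few lines of Python (the document IDs and relevance judgments below are illustrative, not from the task data):

```python
from typing import List, Set

def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved items that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top-k results."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)

def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision values at each rank where a relevant item appears,
    normalized by the total number of relevant items."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# Illustrative ranked run for one query and its relevance judgments.
ranked = ["PMID:101", "vid_07", "PMID:205", "vid_12", "PMID:330"]
relevant = {"PMID:101", "vid_12", "PMID:999"}
print(round(precision_at_k(ranked, relevant, 5), 3))   # 0.4
print(round(recall_at_k(ranked, relevant, 5), 3))      # 0.667
print(round(average_precision(ranked, relevant), 3))   # 0.5
```

MAP is then the mean of `average_precision` over all queries; for the official numbers, run trec_eval on your run file directly.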

Run Submission

CodaBench https://www.codabench.org/competitions/13989/

Task B: Multimodal Answer Generation (MAG)

Given a medical query and a collection of multimodal sources (text and video), the task aims to generate an answer that includes attributions (cited references from the PubMed, YouTube Video, or OpenIVideo corpora) for each answer sentence.
Participants may use any of the sources (PubMed, YouTube video, or OpenIVideo) provided in the released corpus to support the answer sentences generated by their models.

The generated answer must meet the following requirements:

  • The total length of the generated answer must not exceed 250 words.
  • Each answer sentence may cite no more than three PMID and/or video sources.
  • PMIDs must be selected only from the PubMed corpus released with the dataset.
  • Video sources must be selected only from the Video Corpus released with the dataset.
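
The exact submission format is specified on CodaBench; as an illustration only, a minimal pre-submission check of the constraints above might look like the following sketch, which assumes an answer is represented as a list of (sentence, citations) pairs — a representation chosen here for illustration, not the official format:

```python
MAX_WORDS = 250                 # total-answer word limit from the task rules
MAX_SOURCES_PER_SENTENCE = 3    # per-sentence citation limit from the task rules

def check_answer(sentences, valid_pmids, valid_video_ids):
    """Check one generated answer, given as a list of (sentence, citations)
    pairs, against the task constraints. Returns a list of violation messages
    (empty when the answer passes)."""
    problems = []
    total_words = sum(len(text.split()) for text, _ in sentences)
    if total_words > MAX_WORDS:
        problems.append(f"answer has {total_words} words (limit {MAX_WORDS})")
    for i, (_, cites) in enumerate(sentences):
        if len(cites) > MAX_SOURCES_PER_SENTENCE:
            problems.append(
                f"sentence {i} cites {len(cites)} sources "
                f"(limit {MAX_SOURCES_PER_SENTENCE})")
        for cite in cites:
            if cite not in valid_pmids and cite not in valid_video_ids:
                problems.append(f"sentence {i}: unknown source {cite!r}")
    return problems

# Illustrative answer: two sentences, each with valid citations.
answer = [("Ibuprofen is an NSAID.", ["PMID:12345"]),
          ("It reduces inflammation.", ["PMID:67890", "vid_001"])]
print(check_answer(answer, {"PMID:12345", "PMID:67890"}, {"vid_001"}))  # []
```

The PMID and video IDs above are placeholders; in practice the valid-ID sets would be loaded from the released PubMed and video corpora.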

Corpus:

PubMed 2026 baseline, consisting of the most recently released PubMed articles. Download PubMed Corpus

Video Corpus: consisting of a collection of professional and consumer-friendly videos. Download Video Corpus

Training and Validation Datasets:

MedAESQA collections [2], consisting of 8,427 PubMed documents human-annotated against answers to consumer health questions. Download Dataset

Test Dataset:

Can be downloaded from CodaBench.

Evaluations

Following MedVidQA [1], we will use Mean Intersection over Union (mIoU) and IoU at thresholds 0.3, 0.5, and 0.7 as the evaluation metrics.

Run Submission

CodaBench https://www.codabench.org/competitions/14014/

Task C: Visual Answer Localization (VAL)

Given a medical query and a video, the task aims to locate the temporal segment (start and end timestamps) in the video where the answer to the query is shown or its explanation is illustrated.

Datasets

Training and Validation Datasets:

MedVidQA collections [1] consisting of 3,010 human-annotated instructional questions and visual answers from 900 health-related videos. Download Dataset
HealthVidQA collections [3] consisting of 76K automatically generated instructional questions and visual answers from 16K health-related videos. Download Datasets

Test Dataset:

Can be downloaded from CodaBench.

Evaluations

Following MedVidQA [1], we will use Mean Intersection over Union (mIoU) and IoU at thresholds 0.3, 0.5, and 0.7 as the evaluation metrics.
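
As an informal reference, the temporal IoU underlying these metrics can be sketched as follows (this assumes one predicted segment per query; the segments below are illustrative, not from the task data):

```python
def temporal_iou(pred, gold):
    """IoU between a predicted and a gold (start, end) segment, in seconds."""
    (ps, pe), (gs, ge) = pred, gold
    inter = max(0.0, min(pe, ge) - max(ps, gs))          # overlap length
    union = (pe - ps) + (ge - gs) - inter                # combined length
    return inter / union if union > 0 else 0.0

def evaluate(predictions, gold_segments, thresholds=(0.3, 0.5, 0.7)):
    """Return mIoU and, for each IoU threshold, the fraction of predictions
    whose IoU with the gold segment clears that threshold."""
    ious = [temporal_iou(p, g) for p, g in zip(predictions, gold_segments)]
    miou = sum(ious) / len(ious)
    at_thresh = {t: sum(iou >= t for iou in ious) / len(ious)
                 for t in thresholds}
    return miou, at_thresh

# Illustrative predicted vs. gold answer spans (seconds) for two queries.
preds = [(10.0, 30.0), (5.0, 15.0)]
golds = [(12.0, 28.0), (20.0, 40.0)]
miou, at_thresh = evaluate(preds, golds)
print(round(miou, 3))  # first IoU = 0.8, second = 0.0, so mIoU = 0.4
```

The official evaluation script released with the task should be used for reported numbers; this sketch only conveys how the metric behaves.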

Run Submission

CodaBench https://www.codabench.org/competitions/14015/

Organizers

References

  1. Deepak Gupta, Kush Attal, and Dina Demner-Fushman. A Dataset for Medical Instructional Video Classification and Question Answering, Sci Data 10, 158 (2023).
  2. Deepak Gupta, Davis Bartels, and Dina Demner-Fushman. A Dataset of Medical Questions Paired with Automatically Generated Answers and Evidence-Supported References. Sci Data 12, 1035 (2025).
  3. Deepak Gupta, Kush Attal, and Dina Demner-Fushman. Towards answering health-related questions from medical videos: Datasets and approaches. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 16399-16411, 2024.