Challenges

AI Coach Challenge @ VAR 2026: “Don’t Just Watch. Intervene.”

The workshop will host two challenges on tasks that are crucial for enabling real-world vision-based assistants. These challenges are designed to test both the low-level visual capabilities and the higher-level reasoning skills of vision-based assistants.

  • The winning teams will receive a certificate and a prize, and will be invited to present their solution in a contributed talk. The winning team in 2025 received an ASUS Zenbook A14 as a prize.
  • Deadline: June 1, 2026


Challenge 1: Fitness


Continuing from CVPR 2025, this challenge focuses on coaching users through a workout session with the right feedback at the right time, correcting mistakes and encouraging the user. Details:

  • Evaluation Data: We base this challenge on the QEVD dataset, as described here. Specifically, the challenge involves providing timed feedback for a set of evaluation videos. For this challenge, we employ a (private) test set available here.

  • Training and Validation Data: For training and validation, please use the data provided in the QEVD page.

  • Evaluation Metrics: We will use the METEOR, ROUGE-L, BERT, LLM-Acc., and T-F-Score metrics as described here. The code for these metrics is available here. If you have any questions, contact us here.

  • Participation:

    • Leaderboard: Please email the results here as a JSON file along with the team name (see the Python sketch after this list). The JSON file should contain a list of Python dicts with the following fields:
        [{"video_file": <str: name of the evaluation video file>,
        "feedbacks": <List[str]: list of predicted feedbacks>,
        "feedback_timestamps": <List[float]: list of timestamps corresponding to the predicted feedbacks>}, ...]
      

      Each team will be allowed to make five submissions, and we will provide the evaluation results of each submission as soon as possible. The team can choose to make the result public on the leaderboard below at any time.

    • Extended Abstract: Teams submitting to the challenge are also encouraged to submit an extended abstract through CMT. Submissions should be at least two and at most four pages, excluding references. As the subject area, please choose “Challenge: Fitness”.

    • Winner: The winner will be decided based on the five evaluation metrics described above: the winning team is the one that outperforms the others on the majority of the metrics. The code of the winning team will be inspected before the workshop.
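
The following is a minimal Python sketch of how a leaderboard submission file in the format above could be assembled. The file name, video name, feedback strings, and timestamps are placeholders for illustration, not values prescribed by the challenge.

    import json

    # Placeholder predictions: one dict per evaluation video, with the three
    # fields required by the leaderboard format above.
    predictions = [
        {
            "video_file": "example_eval_video.mp4",  # name of the evaluation video file
            "feedbacks": [
                "Keep your back straight during the squat.",
                "Nice pace, keep going!",
            ],
            # One timestamp per feedback (units as defined by the QEVD evaluation code).
            "feedback_timestamps": [12.4, 31.0],
        },
        # ... one dict per evaluation video
    ]

    # Sanity check before emailing: every feedback needs a matching timestamp.
    for entry in predictions:
        assert len(entry["feedbacks"]) == len(entry["feedback_timestamps"])

    with open("team_name_fitness_submission.json", "w") as f:
        json.dump(predictions, f, indent=2)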

Results (Continuously Updated)

Method                METEOR↑  ROUGE-L↑  BERT↑  LLM-Acc↑  T-F-Score↑
VideoChat2            0.104    0.048     0.846  2.145     0.555
VideoLLaMA3-7B        0.150    0.076     0.859  2.554     0.555
Qwen-2-VL-Instruct    0.185    0.089     0.861  2.851     0.555
Qwen-2.5-VL-Instruct  0.174    0.068     0.855  3.153     0.555
CVPR 2025 Best        0.156    0.101     0.861  2.087     0.535



Challenge 2: Cooking


This challenge focuses on coaching users through a recipe with the right feedback at the right time, to correct mistakes. Details:

  • Evaluation Data: We base this challenge on the Qualcomm Interactive Cooking Dataset, as described here and here. Specifically, the challenge involves providing timed feedback for a set of evaluation videos. For this challenge, we employ the test set available in the link provided above.

  • Evaluation Set: We will use the main set of the Qualcomm Interactive Cooking Dataset and consider the turn-based evaluation scheme described in Section 5.4 here.

  • Training and Validation Data: For training and validation, please use the data provided in the Qualcomm Interactive Cooking Dataset page.

  • Evaluation Metrics: We will use the IC-Acc and Mistake (Precision, Recall, and F1) metrics as described here. The code for these metrics is available here. If you have any questions, contact us here.

  • Participation:

    • Leaderboard: Please email the results here as a JSON file along with the team name (a validation sketch follows this list). The JSON file should contain a list of Python dicts with the following fields:
        [{"video_id": <str: name of the evaluation video file>,
        "pred_texts": <List[str]: list of predicted instructions and feedbacks>,
        "pred_timestamps": <List[float]: list of timestamps corresponding to the predicted instructions and feedbacks>}, ...]
      

      Each team will be allowed to make five submissions, and we will provide the evaluation results of each submission as soon as possible. The team can choose to make the result public on the leaderboard below at any time.

    • Extended Abstract: Teams submitting to the challenge are also encouraged to submit an extended abstract through CMT. Submissions should be at least two and at most four pages, excluding references. As the subject area, please choose “Challenge: Cooking”.

    • Winner: The winner will be decided based on the four evaluation metrics described above: the winning team is the one that outperforms the others on the majority of the metrics. As the test set is public, the code of the winning team will be inspected before the workshop.
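
The Python sketch below is one way to sanity-check a Cooking submission file against the format above before emailing it. The file name is a placeholder, and the chronological-ordering check is our own assumption rather than a requirement stated by the challenge.

    import json

    REQUIRED_KEYS = {"video_id", "pred_texts", "pred_timestamps"}

    def validate_submission(path):
        """Check a Cooking-challenge submission file against the format above."""
        with open(path) as f:
            predictions = json.load(f)
        assert isinstance(predictions, list), "top level must be a list of dicts"
        for entry in predictions:
            missing = REQUIRED_KEYS - entry.keys()
            assert not missing, f"entry {entry.get('video_id', '?')} is missing {missing}"
            # Each predicted instruction/feedback needs exactly one timestamp.
            assert len(entry["pred_texts"]) == len(entry["pred_timestamps"])
            # Assumption (not stated in the rules above): timestamps are chronological.
            ts = entry["pred_timestamps"]
            assert all(a <= b for a, b in zip(ts, ts[1:])), "timestamps not sorted"

    if __name__ == "__main__":
        validate_submission("team_name_cooking_submission.json")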

Results (Continuously Updated)

Method            IC-Acc↑  Prec.↑  Rec.↑  F1↑   BERT↑  ROUGE-L↑
LLaVA-NeXT        1.4      0.00    0.00   0.00  0.000  0.000
Video-ChatGPT     1.6      0.00    0.00   0.00  0.000  0.000
VideoChat2        1.6      0.00    0.00   0.00  0.000  0.000
Video-LLaVA       2.0      0.00    0.00   0.00  0.000  0.000
VideoLLaMA3-7B    1.8      0.00    0.00   0.00  0.000  0.000
Videollm-online   0.03     0.02    0.98   0.04  0.332  0.248
Qwen2-VL-7B       6.3      0.02    0.69   0.05  0.377  0.256
Qwen2.5-VL-7B     18.9     0.18    0.01   0.02  0.299  0.219
Gemini-2.5-Flash  23.1     0.01    0.22   0.02  0.410  0.342