Program and Papers
Location and Schedule
- Date: June 3rd, 2026.
- Workshop Location: 102/104
- Poster Location: Exhibit Hall A, 229 – 236
| Time Slot | Event |
|---|---|
| 08:30-08:45 | Welcome and Introduction |
| 08:45-09:15 | Keynote: Prof. Dr. Michael S. Ryoo |
| 09:15-09:45 | Keynote: Prof. Katerina Fragkiadaki |
| 09:45-10:15 | Keynote: Prof. Wenhu Chen |
| 10:15-10:30 | Paper Talk: Molmo2 by Zixian Ma |
| 10:30-11:00 | Keynote: Prof. Ziwei Liu |
| 11:00-11:30 | Keynote: Prof. Yao Qin |
| 11:30-12:00 | Keynote: Prof. Vicente Ordóñez-Román |
| 12:00-12:20 | Challenge Results |
| 12:20-13:00 | Posters and Coffee |
All times are in the local time zone, Mountain Daylight Time (MDT).
Accepted Papers: Challenge Extended Abstracts
Timing-Content Separation for Human-Coach-Like Exercise Feedback Generation
Koki Kawamura, Shuhei Kurita, Taiki Miyanishi, Nakamasa Inoue
pdf
Technical Report of Team MR-CAS
Ruochen Cui, Yuhai Li, Shilong Bao, Boyu Han, Qianqian Xu, Qingming Huang
pdf
Accepted Papers: Main Conference Papers
Streaming Video Instruction Tuning
Jiaer Xia, Peixian Chen, Mengdan Zhang, Xing Sun, Kaiyang Zhou
pdf
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding
Christopher Clark, Jieyu Zhang, Zixian Ma, Jae Sung Park, Mohammadreza Salehi, Rohun Tripathi, Sangho Lee, Zhongzheng Ren, Chris Dongjoo Kim, Yinuo Yang, Vincent Shao, Yue Yang, Weikai Huang, Ziqi Gao, Taira Anderson, Jianrui Zhang, Jitesh Jain, George Stoica, Winson Han, Ali Farhadi, Ranjay Krishna
pdf
Mistake Attribution: Fine-Grained Mistake Understanding in Egocentric Videos
Yayuan Li, Aadit Jain, Filippos Bellos, Jason Corso
pdf
Vision-Speech Models: Teaching Speech Models to Converse about Images
Amélie Royer, Moritz Böhle, Gabriel de Marmiesse, Laurent Mazaré, Neil Zeghidour, Alexandre Défossez, Patrick Pérez
pdf
From 3D Pose to Prose: Biomechanics-Grounded Vision-Language Coaching
Yuyang Ji, Yixuan Shen, Feng Liu
pdf
Interactive Episodic Memory with User Feedback
Nikesh Subedi, Loris Bazzani, Ziad Al-Halah
pdf
Accepted Papers: Extended Abstracts
A Simple Baseline for Streaming Video Understanding
Yujiao Shen, Shulin Tian, Jingkang Yang, Ziwei Liu
pdf
Drive-to-Music: Context-Aware Generative Audio for In-Vehicle Experiences
Cosmin Dragoiu, Nooshin Nabizadeh
pdf
Accepted Papers: Archival
Binary Verification for Zero-Shot Vision
Rongbin Hu, Jeffrey Liu
pdf
StreamMind: Adaptive Temporal Memory for Interactive Question Answering on Live Video Streams
Suresh Kumar Palus, Partha Sarathi Samal, Sai Kiran Padmam, Bhavan Kumar B.R
pdf