Challenge Description
AgiBot World Challenge ICRA2026 - World Model Track
We've designed a series of challenging tasks covering kitchen environments, workbenches, dining tables, and bathroom settings. The tasks encompass diverse robot-object interactions (e.g., collisions, grasping, placement, and dragging maneuvers) to thoroughly evaluate models' generative capabilities.
Dataset
The training, validation, and test datasets for the competition can be downloaded from AgiBotWorld Challenge-2026 (World Model Track).
Baseline
We've released the baseline repo. You can train your own world model based on the baseline repo or on your own codebase.
Rules
Authentication:
Please register your team on the official website of AgiBotWorldChallengeICRA2026 before submitting your results. To identify the team associated with your submission, we require you to include a meta_info.txt file in the compressed file. The first line of this file must be the email address used during your team registration.
A simple example of meta_info.txt:
test@test.test
If the email address in meta_info.txt does not match any of the email addresses registered on the official website of AgiBotWorldChallengeICRA2026, the evaluation will fail. We update the team information on the Google Form daily, so you will be able to submit your results normally starting the day after your registration.
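As a rough illustration of the meta_info.txt requirement above, the following sketch writes the file with the registration email on its first line and reads it back. The helper names `write_meta_info` and `read_registered_email` are hypothetical and not part of the baseline repo, and the format check is only a local sanity check, not the organizers' validation.

```python
from pathlib import Path


def write_meta_info(predictions_dir, email):
    """Write meta_info.txt with the registration email on the first line."""
    email = email.strip()
    # Cheap local sanity check only (assumption); the real check is the
    # organizers' comparison against the registered addresses.
    if "@" not in email or " " in email:
        raise ValueError(f"does not look like an email address: {email!r}")
    path = Path(predictions_dir) / "meta_info.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(email + "\n", encoding="utf-8")
    return path


def read_registered_email(predictions_dir):
    """Return the first line of meta_info.txt, stripped of whitespace."""
    text = (Path(predictions_dir) / "meta_info.txt").read_text(encoding="utf-8")
    lines = text.splitlines()
    return lines[0].strip() if lines else ""
```

Calling `write_meta_info("PREDICTIONS", "test@test.test")` would produce the one-line file shown in the example above.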
Submission Limitation:
Each team is limited to one successfully evaluated submission per calendar day. The submission count resets daily at midnight UTC (based on Hugging Face's server time). If your team exceeds the daily submission limit, the evaluation will not proceed.
Teams are strictly prohibited from registering multiple team accounts to gain extra evaluation attempts. Any team found violating this rule will be disqualified and all related scores will be cancelled.
The submitted compressed file must be smaller than 500 MB. If it exceeds the size limit, the evaluation will not proceed.
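The size limit can be checked locally before uploading. The sketch below is only an assumption about how one might do this: the rules do not say whether "MB" means 10^6 or 2^20 bytes, so the stricter decimal interpretation is used here, and `submission_size_ok` is a hypothetical helper name.

```python
import os

# Assumption: "500 MB" is read as 500 * 10^6 bytes, the stricter of the two
# common interpretations. The organizers' exact threshold is not documented.
MAX_SUBMISSION_BYTES = 500 * 10**6


def submission_size_ok(zip_path, limit=MAX_SUBMISSION_BYTES):
    """Return True if the archive is strictly smaller than the limit."""
    return os.path.getsize(zip_path) < limit
```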
Reproducibility:
If your team ranks among the top positions on the leaderboard (public score) when the competition closes, we will send an email to your registered address requesting the relevant training and inference code for validity verification and reproducibility checks. Please be sure to check the detailed instructions in the follow-up email.
Submission
- As shown in the instructions of the baseline repo, the prediction directory should contain the following. Note that the number of predicted frames for each episode should equal the number of actions minus one, because the action array also includes the action for the provided first frame. Moreover, as mentioned above, meta_info.txt should be included in the submission folder. Finally, the total size of the compressed file (.zip) should be less than 500 MB.
We also provide a template submission file here.

    PREDICTIONS/
    ├── meta_info.txt
    ├── task_0/
    │   ├── episode_0/
    │   │   ├── 0/
    │   │   │   └── video/
    │   │   │       ├── frame_00000.jpg
    │   │   │       ├── frame_00001.jpg
    │   │   │       ├── ...
    │   │   │       └── frame_T0.jpg
    │   │   ├── 1/
    │   │   │   └── video/
    │   │   │       ├── frame_00000.jpg
    │   │   │       ├── frame_00001.jpg
    │   │   │       ├── ...
    │   │   │       └── frame_T0.jpg
    │   │   └── 2/
    │   │       └── video/
    │   │           ├── frame_00000.jpg
    │   │           ├── frame_00001.jpg
    │   │           ├── ...
    │   │           └── frame_T0.jpg
    │   ├── episode_1/
    │   │   ├── 0/
    │   │   │   └── video/
    │   │   │       ├── frame_00000.jpg
    │   │   │       ├── frame_00001.jpg
    │   │   │       ├── ...
    │   │   │       └── frame_T1.jpg
    │   │   ├── 1/
    │   │   │   └── video/
    │   │   │       ├── frame_00000.jpg
    │   │   │       ├── frame_00001.jpg
    │   │   │       ├── ...
    │   │   │       └── frame_T1.jpg
    │   │   └── 2/
    │   │       └── video/
    │   │           ├── frame_00000.jpg
    │   │           ├── frame_00001.jpg
    │   │           ├── ...
    │   │           └── frame_T1.jpg
    │   └── episode_2/
    │       └── ...
    └── task_1/
        └── ...

- Compress and submit the predictions:

    cd PARENT_DIRECTORY_OF_PREDICTIONS
    zip -r ./submission.zip ./PREDICTIONS

- Upload the submission.zip.
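Before uploading, it may help to verify the layout locally. The sketch below checks one `video/` folder against the "number of actions minus one" rule and builds an archive with the same top-level folder that the `zip -r` command produces. The helper names `check_sample` and `make_submission` are hypothetical, and the check that frame indices run contiguously from 00000 is an assumption read off the directory listing, not a documented requirement.

```python
import re
import zipfile
from pathlib import Path

# Assumption: frames are named frame_<5 digits>.jpg, as in the listing above.
FRAME_RE = re.compile(r"frame_(\d{5})\.jpg")


def check_sample(video_dir, num_actions):
    """Return a list of problems found in one <task>/<episode>/<k>/video folder."""
    expected = num_actions - 1  # the action array includes the first frame's action
    indices = []
    for p in Path(video_dir).iterdir():
        m = FRAME_RE.fullmatch(p.name)
        if m:
            indices.append(int(m.group(1)))
    indices.sort()
    problems = []
    if len(indices) != expected:
        problems.append(f"expected {expected} frames, found {len(indices)}")
    if indices != list(range(len(indices))):
        problems.append("frame indices are not contiguous from 00000")
    return problems


def make_submission(predictions_dir, zip_path="submission.zip"):
    """Zip the predictions folder so the archive contains it as the top-level entry."""
    root = Path(predictions_dir).resolve()
    out = Path(zip_path).resolve()
    if not (root / "meta_info.txt").is_file():
        raise FileNotFoundError("meta_info.txt is missing from the predictions folder")
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip directories and the archive itself if it lives inside the folder.
            if path.is_file() and path != out:
                zf.write(path, path.relative_to(root.parent).as_posix())
    return out
```

For an episode with, say, 30 actions, `check_sample(video_dir, 30)` returns an empty list only when the folder holds 29 frames numbered 00000 to 00028.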
Evaluation
Before evaluation, we resize all predicted images to (480, 640), the same size as the ground-truth images. We then use PSNR, scene consistency, and nDTW for evaluation. More details can be found in EWMBench and the baseline repo.
To calculate the overall score, we normalize PSNR through
np.clip(psnr, a_min=0, a_max=35)/35
and then average the scene consistency, nDTW and the normalized PSNR.
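The scoring rule above can be sketched in a few lines. This version uses plain Python `min`/`max`, which for a scalar gives the same result as the `np.clip` expression. It assumes that scene consistency and nDTW are already reported in [0, 1] and that the three terms are weighted equally; the function names are hypothetical and this is not the official evaluation code.

```python
def normalize_psnr(psnr, cap=35.0):
    """Clamp PSNR to [0, cap] and scale to [0, 1], like np.clip(psnr, 0, 35) / 35."""
    return min(max(psnr, 0.0), cap) / cap


def overall_score(psnr, scene_consistency, ndtw):
    """Plain average of normalized PSNR, scene consistency, and nDTW."""
    return (normalize_psnr(psnr) + scene_consistency + ndtw) / 3.0
```

Under these assumptions, any PSNR at or above 35 contributes the maximum of 1.0 to the average, so gains beyond 35 dB do not raise the overall score.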
Timeline
| Competition Start | 2/28 |
| Submission Deadline | 4/20 |