Challenge Description - ICRA26 Workshop

AgiBot World Challenge ICRA2026 - World Model Track

We've designed a series of challenging tasks covering kitchen, workbench, dining-table, and bathroom scenarios. The tasks involve diverse robot-object interactions, such as collisions, grasping, placement, and dragging, to thoroughly evaluate models' generative capabilities.

🗂️ Dataset

The training, validation, and test datasets for the competition can be downloaded from AgiBotWorld Challenge-2026 (World Model Track).

🧪 Baseline

We've released the baseline repo. You can train your own world model based on the baseline repo or your own codebase.

📝 Rules

Authentication:

Please register your team on the official website of AgiBotWorldChallengeICRA2026 before submitting your results. To identify the team associated with your submission, we require a meta_info.txt file in the compressed file. The first line of this file must be the email address used during your team registration.

A simple example of meta_info.txt:

test@test.test

If the email address in meta_info.txt does not match any registered email address on the official website of AgiBotWorldChallengeICRA2026, the evaluation will fail. We update the Team Information on the Google Form daily, so you will be able to submit your results normally from the day after your registration.
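As a minimal sketch, the file can be written and read back as follows. The helper names write_meta_info and read_registered_email are hypothetical and are not part of the baseline repo; the address check is only a light sanity check, not the organizers' validation.

```python
from pathlib import Path


def write_meta_info(predictions_dir, email):
    """Write meta_info.txt with the registration email on its first line."""
    email = email.strip()
    # Light sanity check only; this is not a full email validator.
    if email.count("@") != 1 or " " in email:
        raise ValueError(f"does not look like an email address: {email!r}")
    path = Path(predictions_dir) / "meta_info.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(email + "\n", encoding="utf-8")
    return path


def read_registered_email(predictions_dir):
    """Return the first line of meta_info.txt with whitespace stripped."""
    text = (Path(predictions_dir) / "meta_info.txt").read_text(encoding="utf-8")
    lines = text.splitlines()
    if not lines:
        raise ValueError("meta_info.txt is empty")
    return lines[0].strip()
```

Reading the file back before compressing is a cheap way to confirm that the first line is exactly the address used at registration.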

Submission Limitation:

Each team is limited to one successfully evaluated submission per calendar day. The submission count resets daily at midnight UTC (based on Hugging Face's server time). If your team exceeds the daily submission limit, the evaluation will not proceed.

Teams are strictly prohibited from registering multiple team accounts to gain extra evaluation attempts. Any team found violating this rule will be disqualified and all related scores will be cancelled.

The size of your submitted compressed file must be less than 500 MB. If the compressed file exceeds the size limit, the evaluation will not proceed.
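The size limit can be checked locally before uploading. The sketch below assumes "500 MB" means 500 × 10^6 bytes, the more conservative reading; the rules do not say whether decimal or binary megabytes are meant. The function name archive_within_limit is hypothetical.

```python
import os

# Assumption: decimal megabytes. The binary reading (500 * 1024**2) is larger,
# so passing this check is safe under either interpretation.
MAX_BYTES = 500 * 10**6


def archive_within_limit(zip_path, max_bytes=MAX_BYTES):
    """Return True if the archive is strictly smaller than the limit."""
    return os.path.getsize(zip_path) < max_bytes
```

For example, archive_within_limit("submission.zip") returns False for an archive that would be rejected under this reading of the limit.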

Reproducibility:

If your team ranks among the top positions of the leaderboard (public score) when the competition closes, we will email your registered address to request the relevant training and inference code for validity verification and reproducibility checks. Please be sure to check the detailed instructions in the follow-up email.

📦 Submission

As shown in the instructions of the baseline repo, the prediction directory should have the structure shown below. Note that the number of predicted frames for each episode should equal the number of actions minus one, because the action array also includes the action for the provided first frame. As mentioned above, meta_info.txt must be included in the submission folder. Finally, the compressed file (.zip) must be smaller than 500 MB.

We also provide a template submission file here.

PREDICTIONS/
├── meta_info.txt
├── task_0/
│   ├── episode_0/
│   │   ├── 0/
│   │   │   └── video/
│   │   │       ├── frame_00000.jpg
│   │   │       ├── frame_00001.jpg
│   │   │       ├── ...
│   │   │       └── frame_T0.jpg
│   │   ├── 1/
│   │   │   └── video/
│   │   │       ├── frame_00000.jpg
│   │   │       ├── frame_00001.jpg
│   │   │       ├── ...
│   │   │       └── frame_T0.jpg
│   │   └── 2/
│   │       └── video/
│   │           ├── frame_00000.jpg
│   │           ├── frame_00001.jpg
│   │           ├── ...
│   │           └── frame_T0.jpg
│   ├── episode_1/
│   │   ├── 0/
│   │   │   └── video/
│   │   │       ├── frame_00000.jpg
│   │   │       ├── frame_00001.jpg
│   │   │       ├── ...
│   │   │       └── frame_T1.jpg
│   │   ├── 1/
│   │   │   └── video/
│   │   │       ├── frame_00000.jpg
│   │   │       ├── frame_00001.jpg
│   │   │       ├── ...
│   │   │       └── frame_T1.jpg
│   │   └── 2/
│   │       └── video/
│   │           ├── frame_00000.jpg
│   │           ├── frame_00001.jpg
│   │           ├── ...
│   │           └── frame_T1.jpg
│   ├── episode_2/
│   └── ...
├── task_1/
└── ...
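A local check of this layout might look like the sketch below. It rests on several assumptions that the page does not confirm: each episode has exactly the three sample folders 0, 1 and 2 shown in the tree, frame files are numbered contiguously from frame_00000.jpg with five digits, and the expected frame count per episode (the number of actions minus one) is supplied by the caller as a dictionary keyed by "task_X/episode_Y". The names count_frames and check_predictions are hypothetical and do not come from the baseline repo.

```python
import re
from pathlib import Path

FRAME_RE = re.compile(r"frame_(\d{5})\.jpg")


def count_frames(video_dir):
    """Count frame files, requiring contiguous zero-based numbering."""
    indices = sorted(
        int(m.group(1))
        for p in Path(video_dir).iterdir()
        if (m := FRAME_RE.fullmatch(p.name))
    )
    if indices != list(range(len(indices))):
        raise ValueError(f"non-contiguous frame numbering in {video_dir}")
    return len(indices)


def check_predictions(root, expected_frames=None, samples=("0", "1", "2")):
    """Return a list of problems found under the predictions root."""
    root = Path(root)
    expected_frames = expected_frames or {}
    problems = []
    if not (root / "meta_info.txt").is_file():
        problems.append("missing meta_info.txt")
    for task in sorted(root.glob("task_*")):
        for episode in sorted(task.glob("episode_*")):
            key = f"{task.name}/{episode.name}"
            for sample in samples:
                video = episode / sample / "video"
                if not video.is_dir():
                    problems.append(f"missing directory {video}")
                    continue
                try:
                    n = count_frames(video)
                except ValueError as err:
                    problems.append(str(err))
                    continue
                # Expected count is len(actions) - 1, provided by the caller.
                want = expected_frames.get(key)
                if want is not None and n != want:
                    problems.append(f"{video}: {n} frames, expected {want}")
    return problems
```

An empty list from check_predictions("PREDICTIONS", expected) means no problem was detected under these assumptions; it does not guarantee that the official evaluation will accept the submission.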

Compress and submit the predictions:

cd PARENT_DIRECTORY_OF_PREDICTIONS
zip -r ./submission.zip ./PREDICTIONS

Upload the submission.zip.

📊 Evaluation

Before evaluation, we resize all predicted images to (480, 640), the same size as the ground-truth images. We then evaluate with PSNR, scene consistency, and nDTW. More details can be found in EWMBench and the baseline repo.

To calculate the overall score, we normalize PSNR as np.clip(psnr, a_min=0, a_max=35)/35 and then average the scene consistency, nDTW, and normalized PSNR.
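The calculation can be sketched in plain Python as below, where min and max stand in for np.clip on a single value. The function names are hypothetical. The sketch assumes that scene consistency and nDTW are already on a [0, 1] scale and that the three terms are weighted equally; how scores are aggregated across episodes and tasks is not stated here and is not modelled.

```python
def normalize_psnr(psnr, max_psnr=35.0):
    """Clip PSNR to [0, max_psnr] and scale to [0, 1].

    Mirrors np.clip(psnr, a_min=0, a_max=35) / 35 for a scalar input.
    """
    return min(max(psnr, 0.0), max_psnr) / max_psnr


def overall_score(psnr, scene_consistency, ndtw):
    """Average normalized PSNR, scene consistency, and nDTW with equal weight."""
    return (normalize_psnr(psnr) + scene_consistency + ndtw) / 3.0
```

For instance, a PSNR of 28 normalizes to 0.8, so overall_score(28, 0.9, 0.8) is (0.8 + 0.9 + 0.8) / 3, roughly 0.833. Any PSNR above 35 contributes the same 1.0, so gains beyond that point do not raise the overall score.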

📅 Timeline

Competition Start	2/28
Submission Deadline	4/20
