(a) Comparison with a diffusion-only design.
(b) Different design choices.
(c) Activating video editing capabilities through image data.
The reference images and videos used in these demos are sourced from the public domain or generated by models, and are intended solely to demonstrate the capabilities of this research. If you have any concerns, please contact us and we will remove the content promptly.
@misc{mou2025instructx,
title={InstructX: Towards Unified Visual Editing with MLLM Guidance},
author={Chong Mou and Qichao Sun and Yanze Wu and Pengze Zhang and Xinghui Li and Fulong Ye and Songtao Zhao and Qian He},
year={2025},
eprint={2510.08485},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2510.08485},
}