I am currently a research scientist at Alibaba Tongyi Lab, where I focus on the research and application of foundation models, with a particular emphasis on multimodal generative models. My research interests include multimodal content understanding, multimodal editing, and multimodal generation. Work from my team has been accepted at conferences such as CVPR, ICLR, NeurIPS, and AAAI. From 2018 to 2022, I worked at Alibaba's DAMO Academy on the research and development of multimedia content understanding technologies. My work there centered on applying multimodal understanding algorithms in the media asset industry. The resulting capabilities were integrated into Alibaba Cloud's Multimedia AI product and have been widely applied in scenarios such as TV media asset management and internet search and recommendation systems. Since 2023, I have been working at Alibaba's Tongyi Lab on the research and development of generative foundation models. My research directions include fine-tuning frameworks for base models, lightweight fine-tuning and controllable generation for generative models, and a unified framework for multimodal generation (both diffusion-based and LLM-based). This work has resulted in the ACE series, including ResTuning, SCEdit, ACE, ACE++, and VACE. The related capabilities and models have been open-sourced and deployed.