I am currently a research scientist at Alibaba Tongyi Lab, where I focus on the research and application of foundation models, with a particular emphasis on multimodal generative models. My research interests include multimodal content understanding, multimodal editing, and multimodal generation. Work from my team has been accepted at conferences such as CVPR, ICLR, NeurIPS, and AAAI.

From 2011 to 2018, I studied at Zhejiang University, where I obtained my bachelor's and master's degrees. During my graduate studies, my research focused on machine learning and computer vision, with an emphasis on person re-identification. One of my papers received the Best Student Paper Award at ACML 2017, and another was accepted at AAAI 2018.

From 2018 to 2022, I worked at Alibaba's DAMO Academy on the research and development of multimedia content understanding technologies. My work centered on applying multimodal understanding algorithms in the media asset industry. The resulting capabilities are integrated into Alibaba Cloud's Multimedia AI product and have been widely applied in scenarios such as TV media asset management and internet search and recommendation systems.

Since 2023, I have been working at Alibaba's Tongyi Lab on research and development related to foundational generative models. My research directions include fine-tuning frameworks for base models, lightweight fine-tuning and controllable generation for generative models, and a unified framework for multimodal generation (both diffusion-based and LLM-based). This work has resulted in the ACE series of works, including ResTuning, SCEdit, ACE, ACE++, and VACE. The related capabilities and models have been open-sourced and deployed.