Hi, I am Kai-Po Chang (張凱博), a third-year PhD student advised by Yu-Chiang Frank Wang at National Taiwan University. My research focuses on mitigating hallucinations in large vision-language models (LVLMs), and has more recently extended to related topics in large vision-language(-action) world models.
Over the past few years, my research has introduced several approaches for mitigating hallucinations in LVLMs. Rapper mitigates hallucination and implausibility through reinforced language-based feedback, while Santa specifically targets object and action hallucinations in video understanding via self-augmented contrastive alignment. I have also been fortunate to work with many excellent collaborators on a range of impactful projects. These include co-authoring Receler for concept erasing in diffusion models and VideoMage for customized video generation, as well as mentoring junior collaborators on works including ARM for serial lifelong knowledge editing in LLMs and TA-Prompting for improving the temporal understanding of LVLMs.
My long-term goal is to develop trustworthy and deployable learning-based models capable of performing arbitrary physical labor tasks in embodied environments and the real world.