Xingyu Fu

Xingyu Fu (府星妀)

👋 I am a Postdoctoral Fellow at Princeton University's PLI, working with Zhuang Liu, Danqi Chen, and Sanjeev Arora.

My research primarily focuses on generative multimodal models at the intersection of vision and natural language (e.g., multimodal LLMs, text-to-image/video generation, omni models). I aim to improve the perception and reasoning capabilities of multimodal models by bridging the two. I have built better evaluations for emergent abilities and used synthetic data to design models that better perceive and reason about the multimodal world. My PhD thesis is titled "Bridging Perception and Reasoning in Multimodal Models".

I earned my PhD in Computer Science at the University of Pennsylvania (2020 to 2025), advised by Prof. Dan Roth. During my PhD, I interned at Microsoft and AWS AI Labs. I did my B.S. in Computer Science at UIUC from 2017 to 2020, where I was very fortunate to be advised by Prof. Jiawei Han and Prof. Jingbo Shang.

🌟 Recent Highlights

📑 Research Projects

Reinforced Attention Learning

Bangzheng Li, Jianmo Ni, Chen Qu, Ian Miao, Liu Yang, Xingyu Fu, Muhao Chen, Derek Zhiyuan Cheng

arXiv, Feb 2026

UEval

UEval: A Benchmark for Unified Multimodal Generation

Bo Li, Yida Yin, Wenhao Chai, Xingyu Fu*, Zhuang Liu*

arXiv, Jan 2026

DeeptraceReward

Learning Human-Perceived Fakeness in AI-Generated Videos via Multimodal LLMs

Xingyu Fu, Siyi Liu, Yinuo Xu, Pan Lu, Guangqiuse Hu, Tianbo Yang, Taran Anantasagar, Christopher Shen, Yikai Mao, Yuanzhe Liu, Keyush Shah, Chung Un Lee, Yejin Choi, James Zou, Dan Roth*, Chris Callison-Burch*

arXiv, Sep 2025

ReFocus

ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding

Xingyu Fu, Minqian Liu, Zhengyuan Yang, John Corring, Yijuan Lu, Jianwei Yang, Dan Roth, Dinei Florencio, Cha Zhang

ICML 2025

Science-T2I

Science-T2I: Addressing Scientific Illusions in Image Synthesis

Jialuo Li, Wenhao Chai, Xingyu Fu, Haiyang Xu, Saining Xie

CVPR 2025

MuirBench

MUIRBENCH: A Comprehensive Benchmark for Robust Multi-image Understanding

Fei Wang*, Xingyu Fu*, James Y. Huang, Zekun Li, Qin Liu, Xiaogeng Liu, Mingyu Derek Ma, Nan Xu, Wenxuan Zhou, Kai Zhang, Tianyi Lorena Yan, Wenjie Jacky Mo, Hsiang-Hui Liu, Pan Lu, Chunyuan Li, Chaowei Xiao, Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, Muhao Chen

ICLR 2025

Visual Sketchpad

Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

Yushi Hu*, Weijia Shi*, Xingyu Fu, Dan Roth, Mari Ostendorf, Luke Zettlemoyer, Noah A Smith, Ranjay Krishna

NeurIPS 2024

Commonsense-T2I

Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?

Xingyu Fu, Muyu He, Yujie Lu, William Yang Wang, Dan Roth

COLM 2024

BLINK

BLINK: Multimodal Large Language Models Can See but Not Perceive

Xingyu Fu*, Yushi Hu*, Bangzheng Li, Yu Feng, Haoyu Wang, Xudong Lin, Dan Roth, Noah A. Smith, Wei-Chiu Ma, Ranjay Krishna

ECCV 2024 · Spotlight of cVinW@CVPR 2024 · 36K downloads

Deceptive Semantic Shortcuts

Deceptive Semantic Shortcuts on Reasoning Chains: How Far Can Models Go without Hallucination?

Bangzheng Li, Ben Zhou, Fei Wang, Xingyu Fu, Dan Roth, Muhao Chen

NAACL 2024

ImagenHub

ImagenHub: Standardizing the evaluation of conditional image generation models

Max Ku, Tianle Li, Kai Zhang, Yujie Lu, Xingyu Fu, Wenwen Zhuang, Wenhu Chen

ICLR 2024

Generate then Select

Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge

Xingyu Fu, Sheng Zhang, Gukyeong Kwon, Pramuditha Perera, Henghui Zhu, Yuhao Zhang, Alexander Hanbo Li, William Yang Wang, Zhiguo Wang, Vittorio Castelli, Patrick Ng, Dan Roth, Bing Xiang

ACL Findings 2023

Dynamic Clue Bottlenecks

Dynamic Clue Bottlenecks: Towards Interpretable-by-Design Visual Question Answering

Xingyu Fu, Ben Zhou, Sihao Chen, Mark Yatskar, Dan Roth

arXiv 2023

There's a Time and Place

There's a Time and Place for Reasoning Beyond the Image

Xingyu Fu, Ben Zhou, Ishaan Chandratreya, Carl Vondrick, Dan Roth

ACL 2022 · Oral

Cross-lingual Entity Linking

Design Challenges in Low-resource Cross-lingual Entity Linking

Xingyu Fu*, Weijia Shi*, Xiaodong Yu, Zian Zhao, Dan Roth

EMNLP 2020

Semitic Root Extraction

Constrained sequence-to-sequence Semitic root extraction for enriching word embeddings

Ahmed El-Kishky*, Xingyu Fu*, Aseel Addawood, Nahil Sobh, Clare Voss, Jiawei Han

WANLP @ ACL 2019

🎤 Invited Talks

💼 Work Experience