I specialize in building and studying MLLM-based Graphical User Interface (GUI) Agents. My work involves both their practical development and evaluation, but my core research passion is to elevate their cognitive capabilities toward human-like reasoning and generalization. To this end, my research also extends to synergistic fields such as RL4LLM, LLM Reasoning, and Tool-Use Agents, which I consider crucial for pushing the boundaries of what GUI agents can achieve.
I am currently a third-year undergraduate student in Communication Engineering at Xidian University and Heriot-Watt University, and I am actively seeking PhD and internship opportunities to help advance this exciting field.
📝 Publications

You Don’t Know Until You Click: Automated GUI Testing for Production-Ready Software Evaluation
Yutong Bian*, Xianhao Lin, Yupeng Xie et al.
- As the lead for AppEvalPilot, I designed and implemented a system to dynamically assess software functionality through UI interaction, moving beyond the limitations of static analysis for LLM-based software engineers.
- My work involved creating automated test case generation and a test execution agent capable of complex GUI interactions.
- Experiments showed a high correlation (0.91) between AppEvalPilot’s assessments and those of human experts, while AppEvalPilot was also 55% faster and 94.8% cheaper.
📖 Education
- 2021.09 - 2026.06, B.Eng. in Communication Engineering, Xidian University, China & Heriot-Watt University, UK.
- GPA: 3.8/4.0
💻 Internships
- 2024.09 - 2025.05, MetaGPT, Shenzhen, China.
- Research Intern
- Supervisors: Sirui Hong, Haoming Tang, Chenglin Wu
- Topic: GUI Agent; Agent Training; Benchmark
🚀 Projects
- OSAgent: Cross-platform Intelligent Assistant (Sep. 2024 - May 2025)
- Focused on developing a universal, stable, and efficient GUI agent framework.
- Contributed to the architecture design, perception, planning, and execution modules.
- Achieved state-of-the-art performance on SPA-Bench (mobile) cross-application tasks (26.7% vs. 13.3% for the previous SOTA).
- R1-Like GUI Agent Training (Apr. 2024 - May 2025)
- Focused on enhancing the core element grounding capability of GUI agents using GRPO.
- Designed a data collection and refinement pipeline and a multi-component reward function.
- Demonstrated that training on 1k carefully selected samples can achieve performance comparable to SOTA models trained on millions of samples, significantly improving GUI grounding accuracy on benchmarks such as ScreenSpot and ScreenSpot-Pro.