A plan-aware reward model for scoring and re-ranking candidate GUI actions for computer-use agents, trained on multi-OS offline trajectories and validated on OSWorld.
Apr 6, 2026
We introduce Agent Alpha, a unified framework that combines generation, exploration, and evaluation through step-level MCTS for computer-use agents, achieving state-of-the-art results on OSWorld.
Feb 3, 2026