GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning

Published in arXiv preprint, 2025

We propose GenPRM, a method that scales the test-time compute of process reward models (PRMs) through generative reasoning. The work targets the computational cost of PRM evaluation and aims to keep that evaluation efficient in large-scale applications.

Key Contributions:

Proposed GenPRM for scaling test-time compute of process reward models
Introduced generative reasoning approach for efficient evaluation
Demonstrated effectiveness on large-scale applications

Download paper here

Recommended citation: Jian Zhao, Runze Liu, Kaiyan Zhang, Zhimu Zhou, Junqi Gao, Dong Li, Jiafei Lyu, Zhouyi Qian, Biqing Qi, Xiu Li, Bowen Zhou. (2025). "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning." arXiv preprint arXiv:2505.15825.
