GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning

Published as an arXiv preprint, 2025

We propose GenPRM, a method that scales the test-time compute of process reward models (PRMs) via generative reasoning. The work targets the computational cost of PRM evaluation and aims to provide an efficient solution for large-scale applications.

Key Contributions:

  • Proposed GenPRM for scaling test-time compute of process reward models
  • Introduced a generative reasoning approach for efficient evaluation
  • Demonstrated effectiveness on large-scale applications

Download paper here

Recommended citation: Jian Zhao, Runze Liu, Kaiyan Zhang, Zhimu Zhou, Junqi Gao, Dong Li, Jiafei Lyu, Zhouyi Qian, Biqing Qi, Xiu Li, Bowen Zhou. (2025). "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning." arXiv preprint arXiv:2505.15825.