GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning
Published as an arXiv preprint, 2025
We propose GenPRM, a method for scaling the test-time compute of process reward models (PRMs) through generative reasoning. The work targets the computational cost of PRM evaluation and aims to keep that evaluation efficient in large-scale applications.
Key Contributions:
- Proposed GenPRM for scaling test-time compute of process reward models
- Introduced a generative reasoning approach for efficient evaluation
- Demonstrated effectiveness on large-scale applications
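
As a rough illustration of what scaling test-time compute for a process reward model can look like, the sketch below samples several independent judgments for each solution step and averages them into a step reward. It is not taken from the paper: the `Judge` interface, `step_reward`, `score_solution`, and the deterministic `toy_judge` stub are all hypothetical names introduced here, and a real judge would sample a reasoning trace from a language model rather than inspect the step text.

```python
from typing import Callable, List

# Hypothetical judge interface (an assumption for illustration):
# (problem, steps_so_far, sample_index) -> True if the latest step is accepted.
Judge = Callable[[str, List[str], int], bool]


def step_reward(problem: str, steps: List[str], judge: Judge, n_samples: int) -> float:
    """Return the fraction of sampled judgments that accept the latest step."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    votes = [judge(problem, steps, i) for i in range(n_samples)]
    return sum(votes) / n_samples


def score_solution(problem: str, steps: List[str], judge: Judge, n_samples: int) -> List[float]:
    """Score every prefix of the solution, yielding one reward per step."""
    return [
        step_reward(problem, steps[: k + 1], judge, n_samples)
        for k in range(len(steps))
    ]


def toy_judge(problem: str, steps: List[str], sample_index: int) -> bool:
    """Deterministic stand-in for a sampled model judgment (hypothetical).

    It rejects any step whose text contains "5", and every fourth sample
    is treated as noisy and flips that verdict.
    """
    verdict = "5" not in steps[-1]
    noisy = sample_index % 4 == 3
    return (not verdict) if noisy else verdict


if __name__ == "__main__":
    steps = ["2 + 2 = 4", "4 * 3 = 15"]
    print(score_solution("Compute (2 + 2) * 3.", steps, toy_judge, n_samples=8))
    # prints [0.75, 0.25]
```

In this toy setup, raising `n_samples` spends more compute per step in exchange for a less noisy reward estimate, which is the general trade-off that test-time scaling refers to.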
Recommended citation: Jian Zhao, Runze Liu, Kaiyan Zhang, Zhimu Zhou, Junqi Gao, Dong Li, Jiafei Lyu, Zhouyi Qian, Biqing Qi, Xiu Li, Bowen Zhou. (2025). "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning." arXiv preprint arXiv:2505.15825.
