Metrics for Holistic Evaluation of LLM Reasoning about Action, Change, and Planning
Published in the NeurIPS 2025 Workshop on Evaluating the Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling, 2025
New, informative metrics for evaluating LLM responses in planning and reasoning tasks.
Recommended citation: Murthy, A. B., Mink, J., & Sanneman, L. (2025). Metrics for holistic evaluation of LLM reasoning about action, change, and planning. In NeurIPS 2025 Workshop on Evaluating the Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling.
