Publications

You can find the full list of my articles on my Google Scholar profile.

Selected Publications


Metrics for Holistic Evaluation of LLM Reasoning about Action, Change, and Planning

Published in NeurIPS 2025 Workshop on Evaluating the Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling, 2025

New, informative metrics for evaluating LLM responses in planning and reasoning tasks.

Recommended citation: Murthy, A. B., Mink, J., & Sanneman, L. (2025). Metrics for holistic evaluation of LLM reasoning about action, change, and planning. In NeurIPS 2025 Workshop on Evaluating the Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling.
Download Paper

LLMs Need to Go Beyond Computational Confidence Metrics to Establish Trust

Published in AAAI 2025 Symposium on AI Trustworthiness and Risk Assessment for Challenged Contexts (ATRACC-25), 2025

A perspective paper highlighting the insufficiency of computational assessments of trust and the need for human-centered evaluations of trustworthiness in generative AI systems.

Recommended citation: Murthy, A. B., & Sanneman, L. (2025). LLMs Need to Go Beyond Computational Confidence Metrics to Establish Trust. Proceedings of the AAAI Symposium Series, 7(1), 131-136. https://doi.org/10.1609/aaaiss.v7i1.36878
Download Paper

Position: LLMs can’t plan, but can help planning in LLM-modulo frameworks

Published in International Conference on Machine Learning (ICML) as Spotlight, 2024

This work proposes a novel neuro-symbolic approach to reliable planning with LLMs that pairs them with external model-based verifiers and critics in a bidirectional interaction regime.

Recommended citation: Kambhampati, S., Valmeekam, K., Guan, L., Verma, M., Stechly, K., Bhambri, S., ... & Murthy, A. B. (2024, June). Position: LLMs can’t plan, but can help planning in LLM-modulo frameworks. In Forty-first International Conference on Machine Learning.
Download Paper