ATLAS: A spend classification benchmark for estimating scope 3 carbon emissions

Published in Tackling Climate Change with Machine Learning Workshop, NeurIPS 2024

Recommended citation: Dumit, A., Rao, K., Kwee, T., Glidden, J., Gopalakrishnan, V., Tsai, K., & Suh, S. (2024). ATLAS: A spend classification benchmark for estimating scope 3 carbon emissions. Tackling Climate Change with Machine Learning Workshop at NeurIPS 2024. https://www.climatechange.ai/papers/neurips2024/70

About 70% of companies that report value-chain emissions rely on financial spend ledgers paired with emissions factors per dollar. Accurate classification of expenditures to emissions factors is critical but complex, given the sheer number of line items and the diversity of how they are categorized and described. We introduce ATLAS, the first spend classification benchmark, comprising 10,000 labeled and de-identified spend items derived from human experts classifying spend for company scope 3 inventories. The best-performing LLM evaluated achieves a top-1 accuracy of 50% and a top-3 accuracy of 61%. ATLAS enables systematic evaluation of LLMs for spend classification and provides a starting point for advancing automated carbon accounting and sustainability reporting for spend-based emissions.
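To make the two ideas in the abstract concrete, here is a minimal Python sketch, assuming a simplified setup rather than the ATLAS evaluation code. A spend-based estimate multiplies dollars spent by a per-dollar emissions factor, so assigning a line item to the wrong category changes the factor and therefore the estimate. Top-k accuracy counts an item as correct when the expert-assigned category appears among a model's first k ranked guesses. The function names, category codes, and factor values below are hypothetical and chosen only for illustration; they are not drawn from ATLAS or any real emissions-factor database.

```python
from typing import Sequence


def spend_based_emissions(spend_usd: float, factor_kg_per_usd: float) -> float:
    """Spend-based estimate: dollars spent times an emissions factor per dollar."""
    return spend_usd * factor_kg_per_usd


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    gold_labels: Sequence[str],
    k: int,
) -> float:
    """Fraction of items whose gold label appears in the first k ranked predictions."""
    if len(ranked_predictions) != len(gold_labels):
        raise ValueError("predictions and labels must have the same length")
    if not gold_labels:
        raise ValueError("need at least one item")
    hits = sum(
        1 for preds, gold in zip(ranked_predictions, gold_labels) if gold in preds[:k]
    )
    return hits / len(gold_labels)


# Hypothetical spend items: a classifier's ranked candidate categories per item,
# paired with the expert-assigned (gold) category.
ranked = [
    ["air_travel", "ground_transport", "hotels"],
    ["software", "it_services", "office_supplies"],
    ["office_supplies", "furniture", "software"],
    ["hotels", "air_travel", "meals"],
]
gold = ["air_travel", "it_services", "paper_products", "meals"]

print(top_k_accuracy(ranked, gold, k=1))  # 0.25: only the first item is right at rank 1
print(top_k_accuracy(ranked, gold, k=3))  # 0.75: three of four gold labels are in the top 3

# Hypothetical factors (kg CO2e per USD), for illustration only.
factors = {"air_travel": 0.5, "ground_transport": 0.25}
print(spend_based_emissions(1000.0, factors["air_travel"]))        # 500.0
print(spend_based_emissions(1000.0, factors["ground_transport"]))  # 250.0 if misclassified
```

The last two lines show why classification quality matters for spend-based inventories: the same $1,000 line item yields a different emissions estimate depending on which category, and therefore which factor, it is mapped to.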

In the news …

  1. How we built ATLAS: a benchmark for spend classification in scope 3 carbon accounting, Watershed Blog