ML and NLP research projects on sustainability, climate change, ESG, and greenwashing

This website provides projects and datasets on sustainability, climate change, and greenwashing for AI, ML, and NLP researchers. The information here is for academic research purposes and is not intended for use by the general public, investors, or business users. The content of these webpages does not reflect the views of any organization.

The projects are mainly led by:

(Webmaster) Gaku Morio: Stanford University, Hitachi America
Christopher D. Manning: Stanford University
Soh Young In: KAIST
Isabella Yoon: University of Otago
Harri Rowlands: InfluenceMap

Events

International Workshop on Sustainable Transition with AI @ IJCAI 2024

(Organizing Committee) Christopher D. Manning, Young Joon Lee, Soh Young In, Gaku Morio
(Venue) IJCAI 2024
Workshop website
IJCAI 2024
(Summary) Designed for IJCAI 2024 on Jeju Island, the “Sustainable Transition with AI (STAI)” workshop is a full-day event that explores the pivotal role of Artificial Intelligence (AI) in advancing global and local sustainability initiatives. STAI will spotlight the innovative use of Natural Language Processing (NLP), Computer Vision (CV), agents, data science, and Machine Learning (ML) in scrutinizing and enhancing sustainability communications.

Talks

Keynote: Using NLP to investigate corporate engagement on climate change

(Author) Gaku Morio
(Venue) ClimateNLP workshop at ACL 2024

Research Projects

An NLP Benchmark Dataset for Assessing Corporate Climate Policy Engagement

(Author) Gaku Morio and Christopher D. Manning
(Venue) NeurIPS 2023 Datasets and Benchmarks Track
Paper
GitHub
(Summary) As societal awareness of climate change grows, corporate climate policy engagements are attracting attention. We propose a dataset to estimate corporate climate policy engagement from various PDF-formatted documents. Our dataset comes from LobbyMap (a platform operated by global think tank InfluenceMap) that provides engagement categories and stances on the documents. To convert the LobbyMap data into the structured dataset, we developed a pipeline using text extraction and OCR. Our contributions are: (i) Building an NLP dataset including 10K documents on corporate climate policy engagement. (ii) Analyzing the properties and challenges of the dataset. (iii) Providing experiments for the dataset using pre-trained language models. The results show that while Longformer outperforms baselines and other pre-trained models, there is still room for significant improvement. We hope our work begins to bridge research on NLP and climate change.
(Data) The dataset used in this analysis is part of InfluenceMap's content and was used and is released with InfluenceMap's approval. Since 2015, InfluenceMap has maintained an ongoing database containing millions of data points, each consisting of evidence pieces around corporate climate/nature claims and performance. These are scored against globally accepted science-based benchmarks such as those of the IPCC and the IPBES. Subsets of this data for ML/AI and other analysis are available by request from InfluenceMap, and use of InfluenceMap's content is subject to Terms and Conditions. Please contact us at info@influencemap.org for more information (kindly use an organizational email). I accept the above terms and download the dataset.
(Acknowledgement) Dylan Tanner, Edward Collins, Chris Hurst, who founded InfluenceMap in 2015, and Harri Rowlands provided valuable data and allowed us to make our datasets publicly available. We thank them for their assistance and interest in the work.
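
(Illustration) The summary above mentions a pipeline that combines text extraction and OCR but does not describe its logic. The following is only a minimal sketch of one plausible arrangement, assuming that OCR is used as a fallback when a page's embedded text layer is nearly empty. The function names, the `min_chars` threshold, and the stub extractors are hypothetical and are not taken from the paper or its repository.

```python
from typing import Callable, List, Sequence


def extract_page_text(
    page: bytes,
    text_layer: Callable[[bytes], str],
    ocr: Callable[[bytes], str],
    min_chars: int = 20,  # hypothetical threshold, not from the paper
) -> str:
    """Return the embedded text of a page, or OCR output if that text is too sparse."""
    text = text_layer(page).strip()
    if len(text) >= min_chars:
        return text
    # Assumption: scanned pages have little or no text layer, so fall back to OCR.
    return ocr(page).strip()


def extract_document(
    pages: Sequence[bytes],
    text_layer: Callable[[bytes], str],
    ocr: Callable[[bytes], str],
    min_chars: int = 20,
) -> List[str]:
    """Apply the per-page fallback to every page of a document."""
    return [extract_page_text(p, text_layer, ocr, min_chars) for p in pages]


# Stub extractors standing in for a real PDF text extractor and OCR engine.
def fake_text_layer(page: bytes) -> str:
    return page.decode("utf-8") if page.startswith(b"TEXT:") else ""


def fake_ocr(page: bytes) -> str:
    return "ocr result for scanned page"


pages = [b"TEXT: We support a carbon price of at least ...", b"\x00scan"]
result = extract_document(pages, fake_text_layer, fake_ocr)
```

In practice the two callables would wrap a real PDF text extractor and a real OCR engine; passing them in as arguments keeps the fallback logic testable without either dependency.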

ReportParse: A Unified NLP Tool for Extracting Document Structure and Semantics of Corporate Sustainability Reporting

(Author) Gaku Morio, Soh Young In, Jungah Yoon, Harri Rowlands, and Christopher D. Manning
(Venue) IJCAI 2024 Demos Track
Paper
GitHub
(Summary) We introduce ReportParse, a Python-based tool designed to parse corporate sustainability reports. It combines document structure analysis with natural language processing (NLP) models to extract sustainability-related information from the reports. We also provide easy-to-use web and command interfaces. The tool is expected to aid researchers and analysts in evaluating corporate commitment to and management of sustainability efforts.

Predicting Narratives of Climate Obstruction in Social Media Advertising

(Author) Harri Rowlands, Gaku Morio, Dylan Tanner, and Christopher D. Manning
(Venue) Findings of ACL 2024
Paper
GitHub
(Summary) Social media advertising offers a platform for fossil fuel value chain companies and their agents to reinforce their narratives, often emphasizing economic, labor market, and energy security benefits to promote oil and gas policy and products. Our research questions are whether such narratives can be detected automatically and to what extent the cost of human annotation can be reduced. We introduce a task of classifying narratives into seven categories, based on existing definitions and data. Experiments showed that RoBERTa-large outperforms other methods, while GPT-4 Turbo can serve as a viable annotator for the task, thereby reducing human annotation costs. Our findings and insights provide guidance to automate climate-related ad analysis and lead to more scalable ad scrutiny.
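
(Illustration) One common way to judge whether a model can stand in for a human annotator is to measure chance-corrected agreement between the model's labels and human labels. The sketch below computes Cohen's kappa in plain Python. It is an illustrative assumption about how such a comparison could be made, not the evaluation protocol reported in the paper, and the label names are hypothetical placeholders rather than the paper's seven categories.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled independently at their own rates.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four ads; not data from the paper.
human = ["economy", "economy", "security", "security"]
model = ["economy", "economy", "security", "economy"]
kappa = cohen_kappa(human, model)
```

Here the annotators agree on three of four items (observed agreement 0.75), expected agreement by chance is 0.5, and kappa is therefore 0.5.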