{"id":82324,"date":"2024-12-18T14:45:45","date_gmt":"2024-12-18T14:45:45","guid":{"rendered":"https:\/\/www.climatepolicyinitiative.org\/?p=82324"},"modified":"2026-04-21T23:07:24","modified_gmt":"2026-04-21T23:07:24","slug":"building-al-ml-tools-to-track-public-development-banks-climate-ambition","status":"publish","type":"post","link":"https:\/\/www.climatepolicyinitiative.org\/id\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\/","title":{"rendered":"Building AI\/ML tools to track public development banks&#8217; climate ambition"},"content":{"rendered":"\n<p>Public development banks (PDBs) play a critical role in the global transition toward low-emissions climate-resilient development. Particularly in emerging markets and developing economies (EMDEs), PDBs are instrumental in accelerating climate investment by financing transition projects and developing de-risking mechanisms to \u201ccrowd in\u201d private finance, shaping policy frameworks at a national and international level, and providing advisory services and technical assistance to accelerate the growth of climate project pipelines (<a href=\"https:\/\/www.climatepolicyinitiative.org\/publication\/approaches-to-meeting-the-paris-agreement-goals\/\" target=\"_blank\" rel=\"noreferrer noopener\">CPI 2024<\/a>).<\/p>\n\n\n\n<p>Accordingly, comprehensive tracking of PDB climate ambition can provide important insights into where PDB support for low-emissions climate-resilient transition is strongest, as well as where greater efforts are needed to raise their ambition. 
For example, initial tracking reports in <a href=\"https:\/\/www.climatepolicyinitiative.org\/publication\/public-financial-institutions-climate-commitments-2023-update\/\" target=\"_blank\" rel=\"noreferrer noopener\">2023<\/a> and <a href=\"https:\/\/www.climatepolicyinitiative.org\/publication\/public-financial-institutions-climate-commitments\/\" target=\"_blank\" rel=\"noreferrer noopener\">2022<\/a> showed that PDB climate ambition had thus far been largely concentrated among multilateral development banks (MDBs) and bilateral development finance institutions (DFIs) located in advanced economies.<\/p>\n\n\n\n<p>In 2024, to test the robustness of these and other previous findings, we increased tracking from 70 to 170 institutions, aiming specifically to increase coverage of smaller PDBs in EMDEs and add multilingual functionality to include non-English sources. However, implementing these additions meant that a drastically larger volume of unstructured information had to be ingested before the analysis could begin, a task that the data backend developed in 2022 and used again in 2023 was not fit to handle.<\/p>\n\n\n\n<p>While the taxonomy of climate commitments was retained from the <a href=\"https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2022\/10\/public-fi-commitments-methodology-brief-oct-2022.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">original methodology<\/a>, in 2024, we completed an extensive overhaul of the commitments tracking process, utilizing artificial intelligence (AI) and machine learning (ML) tools to process substantially larger primary datasets and capture more robust information, leading to deeper analytical insights. 
See the <a href=\"https:\/\/www.climatepolicyinitiative.org\/publication\/public-development-banks-climate-commitments-2024\/\" target=\"_blank\" rel=\"noreferrer noopener\">2024 report on PDBs\u2019 climate commitments<\/a> for key findings and recommendations that were informed by the AI\/ML-enabled data-gathering process. In this technical blog post, we dive more deeply into the details of our new data methodology in the spirit of joint learning and knowledge exchange among climate finance analysts.<\/p>\n\n\n<section class=\"block block-chart is-image\"><div is=\"chart\/image\" class=\"chart-image\">\n\t\t<script type=\"json\/props\">{\n    \"colors\": []\n}<\/script>\n\n\t\t\t\t<h2 class=\"block-chart--title\"><h4>Figure 1: AI\/ML-enabled Data Collection Pipeline for Tracking PDBs\u2019 Climate Commitments<\/h4><\/h2> \n\t\t\n\t\t<div element=\"tabs\"><\/div>\n\n\t\t\t\t\t<a class=\"block-chart--image image--link\" href=\"https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2012\/12\/Annex-1.png\" target=\"_blank\"><div class=\"image--wrap\"><img src='https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2012\/12\/Annex-1.png' class=\"image\" alt=\"Annex-1\" style=\"max-width:100%\" \/><\/div><\/a>\t\t\n\t\t<div element=\"canvas\"><\/div>\n\n\t\t\t\t<group name=\"\">\n\t\t\t<!-- tab -->\t\t\t\n\t\t<\/group>\n\t\t\n\n\t\t\n\t\t\t<\/div><\/section>\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h4 class=\"wp-block-heading is-style-default\">In this post, we present the following:<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a 
href=\"#the-data-science-challenge\">The data science challenge<\/a><\/li>\n\n\n\n<li><a href=\"#key-learnings\">Key learnings from working on solutions<\/a><\/li>\n\n\n\n<li><a href=\"#opportunities\">Opportunities for improvement and future research<\/a><\/li>\n\n\n\n<li><a href=\"#technical-summary\">Technical summary<\/a>\n<ul class=\"wp-block-list\">\n<li><a href=\"#part-1\">Solution part 1: Training an LLM text classifier to identify climate commitments<\/a><\/li>\n\n\n\n<li><a href=\"#part-2\">Solution part 2: Extracting commitment metadata with ChatGPT<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"the-data-science-challenge\">The data science challenge<\/h1>\n\n\n\n<p>The revised methodology leverages AI\/ML to solve the following data collection challenges:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>How can we identify the climate commitments adopted by PDBs?<\/li>\n\n\n\n<li>After PDBs\u2019 climate commitments are identified, how do we extract relevant metadata that allows for detailed analysis of the commitments\u2014e.g., when was the commitment made, what is the scale of the commitment\u2019s ambition, etc.?<\/li>\n<\/ul>\n\n\n\n<p>We have approached these challenges with the development of two specific AI\/ML-enabled solutions:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>A suite of large language model (LLM) text classifiers that leverages natural language processing (NLP) and ML to label text snippets collected from PDB websites when they contain references to specific climate commitments.<\/strong><br><br><\/li>\n\n\n\n<li><strong>A complementary set of ChatGPT prompts that extract key metadata fields from labeled text snippets to form structured time series data that contains information on the scale of PDB ambition contained in each commitment, which can be expanded as new data is collected.<\/strong><\/li>\n<\/ol>\n\n\n\n<p>In 
working on these solutions, we have not only developed new research tools to deepen understanding of global financial institutions\u2019 climate ambition but also uncovered key learnings that will inform the future deployment of AI\/ML tools to support our analytical projects.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"key-learnings\">Key learnings<\/h1>\n\n\n\n<p>As the demand for comprehensive climate finance data continues to grow, CPI and other research and advisory organizations in the space will face a growing number of opportunities\u2014and challenges\u2014to utilize AI\/ML approaches to enhance data gathering. The experience of developing AI\/ML-enabled solutions for tracking PDB climate commitments has yielded a few key learnings that can inform future efforts by CPI and\/or partners:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>The development of AI\/ML-enabled data collection tools is particularly advantageous when structured datasets are not readily available and\/or when processing large volumes of data.<\/strong> At the moment, no structured dataset of PDB climate commitments exists outside of the tracking done by CPI, which can now be updated by ingesting thousands of text sources on an annual or semi-annual basis with minimal manual processing. 
Specifically, web scraping through Google Programmable Search provides a way to locate PDB climate commitments at scale, which are then automatically converted into a structured data table using LLM text classification models and metadata extraction prompts.<br><br><\/li>\n\n\n\n<li><strong>Standardized data formatting imposed using AI\/ML tools can facilitate linkages to complementary datasets, allowing for deeper analysis based on more comprehensive information.<\/strong> For example, the time series data structure returned by the AI\/ML data pipeline is compatible with matching PDBs\u2019 climate ambition against investment flows measured by <a href=\"https:\/\/www.climatepolicyinitiative.org\/the-programs\/climate-finance-tracking\/\" target=\"_blank\" rel=\"noreferrer noopener\">CPI\u2019s climate finance tracking<\/a>, as well as other data sources that characterize the operating contexts faced by PDBs (e.g., level of host government policy support, maturity of financial system, climate investment pipeline, etc.). 
As a result, our analysis of PDB climate ambition is much more robust in 2024 than in previous years.<br><br><\/li>\n\n\n\n<li><strong>Particularly in the case of NLP models, AI\/ML tools can be continuously re-trained and adapted to different information collection tasks, efficiently integrating new information into data pipelines without incurring large computing costs or expending significant analyst resources.<\/strong> The initial text classification model is re-trained from the ClimateBERT <a href=\"https:\/\/huggingface.co\/climatebert\/distilroberta-base-climate-commitment\" target=\"_blank\" rel=\"noreferrer noopener\">model for climate commitments and actions<\/a>. It is subsequently adapted to create a series of secondary models that label commitment types according to CPI\u2019s <a href=\"https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2022\/10\/public-fi-commitments-methodology-brief-oct-2022.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">climate commitment taxonomy<\/a>, and the labels from these models then feed into metadata extraction using a series of ChatGPT prompts. As such, we are able to produce a novel structured dataset from unstructured text inputs, supported by a processing pipeline where performance gains are passed through to downstream tasks, leading to continuous quality improvements without reliance on costly high-performance computing or manual collection.<\/li>\n<\/ul>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"opportunities\">Opportunities for improvement and future research<\/h1>\n\n\n\n<p>As tracking of PDB climate commitments continues, further improvements can be made to the AI\/ML-enabled data collection process to produce better quality data and maximize efficiencies in subsequent years. 
Opportunities include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Refining the list of key words used to web scrape search results from PDB websites by evaluating the effectiveness of each individual key word set, assessing which most effectively retrieve validated commitments.<\/strong> This would center on analysis of true positives and false negatives of each set to determine which key word combinations most consistently yield climate commitments.<br><br><\/li>\n\n\n\n<li><strong>Exploring additional text classification models to better support ChatGPT\u2019s parsing of metadata, with options including <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.tree.DecisionTreeClassifier.html\" target=\"_blank\" rel=\"noreferrer noopener\">decision trees<\/a>, <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.ensemble.RandomForestClassifier.html\" target=\"_blank\" rel=\"noreferrer noopener\">random forest<\/a>, <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.linear_model.LogisticRegression.html\" target=\"_blank\" rel=\"noreferrer noopener\">logistic regression<\/a>, or <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/svm.html\" target=\"_blank\" rel=\"noreferrer noopener\">Support Vector Machines (SVM)<\/a> \/ <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.neighbors.KNeighborsClassifier.html\" target=\"_blank\" rel=\"noreferrer noopener\">k-Nearest Neighbors (KNN)<\/a>.<\/strong> For example, a KNN could be used with Term Frequency-Inverse Document Frequency (TF-IDF) grouped tokenized terms due to its inefficiency with high dimensional data, while an SVM can be used with the ungrouped tokenized terms.<\/li>\n<\/ul>\n\n\n\n<p>In addition, structured data produced with the new AI\/ML-enabled data collection process presents a number of potential opportunities for future research, such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Inferential 
assessment of the effect that climate commitments have on the volume and sectoral composition of PDBs\u2019 direct financing flows.<\/strong><br><br><\/li>\n\n\n\n<li><strong>Comparison of PDBs\u2019 climate ambition to that of private financial institutions tracked by CPI\u2019s <a href=\"https:\/\/www.climatepolicyinitiative.org\/publication\/net-zero-finance-tracker\/\" target=\"_blank\" rel=\"noreferrer noopener\">Net Zero Finance Tracker<\/a>.<\/strong><br><br><\/li>\n\n\n\n<li><strong>Integration of tracked PDB climate ambition into CPI\u2019s <a href=\"https:\/\/compass.climatepolicyinitiative.org\/\">Climate Finance Reform Compass<\/a> as a progress indicator.<\/strong><\/li>\n<\/ul>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"technical-summary\">Technical summary<\/h1>\n\n\n\n<p>The following sections describe the technical data science methods used to develop the aforementioned AI\/ML-enabled tracking tools. This includes a detailed discussion of model training and fine-tuning, as well as an evaluation of how the models performed after being applied to unfamiliar data outside of a train\/test environment.<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\"><\/div>\n<\/div>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"part-1\">Solution part 1: Training an LLM text classifier to identify climate commitments<\/h3>\n\n\n\n<p>We started the process of identifying climate commitments made by PDBs by scraping relevant text snippets from PDB websites using <a href=\"https:\/\/programmablesearchengine.google.com\/about\/\" target=\"_blank\" rel=\"noreferrer noopener\">Google Programmable Search<\/a> queries. 
Queries are constructed around a set of English key words, which are translated into Spanish, French, and Portuguese (languages commonly found among the sample of tracked PDBs) to enable more exhaustive data collection. The key words, shown below, were selected to mimic the basic vocabulary of climate commitments made by PDBs.<\/p>\n\n\n\n<div style=\"overflow-x: auto;\">\n<table style=\"width: 100%; border-collapse: collapse;border: 0.5px solid grey;font-family: 'Whitney';font-size: 16px;\">\n     <tr style=\"background-color: #B53C36; color: white;\">\n            <th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Commitment area<\/th>\n            <th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Key word query<\/th>\n        <\/tr>\n        <tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">Paris alignment<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">(announce | commit | pledge | target | aim) AND (align | aligning | alignment) AND Paris AND (agreement | &#8220;climate agreement&#8221; | accords | goals)<\/td>\n        <\/tr>\n        <tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">Mitigation targets<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">(announce | commit | pledge | target | aim | achieve | align) AND (&#8220;net zero&#8221; | net-zero | ((climate OR carbon) AND (neutral | neutrality)))<\/td>\n        <\/tr>\n<tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">Mitigation targets<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">(announce | commit | pledge | target | aim | achieve) AND (reduce | reduction | cut | slash | decrease | peak) AND (emissions | carbon | GHG)<\/td>\n        <\/tr>\n<tr>\n            <td style=\"padding: 8px; text-align: 
left;border: 1px solid grey;\">Climate investment goals<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">(announce | commit | pledge | dedicate | establish | aim) AND (green | climate | renewable | &#8220;low carbon&#8221; | &#8220;clean energy&#8221; | waste | sustainable | SDG | ESG | adaptation) AND (finance | invest | fund | financing) AND (goal | target | objective)<\/td>\n        <\/tr>\n<tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">Climate investment goals<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">(announce | commit | pledge | dedicate | establish | aim) AND (finance | invest | fund) AND (protection | preservation | restoration | conservation) AND (biodiversity | forest | pollution | water)<\/td>\n        <\/tr>\n<tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">Divestment and exclusion policies<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">(divest | stop | end | exclude | reduce | &#8220;phase out&#8221; | &#8220;phase down&#8221; | quit | divest | &#8220;cut off&#8221;) AND (fossil fuels | coal | oil | gas | methane | unabated | deforestation)<\/td>\n        <\/tr>\n<tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">Integration actions<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">climate AND (action | transition) AND (management | strategy | plan | framework | &#8220;capacity building&#8221; | engagement | disclosure | department | product | offering)<\/td>\n        <\/tr>\n<tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">Integration actions<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">(announce | adopt | set | establish | apply | implement) AND carbon AND (price | tariff | credit)<\/td>\n        
<\/tr>\n<tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">Integration actions<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">(assess | report | evaluate | monitor | disclose | integrate | manage | screen) AND climate AND (risk | vulnerability)<\/td>\n        <\/tr>\n    <\/table>\n<\/div>\n\n\n\n<p>However, search results often include \u201cnon-commitment\u201d text snippets that contain some assortment of relevant key words but do not refer to an actual climate commitment. To separate text snippets that do reference climate commitments announced by PDBs from these non-commitment text snippets, we trained a text classification large language model (LLM). The model is re-trained from the ClimateBERT <a href=\"https:\/\/huggingface.co\/climatebert\/distilroberta-base-climate-commitment\" target=\"_blank\" rel=\"noreferrer noopener\">model for climate commitments and actions<\/a>. ClimateBERT is an LLM that is built on the DistilRoBERTa NLP model and trained on over 2 million climate-related paragraphs.<\/p>\n\n\n\n<p>To re-train the ClimateBERT model, we used the Python packages <code>transformers<\/code> and <code>torch<\/code> (PyTorch), which tokenize the search result text content and run it through a deep learning (<a href=\"https:\/\/www.youtube.com\/watch?v=SZorAJ4I-sA\" target=\"_blank\" rel=\"noreferrer noopener\">transformer neural network<\/a>) model training process.<\/p>\n\n\n\n<p>The model is re-trained on a rebalanced dataset that contains a 50-50 split between labeled commitments and non-commitments, across a total of 3066 observations. 
This training set is sourced from the results of previous climate ambition tracking among public (<a href=\"https:\/\/www.climatepolicyinitiative.org\/publication\/public-financial-institutions-climate-commitments-2023-update\/\" target=\"_blank\" rel=\"noreferrer noopener\">CPI 2023<\/a> and <a href=\"https:\/\/www.climatepolicyinitiative.org\/publication\/public-financial-institutions-climate-commitments\/\" target=\"_blank\" rel=\"noreferrer noopener\">2022<\/a>) and private (<a href=\"https:\/\/www.climatepolicyinitiative.org\/publication\/private-financial-institutions-paris-alignment-commitments-2022-update\/\" target=\"_blank\" rel=\"noreferrer noopener\">CPI 2022<\/a> and <a href=\"https:\/\/www.climatepolicyinitiative.org\/publication\/private-financial-institutions-commitments-to-paris-alignment\/\" target=\"_blank\" rel=\"noreferrer noopener\">2021<\/a>) financial institutions. However, since the full results of these previous tracking efforts show a roughly 75-25 split between commitments and non-commitments, rebalancing allows the model to better \u201clearn\u201d the defining features of the minority class (i.e., commitments) than it would if trained on an unbalanced dataset.<\/p>\n\n\n\n<p>These data are randomly separated into training and validation sets (on an 80-20 basis). The model is fine-tuned using a <a href=\"https:\/\/machinelearningmastery.com\/cross-entropy-for-machine-learning\/\" target=\"_blank\" rel=\"noreferrer noopener\">cross-entropy<\/a> loss function and <a href=\"https:\/\/machinelearningmastery.com\/adam-optimization-algorithm-for-deep-learning\/\" target=\"_blank\" rel=\"noreferrer noopener\">Adam<\/a> optimization. Model performance is evaluated on the basis of precision (% of true positives among all classified positives), recall (% of total positives captured), and F1 score (harmonic mean of both). 
Accuracy on the validation set is also considered but is less insightful than the aforementioned performance measures, as it does not indicate which class (i.e., positive or negative) the model performs better on, a key nuance needed to guide model fine-tuning. The latest version of the <a href=\"https:\/\/huggingface.co\/nkc98\/commitments-classification-model\" target=\"_blank\" rel=\"noreferrer noopener\">commitment classification model<\/a> achieves a validation accuracy of 90.23%, an F1 score of 90.08, a precision of 91.14, and a recall of 89.81. The code snippet below demonstrates the model:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#2d333d;color:#9eadbd\">Python<\/span><span role=\"button\" tabindex=\"0\" data-code=\"import transformers\nfrom transformers import pipeline, AutoTokenizer, AutoConfig, AutoModelForSequenceClassification\n\nmodel_name = &quot;nkc98\/commitments-classification-model&quot;\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForSequenceClassification.from_pretrained(model_name)\n\npipe = pipeline(&quot;text-classification&quot;, model=model, tokenizer=tokenizer, device=-1)\n\nresult = pipe('British International Investment accelerates climate finance ... 
Alongside increasing its delivery of climate finance, BII is committed to Paris alignment and is developing a strategy for reaching net zero at a portfolio ...', padding=True, truncation=True)\nresult\" style=\"color:#22272e;display:none;background-color:#adbac7\" aria-label=\"Copy\" data-copied-text=\"Copied!\" data-has-text-button=\"textSimple\" data-inside-header-type=\"simpleString\" aria-live=\"polite\" class=\"code-block-pro-copy-button\"><span class=\"cbp-btn-text\">Copy<\/span><\/span><pre class=\"shiki github-dark-dimmed\" style=\"background-color: #22272e\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #F47067\">import<\/span><span style=\"color: #ADBAC7\"> transformers<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F47067\">from<\/span><span style=\"color: #ADBAC7\"> transformers import pipeline, AutoTokenizer, AutoConfig, AutoModelForSequenceClassification<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">model_name = <\/span><span style=\"color: #96D0FF\">&quot;nkc98\/commitments-classification-model&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">tokenizer = AutoTokenizer.from_pretrained(model_name)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">model = AutoModelForSequenceClassification.from_pretrained(model_name)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">pipe = pipeline(<\/span><span style=\"color: #96D0FF\">&quot;text-classification&quot;<\/span><span style=\"color: #ADBAC7\">, model=model, tokenizer=tokenizer, device=-1)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">result = pipe(<\/span><span style=\"color: #96D0FF\">&#39;British International Investment accelerates climate finance ... 
Alongside increasing its delivery of climate finance, BII is committed to Paris alignment and is developing a strategy for reaching net zero at a portfolio ...&#39;<\/span><span style=\"color: #ADBAC7\">, padding=True, truncation=True)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">result<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>In the example above, running the text snippet \u201c<em>British International Investment accelerates climate finance &#8230; Alongside increasing its delivery of climate finance, BII is committed to Paris alignment and is developing a strategy for reaching net zero at a portfolio &#8230;<\/em>\u201d through the classification model returns a label of \u201cyes\u201d (indicating that the text snippet does indeed correspond to an announced climate commitment) with an associated probability of <strong>99.5%<\/strong>.<\/p>\n\n\n\n<p>Once a text snippet has been classified as a climate commitment, it is then assigned an additional label corresponding to sub-types of commitments within CPI\u2019s <a href=\"https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2022\/10\/public-fi-commitments-methodology-brief-oct-2022.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">climate commitment taxonomy<\/a>:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Targets.<\/strong> Signaling intent to achieve specific climate-relevant objectives, potentially resulting in engagement and climate finance flows. 
This dimension tracks both qualitative commitments and quantitative targets adopted to address climate change, such as:\n<ol class=\"wp-block-list\">\n<li>Paris alignment<\/li>\n\n\n\n<li>Mitigation targets\n<ol class=\"wp-block-list\">\n<li>Net zero targets<\/li>\n\n\n\n<li>Carbon neutrality targets<\/li>\n\n\n\n<li>Interim emissions targets<\/li>\n<\/ol>\n<\/li>\n\n\n\n<li>Climate investment goals<br><br><br><\/li>\n<\/ol>\n<\/li>\n\n\n\n<li><strong>Integration actions.<\/strong> Measures to mainstream climate into PDB decision-making, potentially increasing climate finance flows (or decreasing flows to projects with no climate benefits or even with negative climate effects). These are qualitative changes to institutional policies, governance, and investment approaches including:\n<ol class=\"wp-block-list\">\n<li>Institutional climate strategies<\/li>\n\n\n\n<li>Exclusion and divestment policies<\/li>\n\n\n\n<li>Counterparty engagement guidelines<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n\n\n\n<p>These sub-type labels are provided by a secondary set of transformer neural network classification models that are derived from the primary commitments model described above. Specifically, manually validated commitment text snippets that correspond to each sub-type (e.g., Paris alignment, net zero, climate investment goals) are used to iteratively re-train the primary model so that it \u201clearns\u201d a new task of accurately identifying text snippets that correspond to a particular sub-type of commitment. 
These secondary models can essentially be understood as estimating the conditional probability that, given a text snippet has already been labeled as a commitment, it can further be categorized within a commitment sub-type e.g., P(net zero | commitment).<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#2d333d;color:#9eadbd\">Python<\/span><span role=\"button\" tabindex=\"0\" data-code=\"# Load commitments model and tokenizer\nmodel_name = &quot;nkc98\/commitments-classification-model&quot;\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForSequenceClassification.from_pretrained(model_name)\n...\ninputs = tokenizer(list(training_df['Text']), return_tensors=&quot;pt&quot;, padding=&quot;max_length&quot;, truncation=True, max_length=128)\n...\nstart_time = datetime.now()\n\n# Training\nfor epoch in range(num_epochs):\n    sequence_classification_model.train()\n    total_loss = 0.0\n    for batch in train_loader:\n        input_ids = batch[0]\n        attention_mask = batch[1]\n        labels = batch[2]\n        optimizer.zero_grad()\n        outputs = sequence_classification_model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)\n        loss = outputs.loss\n        loss.backward()\n        optimizer.step()\n ...\n # Validation\n    sequence_classification_model.eval()\n    val_loss = 0.0\n    with torch.no_grad():\n        for batch in val_loader:\n            input_ids = batch[0]\n            attention_mask = batch[1]\n            labels = batch[2]\n... 
\" style=\"color:#22272e;display:none;background-color:#adbac7\" aria-label=\"Copy\" data-copied-text=\"Copied!\" data-has-text-button=\"textSimple\" data-inside-header-type=\"simpleString\" aria-live=\"polite\" class=\"code-block-pro-copy-button\"><span class=\"cbp-btn-text\">Copy<\/span><\/span><pre class=\"shiki github-dark-dimmed\" style=\"background-color: #22272e\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #ADBAC7\"># Load commitments model and tokenizer<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">model_name <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #96D0FF\">&quot;nkc98\/commitments-classification-model&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">tokenizer <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> AutoTokenizer.<\/span><span style=\"color: #DCBDFB\">from_pretrained<\/span><span style=\"color: #ADBAC7\">(model_name)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">model <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> AutoModelForSequenceClassification.<\/span><span style=\"color: #DCBDFB\">from_pretrained<\/span><span style=\"color: #ADBAC7\">(model_name)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F47067\">...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">inputs <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #DCBDFB\">tokenizer<\/span><span style=\"color: #ADBAC7\">(<\/span><span style=\"color: #DCBDFB\">list<\/span><span style=\"color: #ADBAC7\">(training_df[<\/span><span style=\"color: #96D0FF\">&#39;Text&#39;<\/span><span style=\"color: #ADBAC7\">]), return_tensors<\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #96D0FF\">&quot;pt&quot;<\/span><span style=\"color: #ADBAC7\">, padding<\/span><span 
style=\"color: #F47067\">=<\/span><span style=\"color: #96D0FF\">&quot;max_length&quot;<\/span><span style=\"color: #ADBAC7\">, truncation<\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\">True, max_length<\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #6CB6FF\">128<\/span><span style=\"color: #ADBAC7\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F47067\">...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">start_time <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> datetime.<\/span><span style=\"color: #DCBDFB\">now<\/span><span style=\"color: #ADBAC7\">()<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\"># Training<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">for epoch <\/span><span style=\"color: #F47067\">in<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #DCBDFB\">range<\/span><span style=\"color: #ADBAC7\">(num_epochs):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    sequence_classification_model.<\/span><span style=\"color: #DCBDFB\">train<\/span><span style=\"color: #ADBAC7\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    total_loss <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #6CB6FF\">0.0<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    for batch <\/span><span style=\"color: #F47067\">in<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #F69D50\">train_loader<\/span><span style=\"color: #ADBAC7\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">        input_ids <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> batch[<\/span><span style=\"color: #6CB6FF\">0<\/span><span style=\"color: #ADBAC7\">]<\/span><\/span>\n<span class=\"line\"><span 
style=\"color: #ADBAC7\">        attention_mask <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> batch[<\/span><span style=\"color: #6CB6FF\">1<\/span><span style=\"color: #ADBAC7\">]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">        labels <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> batch[<\/span><span style=\"color: #6CB6FF\">2<\/span><span style=\"color: #ADBAC7\">]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">        optimizer.<\/span><span style=\"color: #DCBDFB\">zero_grad<\/span><span style=\"color: #ADBAC7\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">        outputs <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #DCBDFB\">sequence_classification_model<\/span><span style=\"color: #ADBAC7\">(input_ids<\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\">input_ids, attention_mask<\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\">attention_mask, labels<\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\">labels)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">        loss <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> outputs.loss<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">        loss.<\/span><span style=\"color: #DCBDFB\">backward<\/span><span style=\"color: #ADBAC7\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">        optimizer.<\/span><span style=\"color: #DCBDFB\">step<\/span><span style=\"color: #ADBAC7\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #F47067\">...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\"> # Validation<\/span><\/span>\n<span class=\"line\"><span style=\"color: 
#ADBAC7\">    sequence_classification_model.<\/span><span style=\"color: #DCBDFB\">eval<\/span><span style=\"color: #ADBAC7\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    val_loss <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #6CB6FF\">0.0<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F47067\">with<\/span><span style=\"color: #ADBAC7\"> torch.<\/span><span style=\"color: #DCBDFB\">no_grad<\/span><span style=\"color: #ADBAC7\">():<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">        for batch <\/span><span style=\"color: #F47067\">in<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #F69D50\">val_loader<\/span><span style=\"color: #ADBAC7\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">            input_ids <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> batch[<\/span><span style=\"color: #6CB6FF\">0<\/span><span style=\"color: #ADBAC7\">]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">            attention_mask <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> batch[<\/span><span style=\"color: #6CB6FF\">1<\/span><span style=\"color: #ADBAC7\">]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">            labels <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> batch[<\/span><span style=\"color: #6CB6FF\">2<\/span><span style=\"color: #ADBAC7\">]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F47067\">...<\/span><span style=\"color: #ADBAC7\"> <\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>Note that secondary labels are not mutually exclusive\u2014for example, the text snippet \u201c<em>British International Investment accelerates climate finance &#8230; Alongside increasing its delivery of climate 
finance, BII is committed to Paris alignment and is developing a strategy for reaching net zero at a portfolio &#8230;<\/em>\u201d corresponds to four overlapping secondary labels: Paris alignment, net zero, investment, and institutional climate strategies.<\/p>\n\n\n\n<p>After text snippets are processed through ML text classification, they are returned as a structured data observation in the format below:<\/p>\n\n\n\n<div style=\"overflow-x: auto;\">\n<table style=\"width: 100%; border-collapse: collapse;border: border: 0.5px solid grey;font-family: 'Whitney';font-size: 16px;\">\n     <tr style=\"background-color: #B53C36; color: white;\">\n            <th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Text Snippet<\/th>\n            <th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Commitment<\/th>\n            <th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Paris Alignment<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Net Zero<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Carbon Neutral<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Interim Target<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Investment<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Divestment<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Institutional Strategy<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Counterparty Engagement<\/th>\n\n        <\/tr>\n        <tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">British International Investment accelerates climate finance &#8230; Alongside increasing its 
delivery of climate finance, BII is committed to Paris alignment and is developing a strategy for reaching net zero at a portfolio\u00a0&#8230;<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">True<\/td>\n<td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">True<\/td>\n<td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">True<\/td>\n<td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">False<\/td>\n<td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">False<\/td>\n<td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">True<\/td>\n<td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">False<\/td>\n<td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">True<\/td>\n<td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">False<\/td>\n\n\n        <\/tr>\n   \n    <\/table>\n<\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\"><\/div>\n<\/div>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"part-2\">Solution Part 2: Extracting commitment metadata with ChatGPT<\/h3>\n\n\n\n<p>In order to maximize the efficiency of ChatGPT-supported metadata extraction, we incorporated text classification machine learning models, such as neural networks, to subset the dataset to observations that contain the relevant metadata fields. Subsequently, we employed various task optimization techniques to guide ChatGPT\u2019s extraction process. 
Finally, we used a schema validation script to verify that ChatGPT\u2019s responses match the expected data types and value ranges.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Step 1: Applying labels to subset extraction tasks<\/h4>\n\n\n\n<p>To streamline metadata extraction from commitments, we incorporated ChatGPT into the data-gathering workflow to complete tasks that would otherwise have been done manually. To limit the cost of using ChatGPT [1], however, we first filtered text snippets by their relevance to each extraction prompt. For this pre-processing task, we first tried a TF-IDF (<a href=\"https:\/\/www.learndatasci.com\/glossary\/tf-idf-term-frequency-inverse-document-frequency\/\" target=\"_blank\" rel=\"noreferrer noopener\">Term Frequency-Inverse Document Frequency<\/a>) vectorizer to categorize unlabeled data by relevance to specific commitment sub-types. After manually validating the metadata fields, we instead used a set of transformer neural network models (described in <a href=\"#part-1\" target=\"_blank\" rel=\"noreferrer noopener\">Solution Part 1<\/a>) to label commitment types and route each text snippet to the appropriate extraction prompts.<\/p>\n\n\n\n<p>Specifically, once commitments were classified into sub-categories such as mitigation targets, climate investment goals, or exclusion and divestment policies, they were passed to extraction prompts based on the types of metadata values they would likely contain. 
For example, an interim mitigation target is very likely to contain a percentage value for carbon or greenhouse gas emissions, while a commitment to an institutional climate strategy is not.<\/p>\n\n\n\n<p>In addition to these secondary commitment type labels, we implemented helper functions that identify whether a commitment contains a date, monetary values, or percentages, to further subset the data for metadata extraction. Overall, this strategy reduced the cost of API calls by ~78%. Additionally, similar to Retrieval-Augmented Generation <a href=\"https:\/\/arxiv.org\/abs\/2312.10997\" target=\"_blank\" rel=\"noreferrer noopener\">(Gao et al. 2024)<\/a>, this methodology should theoretically reduce the probability of ChatGPT generating a \u201challucination\u201d, i.e., fabricating a value that is not actually present in the text snippet.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#2d333d;color:#9eadbd\">Python<\/span><span role=\"button\" tabindex=\"0\" data-code=\"df['contains_percent'] = 0\ndf.loc[df['Text'].str.contains(r'%|percent|per cent', case=False, na=False), 'contains_percent'] = 1\ndf = df[(df['Mitigation'] ==1) &amp; (df['contains_percent'] ==1)]\n_df = prompt.open_ai_prompt(df, pv.commitment_announcement_v4)\" style=\"color:#22272e;display:none;background-color:#adbac7\" aria-label=\"Copy\" data-copied-text=\"Copied!\" data-has-text-button=\"textSimple\" data-inside-header-type=\"simpleString\" aria-live=\"polite\" class=\"code-block-pro-copy-button\"><span class=\"cbp-btn-text\">Copy<\/span><\/span><pre class=\"shiki github-dark-dimmed\" 
style=\"background-color: #22272e\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #ADBAC7\">df[<\/span><span style=\"color: #96D0FF\">&#39;contains_percent&#39;<\/span><span style=\"color: #ADBAC7\">] <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #6CB6FF\">0<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">df.loc[df[<\/span><span style=\"color: #96D0FF\">&#39;Text&#39;<\/span><span style=\"color: #ADBAC7\">].str.<\/span><span style=\"color: #DCBDFB\">contains<\/span><span style=\"color: #ADBAC7\">(r<\/span><span style=\"color: #96D0FF\">&#39;%|percent|per cent&#39;<\/span><span style=\"color: #ADBAC7\">, case<\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\">False, na<\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\">False), <\/span><span style=\"color: #96D0FF\">&#39;contains_percent&#39;<\/span><span style=\"color: #ADBAC7\">] <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #6CB6FF\">1<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">df <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> df[(df[<\/span><span style=\"color: #96D0FF\">&#39;Mitigation&#39;<\/span><span style=\"color: #ADBAC7\">] <\/span><span style=\"color: #F47067\">==<\/span><span style=\"color: #6CB6FF\">1<\/span><span style=\"color: #ADBAC7\">) <\/span><span style=\"color: #F47067\">&amp;<\/span><span style=\"color: #ADBAC7\"> (df[<\/span><span style=\"color: #96D0FF\">&#39;contains_percent&#39;<\/span><span style=\"color: #ADBAC7\">] <\/span><span style=\"color: #F47067\">==<\/span><span style=\"color: #6CB6FF\">1<\/span><span style=\"color: #ADBAC7\">)]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">_df <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> prompt.<\/span><span 
style=\"color: #DCBDFB\">open_ai_prompt<\/span><span style=\"color: #ADBAC7\">(df, pv.commitment_announcement_v4)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>In this example, the query would have contained 1,378,582 tokens without targeted sub-setting; with the helper function strategy, the total token count fell to 291,279 tokens, saving about 78% of query costs.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Step 2: ChatGPT-supported field extraction<\/h4>\n\n\n\n<p>Within subsets of the total text snippet dataset, ChatGPT is then used to parse relevant metadata values. Unlike the tools used in the previous stage, which were trained on external data to classify commitments, ChatGPT can pinpoint relevant information within a text snippet without any task-specific training, drawing on its capabilities in named entity recognition and event extraction <a href=\"https:\/\/doi.org\/10.48550\/arXiv.2402.11203\" target=\"_blank\" rel=\"noreferrer noopener\">(Huang, Y. and Huang, J. 2024)<\/a>. As such, using ChatGPT removed the time- and labor-intensive process of manual data labeling and model fine-tuning.<\/p>\n\n\n\n<p>Prior to the release of <code>gpt-4o-mini<\/code>, we used <code>gpt-3.5-turbo-0125<\/code> for metadata extraction because of its significantly lower cost for development and testing. With the release of <code>gpt-4o-mini<\/code>, we may switch to the newer and cheaper model in future iterations. For the purposes of this report, however, we refer to <code>gpt-3.5-turbo-0125<\/code> when discussing methodology. Users can follow the official documentation to connect to the OpenAI API [2]. 
Additionally, we recommend following the official documentation for safe API practices [3].<\/p>\n\n\n\n<p class=\"is-style-default\">To parse ChatGPT responses reliably [4], we set the parameter <a href=\"https:\/\/cookbook.openai.com\/examples\/structured_outputs_intro\" target=\"_blank\" rel=\"noreferrer noopener\"><code>response_format={\"type\": \"json_object\"}<\/code><\/a> to ensure that responses were JSON objects.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#2d333d;color:#9eadbd\">Python<\/span><span role=\"button\" tabindex=\"0\" data-code=\"def response_parse(df):\n\t...\n\tstring = df.loc[index, 'response']\n\t        json_file=json.loads(string)\n\t        for i in json_file:\n\t            df.loc[index,i] = json_file[i]         \n\t...\n\treturn df\" style=\"color:#22272e;display:none;background-color:#adbac7\" aria-label=\"Copy\" data-copied-text=\"Copied!\" data-has-text-button=\"textSimple\" data-inside-header-type=\"simpleString\" aria-live=\"polite\" class=\"code-block-pro-copy-button\"><span class=\"cbp-btn-text\">Copy<\/span><\/span><pre class=\"shiki github-dark-dimmed\" style=\"background-color: #22272e\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #ADBAC7\">def <\/span><span style=\"color: #DCBDFB\">response_parse<\/span><span style=\"color: #ADBAC7\">(df):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t<\/span><span style=\"color: #F47067\">...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\tstring <\/span><span style=\"color: #F47067\">=<\/span><span 
style=\"color: #ADBAC7\"> df.loc[index, <\/span><span style=\"color: #96D0FF\">&#39;response&#39;<\/span><span style=\"color: #ADBAC7\">]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t        json_file<\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\">json.<\/span><span style=\"color: #DCBDFB\">loads<\/span><span style=\"color: #ADBAC7\">(string)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t        for i <\/span><span style=\"color: #F47067\">in<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #F69D50\">json_file<\/span><span style=\"color: #ADBAC7\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t            df.loc[index,i] <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> json_file[i]         <\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t<\/span><span style=\"color: #F47067\">...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t<\/span><span style=\"color: #F47067\">return<\/span><span style=\"color: #ADBAC7\"> df<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>During the development of the initial pipeline, we opted to iterate on prompts rather than build a fine-tuned model. Some evidence suggests that fine-tuned models tend to overfit their training data, so that their benchmark results may exaggerate actual performance on the underlying task (<a href=\"https:\/\/arxiv.org\/abs\/2005.14165\" target=\"_blank\" rel=\"noreferrer noopener\">Brown et al. 2020<\/a>). Moreover, given that the extraction task is fairly straightforward and already supported by a number of pre-processing steps (e.g., indexing relevance from commitment type labels and helper functions), we believed that prompt engineering would be more efficient than fine-tuning models for each specific task. 
However, the balance of tradeoffs between prompt engineering and fine-tuning can vary by task; other users can find in-depth strategies for their own use case in OpenAI&#8217;s <a href=\"https:\/\/platform.openai.com\/docs\/guides\/prompt-engineering\/six-strategies-for-getting-better-results\" target=\"_blank\" rel=\"noreferrer noopener\">prompt engineering guide<\/a>.<\/p>\n\n\n\n<p>Specifically, we implemented prompt engineering suggestions to \u201csplit complex tasks into simpler subtasks\u201d, \u201cprovide reference text\u201d, \u201ctest changes systematically\u201d, \u201cadopt a persona\u201d, \u201cuse delimiters\u201d, and \u201cprovide examples\u201d. We instructed ChatGPT to adopt an API persona that parses data and returns the response as a JSON object. For the prompts themselves, we implemented a \u201cfew-shot\u201d examples approach.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#2d333d;color:#9eadbd\">Python<\/span><span role=\"button\" tabindex=\"0\" data-code=\" ...\n messages=[{&quot;role&quot;: &quot;system&quot;, &quot;content&quot;:(f&quot;&quot;&quot;\n                                        You are an API that &quot;responds only in JSON&quot; for parsing of meta data from text snippets.\n                                        {prompt}\n                                        &quot;&quot;&quot;)},\n            {&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: row['Text']}],\n        stream=True,)\n...\npercentage_prompt = &quot;&quot;&quot;\nYou are tasked with identifying percentages within text snippets. 
\nThese snippets include percent reduction in carbon emissions, ghg emissions, or unspecified emissions;\nOr include percent reduction or increase in finances.\nProvide the values as floats. If there is a range go with the lower number.\nIf there is no data use &quot;NULL&quot;.\nPlease provide the information as a JSON.  \nYour response should look like:\n {&quot;carbon_pct&quot;: &quot;xx.xx&quot;,\n  &quot;ghg_pct&quot;: &quot;xx.xx&quot;,\n  &quot;unknown_emissions_pct&quot; : &quot;xx.xx&quot;,\n  &quot;finance_reduction_pct&quot; : &quot;xx.xx&quot;,\n  &quot;finance_increase_pct&quot; : &quot;xx.xx&quot;,\n  &quot;unknown_finance_pct&quot; : &quot;xx.xx&quot;}\nor\n {&quot;carbon_pct&quot;: &quot;NULL&quot;,\n  &quot;ghg_pct&quot;: &quot;NULL&quot;,\n  &quot;unknown_emissions_pct&quot; : &quot;NULL&quot;,\n  &quot;finance_reduction_pct&quot; : &quot;NULL&quot;,\n  &quot;finance_increase_pct&quot; : &quot;NULL&quot;,\n  &quot;unknown_finance_pct&quot; : &quot;NULL&quot;}\n...\n&quot;&quot;&quot;\" style=\"color:#22272e;display:none;background-color:#adbac7\" aria-label=\"Copy\" data-copied-text=\"Copied!\" data-has-text-button=\"textSimple\" data-inside-header-type=\"simpleString\" aria-live=\"polite\" class=\"code-block-pro-copy-button\"><span class=\"cbp-btn-text\">Copy<\/span><\/span><pre class=\"shiki github-dark-dimmed\" style=\"background-color: #22272e\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #F47067\">...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\"> messages<\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\">[{<\/span><span style=\"color: #96D0FF\">&quot;role&quot;<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #96D0FF\">&quot;system&quot;<\/span><span style=\"color: #ADBAC7\">, <\/span><span style=\"color: #96D0FF\">&quot;content&quot;<\/span><span style=\"color: #ADBAC7\">:(f<\/span><span style=\"color: 
#96D0FF\">&quot;&quot;&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #96D0FF\">                                        You are an API that &quot;<\/span><span style=\"color: #ADBAC7\">responds only <\/span><span style=\"color: #F47067\">in<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #6CB6FF\">JSON<\/span><span style=\"color: #96D0FF\">&quot; for parsing of meta data from text snippets<\/span><span style=\"color: #FF938A; font-style: italic\">.<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">                                        {prompt}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">                                        <\/span><span style=\"color: #96D0FF\">&quot;&quot;&quot;)}<\/span><span style=\"color: #FF938A; font-style: italic\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">            {<\/span><span style=\"color: #96D0FF\">&quot;role&quot;<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #96D0FF\">&quot;user&quot;<\/span><span style=\"color: #ADBAC7\">, <\/span><span style=\"color: #96D0FF\">&quot;content&quot;<\/span><span style=\"color: #ADBAC7\">: row[<\/span><span style=\"color: #96D0FF\">&#39;Text&#39;<\/span><span style=\"color: #ADBAC7\">]}],<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">        stream<\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\">True,)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F47067\">...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">percentage_prompt <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #96D0FF\">&quot;&quot;&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #96D0FF\">You are tasked with identifying percentages within text snippets.<\/span><span style=\"color: #FF938A; font-style: italic\"> <\/span><\/span>\n<span 
class=\"line\"><span style=\"color: #ADBAC7\">These snippets include percent reduction <\/span><span style=\"color: #F47067\">in<\/span><span style=\"color: #ADBAC7\"> carbon emissions, ghg emissions, or unspecified emissions;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">Or include percent reduction or increase in finances.<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">Provide the values <\/span><span style=\"color: #F47067\">as<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #F69D50\">floats<\/span><span style=\"color: #ADBAC7\">. <\/span><span style=\"color: #F69D50\">If<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #F69D50\">there<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #F69D50\">is<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #F69D50\">a<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #F69D50\">range<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #F69D50\">go<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #F69D50\">with<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #F69D50\">the<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #F69D50\">lower<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #F69D50\">number<\/span><span style=\"color: #ADBAC7\">.<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">If there is no data use <\/span><span style=\"color: #96D0FF\">&quot;NULL&quot;<\/span><span style=\"color: #ADBAC7\">.<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">Please provide the information as a JSON.  
<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">Your response should look like:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\"> {<\/span><span style=\"color: #96D0FF\">&quot;carbon_pct&quot;<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #96D0FF\">&quot;xx.xx&quot;<\/span><span style=\"color: #ADBAC7\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #96D0FF\">&quot;ghg_pct&quot;<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #96D0FF\">&quot;xx.xx&quot;<\/span><span style=\"color: #ADBAC7\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #96D0FF\">&quot;unknown_emissions_pct&quot;<\/span><span style=\"color: #ADBAC7\"> : <\/span><span style=\"color: #96D0FF\">&quot;xx.xx&quot;<\/span><span style=\"color: #ADBAC7\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #96D0FF\">&quot;finance_reduction_pct&quot;<\/span><span style=\"color: #ADBAC7\"> : <\/span><span style=\"color: #96D0FF\">&quot;xx.xx&quot;<\/span><span style=\"color: #ADBAC7\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #96D0FF\">&quot;finance_increase_pct&quot;<\/span><span style=\"color: #ADBAC7\"> : <\/span><span style=\"color: #96D0FF\">&quot;xx.xx&quot;<\/span><span style=\"color: #ADBAC7\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #96D0FF\">&quot;unknown_finance_pct&quot;<\/span><span style=\"color: #ADBAC7\"> : <\/span><span style=\"color: #96D0FF\">&quot;xx.xx&quot;<\/span><span style=\"color: #ADBAC7\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">or<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\"> {<\/span><span style=\"color: #96D0FF\">&quot;carbon_pct&quot;<\/span><span style=\"color: 
#ADBAC7\">: <\/span><span style=\"color: #96D0FF\">&quot;NULL&quot;<\/span><span style=\"color: #ADBAC7\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #96D0FF\">&quot;ghg_pct&quot;<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #96D0FF\">&quot;NULL&quot;<\/span><span style=\"color: #ADBAC7\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #96D0FF\">&quot;unknown_emissions_pct&quot;<\/span><span style=\"color: #ADBAC7\"> : <\/span><span style=\"color: #96D0FF\">&quot;NULL&quot;<\/span><span style=\"color: #ADBAC7\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #96D0FF\">&quot;finance_reduction_pct&quot;<\/span><span style=\"color: #ADBAC7\"> : <\/span><span style=\"color: #96D0FF\">&quot;NULL&quot;<\/span><span style=\"color: #ADBAC7\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #96D0FF\">&quot;finance_increase_pct&quot;<\/span><span style=\"color: #ADBAC7\"> : <\/span><span style=\"color: #96D0FF\">&quot;NULL&quot;<\/span><span style=\"color: #ADBAC7\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #96D0FF\">&quot;unknown_finance_pct&quot;<\/span><span style=\"color: #ADBAC7\"> : <\/span><span style=\"color: #96D0FF\">&quot;NULL&quot;<\/span><span style=\"color: #ADBAC7\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F47067\">...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #96D0FF\">&quot;&quot;&quot;<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>For the prompts themselves, research suggests that few shot examples improve the accuracy of \u201cfill in the blank\u201d tasks by 18% when compared to \u201czero shot\u201d examples (<a href=\"https:\/\/www.cs.ucf.edu\/~lboloni\/Teaching\/CAP5636_Fall2023\/homeworks\/GPT-3.pdf\" 
target=\"_blank\" rel=\"noreferrer noopener\">Brown et al. 2020<\/a>).<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#2d333d;color:#9eadbd\">Python<\/span><span role=\"button\" tabindex=\"0\" data-code=\"&quot;&quot;&quot;\n...\nExample 1:\n&quot;North American Development Bank SUMMARY OF PROJECT ... \n50% and will achieve nearly 24% lower carbon dioxide (CO2) emissions. \nThe reduction in criteria pollutant emissions is even higher for compressed natural ...&quot;\nshould return:\n\n {&quot;carbon_pct&quot;: &quot;24.00&quot;,\n  &quot;ghg_pct&quot;: &quot;NULL&quot;,\n  &quot;unknown_emission_pct&quot; : &quot;50.00&quot;,\n  &quot;finance_reduction_pct&quot; : &quot;NULL&quot;,\n  &quot;finance_increase_pct&quot; : &quot;NULL&quot;,\n  &quot;unknown_finance_pct&quot; : &quot;NULL&quot;}\n\nExample 2:\n&quot;PRIVATE SECTOR DIAGNOSIS (CPSD): CREATING... to 10.5% in 2015 before gradually decreasing to reach 5.5% in 2019,\n... country consists of a reduction in greenhouse gas emissions of 11.14%.. 
.&quot;\nshould return:\n\n {&quot;carbon_pct&quot;: &quot;NULL&quot;,\n  &quot;ghg_pct&quot;: &quot;11.14&quot;,\n  &quot;unknown_emission_pct&quot; : &quot;10.5&quot;,\n  &quot;finance_reduction_pct&quot; : &quot;NULL&quot;,\n  &quot;finance_increase_pct&quot; : &quot;NULL&quot;,\n  &quot;unknown_finance_pct&quot; : &quot;NULL&quot;}\n&quot;&quot;&quot;\" style=\"color:#22272e;display:none;background-color:#adbac7\" aria-label=\"Copy\" data-copied-text=\"Copied!\" data-has-text-button=\"textSimple\" data-inside-header-type=\"simpleString\" aria-live=\"polite\" class=\"code-block-pro-copy-button\"><span class=\"cbp-btn-text\">Copy<\/span><\/span><pre class=\"shiki github-dark-dimmed\" style=\"background-color: #22272e\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #96D0FF\">&quot;&quot;&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #96D0FF\">..<\/span><span style=\"color: #FF938A; font-style: italic\">.<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">Example <\/span><span style=\"color: #6CB6FF\">1<\/span><span style=\"color: #ADBAC7\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #96D0FF\">&quot;North American Development Bank SUMMARY OF PROJECT ...<\/span><span style=\"color: #FF938A; font-style: italic\"> <\/span><\/span>\n<span class=\"line\"><span style=\"color: #6CB6FF\">50<\/span><span style=\"color: #F47067\">%<\/span><span style=\"color: #ADBAC7\"> and will achieve nearly <\/span><span style=\"color: #6CB6FF\">24<\/span><span style=\"color: #F47067\">%<\/span><span style=\"color: #ADBAC7\"> lower carbon <\/span><span style=\"color: #DCBDFB\">dioxide<\/span><span style=\"color: #ADBAC7\"> (<\/span><span style=\"color: #6CB6FF\">CO2<\/span><span style=\"color: #ADBAC7\">) emissions. 
<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">The reduction <\/span><span style=\"color: #F47067\">in<\/span><span style=\"color: #ADBAC7\"> criteria pollutant emissions is even higher for compressed natural <\/span><span style=\"color: #F47067\">...<\/span><span style=\"color: #96D0FF\">&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #96D0FF\">should return<\/span><span style=\"color: #FF938A; font-style: italic\">:<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\"> {<\/span><span style=\"color: #96D0FF\">&quot;carbon_pct&quot;<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #96D0FF\">&quot;24.00&quot;<\/span><span style=\"color: #ADBAC7\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #96D0FF\">&quot;ghg_pct&quot;<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #96D0FF\">&quot;NULL&quot;<\/span><span style=\"color: #ADBAC7\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #96D0FF\">&quot;unknown_emission_pct&quot;<\/span><span style=\"color: #ADBAC7\"> : <\/span><span style=\"color: #96D0FF\">&quot;50.00&quot;<\/span><span style=\"color: #ADBAC7\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #96D0FF\">&quot;finance_reduction_pct&quot;<\/span><span style=\"color: #ADBAC7\"> : <\/span><span style=\"color: #96D0FF\">&quot;NULL&quot;<\/span><span style=\"color: #ADBAC7\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #96D0FF\">&quot;finance_increase_pct&quot;<\/span><span style=\"color: #ADBAC7\"> : <\/span><span style=\"color: #96D0FF\">&quot;NULL&quot;<\/span><span style=\"color: #ADBAC7\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: 
#96D0FF\">&quot;unknown_finance_pct&quot;<\/span><span style=\"color: #ADBAC7\"> : <\/span><span style=\"color: #96D0FF\">&quot;NULL&quot;<\/span><span style=\"color: #ADBAC7\">}<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">Example <\/span><span style=\"color: #6CB6FF\">2<\/span><span style=\"color: #ADBAC7\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #96D0FF\">&quot;PRIVATE SECTOR DIAGNOSIS (CPSD): CREATING... to 10.5% in 2015 before gradually decreasing to reach 5.5% in 2019<\/span><span style=\"color: #FF938A; font-style: italic\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F47067\">...<\/span><span style=\"color: #ADBAC7\"> country consists <\/span><span style=\"color: #F47067\">of<\/span><span style=\"color: #ADBAC7\"> a reduction <\/span><span style=\"color: #F47067\">in<\/span><span style=\"color: #ADBAC7\"> greenhouse gas emissions <\/span><span style=\"color: #F47067\">of<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #6CB6FF\">11.14<\/span><span style=\"color: #F47067\">%<\/span><span style=\"color: #ADBAC7\">.. 
.<\/span><span style=\"color: #96D0FF\">&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #96D0FF\">should return<\/span><span style=\"color: #FF938A; font-style: italic\">:<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\"> {<\/span><span style=\"color: #96D0FF\">&quot;carbon_pct&quot;<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #96D0FF\">&quot;NULL&quot;<\/span><span style=\"color: #ADBAC7\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #96D0FF\">&quot;ghg_pct&quot;<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #96D0FF\">&quot;11.14&quot;<\/span><span style=\"color: #ADBAC7\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #96D0FF\">&quot;unknown_emission_pct&quot;<\/span><span style=\"color: #ADBAC7\"> : <\/span><span style=\"color: #96D0FF\">&quot;10.5&quot;<\/span><span style=\"color: #ADBAC7\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #96D0FF\">&quot;finance_reduction_pct&quot;<\/span><span style=\"color: #ADBAC7\"> : <\/span><span style=\"color: #96D0FF\">&quot;NULL&quot;<\/span><span style=\"color: #ADBAC7\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #96D0FF\">&quot;finance_increase_pct&quot;<\/span><span style=\"color: #ADBAC7\"> : <\/span><span style=\"color: #96D0FF\">&quot;NULL&quot;<\/span><span style=\"color: #ADBAC7\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #96D0FF\">&quot;unknown_finance_pct&quot;<\/span><span style=\"color: #ADBAC7\"> : <\/span><span style=\"color: #96D0FF\">&quot;NULL&quot;<\/span><span style=\"color: #ADBAC7\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: 
#96D0FF\">&quot;&quot;&quot;<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>Below is an example of a commitment and the associated response from ChatGPT (in JSON format) to the prompt provided:<\/p>\n\n\n\n<div style=\"overflow-x: auto;\">\n<table style=\"width: 100%; border-collapse: collapse;border: border: 0.5px solid grey;font-family: 'Whitney';font-size: 16px;\">\n     <tr style=\"background-color: #B53C36; color: white;\">\n            <th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Commitment<\/th>\n            <th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Response<\/th>\n        <\/tr>\n        <tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">KBN Green Bond Framework 2024 2030 target to reduce our own emissions by. 55% vs 2019 levels. The previous target set in 2020, was to achieve a 50% reduction by. 2030. KBN&#8217;s greenhouse gas \u2026<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">{&#8220;carbon_pct&#8221;: &#8220;NULL&#8221;,\n&#8220;ghg_pct&#8221;: &#8220;[55.0, 50.0]&#8221;,\n&#8220;unknown_emissions_pct&#8221;: &#8220;NULL&#8221;,\n&#8220;finance_reduction_pct&#8221;: &#8220;NULL&#8221;,\n&#8220;finance_increase_pct&#8221;: &#8220;NULL&#8221;,\n&#8220;unknown_finance_pct&#8221;: &#8220;NULL&#8221;}<\/td>\n        <\/tr>\n   \n    <\/table>\n<\/div>\n\n\n\n<p>The JSON response object is then converted into a data frame as seen below.<\/p>\n\n\n\n<div style=\"overflow-x: auto;\">\n<table style=\"width: 100%; border-collapse: collapse;border: border: 0.5px solid grey;font-family: 'Whitney';font-size: 16px;\">\n     <tr style=\"background-color: #B53C36; color: white;\">\n            <th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">text<\/th>\n            <th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 
550;\">carbon_pct<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">ghg_pct<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">unknown_emissions_pct<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">finance_reduction_pct<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">finance_increase_pct<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">unknown_finance_pct<\/th>\n\n        <\/tr>\n        <tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">KBN Green Bond Framework 2024 2030 target to reduce our own emissions by. 55% vs 2019 levels. The previous target set in 2020, was to achieve a 50% reduction by. 2030. KBN&#8217;s greenhouse gas \u2026<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">NULL<\/td>\n  <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">[55.0, 50.0]<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">NULL<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">NULL<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">NULL<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">NULL<\/td>\n\n        <\/tr>\n   \n    <\/table>\n<\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\"><\/div>\n<\/div>\n\n\n\n<p><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Step 3: Finalization of commitments table schema<\/h4>\n\n\n\n<p>To validate the responses of ChatGPT, we used python packages <a href=\"https:\/\/pandera.readthedocs.io\/en\/stable\/\" target=\"_blank\" 
rel=\"noreferrer noopener\"><code>Pandera<\/code><\/a> and <a href=\"https:\/\/yaml.org\/spec\/1.2.2\/\" target=\"_blank\" rel=\"noreferrer noopener\"><code>YAML<\/code><\/a> to verify data types and enforce expected ranges of values. This iterative process can also be used to identify edge cases or diagnose issues in the pipeline.<\/p>\n\n\n\n<p>As a first validation step, we incorporated a <a href=\"https:\/\/yaml.org\/spec\/1.2.2\/#chapter-1-introduction-to-yaml\" target=\"_blank\" rel=\"noreferrer noopener\">YAML file<\/a> to coerce data types and enforce acceptable ranges. We opted to use <code>YAML<\/code> because it is <a href=\"https:\/\/yaml.org\/spec\/1.2.2\/#chapter-2-language-overview\" target=\"_blank\" rel=\"noreferrer noopener\">human-readable<\/a>, supports <a href=\"https:\/\/yaml.org\/spec\/1.2.2\/#dump\" target=\"_blank\" rel=\"noreferrer noopener\">nested structures<\/a>, and can be used to <a href=\"https:\/\/yaml.org\/spec\/1.2.2\/#recommended-schemas\" target=\"_blank\" rel=\"noreferrer noopener\">define schemas<\/a> (<a href=\"http:\/\/YAML.org\" target=\"_blank\" rel=\"noreferrer noopener\">YAML 2024<\/a>). To develop the schema, we used the Python package <a href=\"https:\/\/pandera.readthedocs.io\/en\/stable\/index.html\" target=\"_blank\" rel=\"noreferrer noopener\"><code>Pandera<\/code><\/a> to scaffold its framework. After generating the \u201cinferred schema,\u201d we revised all values to reflect the specification of the project. With <code>infer_schema<\/code>, the package infers a data type for each column of the data frame, which the schema can then coerce during validation. 
To learn more about <code>YAML<\/code> files in <code>Pandera<\/code> developers can <a href=\"https:\/\/pandera.readthedocs.io\/en\/stable\/schema_inference.html#write-to-yaml\" target=\"_blank\" rel=\"noreferrer noopener\">read the documentation<\/a>.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#2d333d;color:#9eadbd\">Python<\/span><span role=\"button\" tabindex=\"0\" data-code=\"from pandera.io import from_yaml\nimport pandera as pa\n\nschema_inf = pa.infer_schema(df)\nyaml_inf = schema_inf.to_yaml()\nyaml_schema = from_yaml(yaml_inf)\n\nwith open('.\/schema_test.yaml', 'w') as file:\n    file.write(yaml_inf)\" style=\"color:#22272e;display:none;background-color:#adbac7\" aria-label=\"Copy\" data-copied-text=\"Copied!\" data-has-text-button=\"textSimple\" data-inside-header-type=\"simpleString\" aria-live=\"polite\" class=\"code-block-pro-copy-button\"><span class=\"cbp-btn-text\">Copy<\/span><\/span><pre class=\"shiki github-dark-dimmed\" style=\"background-color: #22272e\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #ADBAC7\">from pandera.io <\/span><span style=\"color: #F47067\">import<\/span><span style=\"color: #ADBAC7\"> from_yaml<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">import pandera <\/span><span style=\"color: #F47067\">as<\/span><span style=\"color: #ADBAC7\"> pa<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">schema_inf = pa.infer_schema(df)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">yaml_inf = 
schema_inf.to_yaml()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">yaml_schema = from_yaml(yaml_inf)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">with open(<\/span><span style=\"color: #96D0FF\">&#39;.\/schema_test.yaml&#39;<\/span><span style=\"color: #ADBAC7\">, <\/span><span style=\"color: #96D0FF\">&#39;w&#39;<\/span><span style=\"color: #ADBAC7\">) as file:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    file.write(yaml_inf)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#2d333d;color:#9eadbd\">YAML<\/span><span role=\"button\" tabindex=\"0\" data-code=\"schema_type: dataframe\nversion: 0.20.3\ncolumns:\n\t...\n  text:\n    title: null\n    description: null\n    dtype: str\n    nullable: false\n    checks: null\n    unique: false\n    coerce: true\n    required: true\n    regex: false\n  net_zero:\n    title: null\n    description: null\n    dtype: bool\n    nullable: false\n    checks: null\n    unique: false\n    coerce: true\n    required: true\n    regex: false\n  ...\n  ghg_pct:\n    title: null\n    description: null\n    dtype: float64\n    nullable: true\n    checks:\n      greater_than_or_equal_to: 0.0\n      less_than_or_equal_to: 100.0\n    unique: false\n    coerce: false\n    required: true\n    regex: false\n ...\ndtype: null\ncoerce: true\nstrict: false\nname: null\nordered: false\nunique: null\nreport_duplicates: all\nunique_column_names: false\nadd_missing_columns: false\ntitle: null\ndescription: null\" 
style=\"color:#22272e;display:none;background-color:#adbac7\" aria-label=\"Copy\" data-copied-text=\"Copied!\" data-has-text-button=\"textSimple\" data-inside-header-type=\"simpleString\" aria-live=\"polite\" class=\"code-block-pro-copy-button\"><span class=\"cbp-btn-text\">Copy<\/span><\/span><pre class=\"shiki github-dark-dimmed\" style=\"background-color: #22272e\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #F69D50\">schema_type<\/span><span style=\"color: #ADBAC7\">: dataframe<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F69D50\">version<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">0.20<\/span><span style=\"color: #ADBAC7\">.<\/span><span style=\"color: #6CB6FF\">3<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F69D50\">columns<\/span><span style=\"color: #ADBAC7\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t<\/span><span style=\"color: #F47067\">...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #F69D50\">text<\/span><span style=\"color: #ADBAC7\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">title<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">null<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">description<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">null<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">dtype<\/span><span style=\"color: #ADBAC7\">: str<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">nullable<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">false<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span 
style=\"color: #F69D50\">checks<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">null<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">unique<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">false<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">coerce<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">true<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">required<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">true<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">regex<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">false<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #F69D50\">net_zero<\/span><span style=\"color: #ADBAC7\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">title<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">null<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">description<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">null<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">dtype<\/span><span style=\"color: #ADBAC7\">: bool<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">nullable<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">false<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: 
#F69D50\">checks<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">null<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">unique<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">false<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">coerce<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">true<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">required<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">true<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">regex<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">false<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #F47067\">...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">  <\/span><span style=\"color: #F69D50\">ghg_pct<\/span><span style=\"color: #ADBAC7\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">title<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">null<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">description<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">null<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">dtype<\/span><span style=\"color: #ADBAC7\">: float64<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">nullable<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">true<\/span><\/span>\n<span 
class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">checks<\/span><span style=\"color: #ADBAC7\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">      <\/span><span style=\"color: #F69D50\">greater_than_or_equal_to<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">0.0<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">      <\/span><span style=\"color: #F69D50\">less_than_or_equal_to<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">100.0<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">unique<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">false<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">coerce<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">false<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">required<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">true<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F69D50\">regex<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">false<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #F47067\">...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F69D50\">dtype<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">null<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F69D50\">coerce<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">true<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F69D50\">strict<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: 
#6CB6FF\">false<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F69D50\">name<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">null<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F69D50\">ordered<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">false<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F69D50\">unique<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">null<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F69D50\">report_duplicates<\/span><span style=\"color: #ADBAC7\">: all<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F69D50\">unique_column_names<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">false<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F69D50\">add_missing_columns<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">false<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F69D50\">title<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">null<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F69D50\">description<\/span><span style=\"color: #ADBAC7\">: <\/span><span style=\"color: #6CB6FF\">null<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>After creating and modifying the schema, we used a script to <a href=\"https:\/\/pandera.readthedocs.io\/en\/stable\/lazy_validation.html\" target=\"_blank\" rel=\"noreferrer noopener\">validate the data against the schema structure<\/a>. This validation returned all rows (i.e., text snippets) that failed to meet the validation criteria. 
A few example validation cases are shown in the table below.<\/p>\n\n\n\n<div style=\"overflow-x: auto;\">\n<table style=\"width: 100%; border-collapse: collapse;border: border: 0.5px solid grey;font-family: 'Whitney';font-size: 16px;\">\n     <tr style=\"background-color: #B53C36; color: white;\">\n            <th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">schema_context<\/th>\n            <th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">column<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">check<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">check_number<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">failure_case<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">index<\/th>\n        <\/tr>\n        <tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">Column<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">publication_date<\/td>\n  <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">coerce_dtype(&#8216;datetime64[ns]&#8217;)<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">None<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">6 days ago .<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">429<\/td>\n        <\/tr>\n<tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">Column<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">money_amount<\/td>\n  <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">coerce_dtype(&#8216;Int64&#8217;)<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid 
grey;\">None<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">nan<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">449<\/td>\n        <\/tr>\n<tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">Column<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">money_amount<\/td>\n  <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">coerce_dtype(&#8216;Int64&#8217;)<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">None<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">300,000,000<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">451<\/td>\n        <\/tr>\n   \n    <\/table>\n<\/div>\n\n\n\n<p>Where the ChatGPT metadata extraction returned values flagged as invalid according to the schema, we corrected them programmatically for each failure case. For example, where date fields were returned as relative time (e.g., \u201c6 days ago\u201d) instead of absolute time (e.g., \u201cDecember 10, 2024\u201d), we corrected them using the date on which the text snippet was web scraped. That is, a date field value of \u201c6 days ago\u201d with a reference web scraping date of August 22, 2024, was replaced with August 16, 2024.<\/p>\n\n\n\n<p>After the extracted metadata was validated automatically using the <code>YAML<\/code> schema, we conducted a final manual validation to resolve any lingering data quality issues, directly validating responses in Excel and appending corrected values as additional columns. 
Manual validation also included a review of secondary commitment classification labels (e.g., Paris alignment, net zero, and climate investment goals).<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\"><\/div>\n<\/div>\n\n\n\n<h4 class=\"wp-block-heading\">Step 4: Evaluating performance of AI\/ML solutions<\/h4>\n\n\n\n<p>To understand how well the AI\/ML tools performed, we compared manually validated metadata values against extracted values to produce a <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.ConfusionMatrixDisplay.html\" target=\"_blank\" rel=\"noreferrer noopener\">confusion matrix<\/a>, using an additional subset of text snippets collected after initial training\/testing. Accordingly, we evaluated the ability of each transformer neural network model and ChatGPT extraction prompt to classify and\/or extract text data into correct categories and values by assessing the relative prevalence of true positive and true negative results against false positive and false negative results.<\/p>\n\n\n\n<p>In addition to traditional performance metrics (<a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.accuracy_score.html#sklearn.metrics.accuracy_score\" target=\"_blank\" rel=\"noreferrer noopener\">accuracy<\/a>, specificity, <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.precision_score.html#sklearn.metrics.precision_score\" target=\"_blank\" rel=\"noreferrer noopener\">precision<\/a>, <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.recall_score.html#sklearn.metrics.recall_score\" target=\"_blank\" rel=\"noreferrer noopener\">recall<\/a>), we also included both F1 (values range 0-1) and the Matthews correlation coefficient (MCC; 
values range -1 to 1) to measure each model&#8217;s ability to classify information across all classes (positive and negative). For both metrics, a score of \u201c1\u201d indicates perfect classification. MCC can be especially useful for measuring performance on imbalanced datasets (<a href=\"https:\/\/bmcgenomics.biomedcentral.com\/articles\/10.1186\/s12864-019-6413-7#Abs1\" target=\"_blank\" rel=\"noreferrer noopener\">Chicco and Jurman 2020<\/a>), which is particularly relevant to this exercise given that the PDB climate commitments dataset tends to have a low ratio of positive to negative observations.<\/p>\n\n\n\n<p>Furthermore, unlike F1, MCC accounts for observations that were correctly predicted as negative, correcting an evaluative bias that arises because F1 rewards performance on the positive class but ignores true negatives. Accordingly, MCC indicates strong performance only when a model accurately predicts both positive and negative classes, regardless of class balance. Finally, we also used <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.classification_report.html\" target=\"_blank\" rel=\"noreferrer noopener\">scikit-learn\u2019s \u201cclassification report&#8221;<\/a> to evaluate performance, which provides macro and weighted averages as well as per-class metrics.<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\"><\/div>\n<\/div>\n\n\n\n<h4 class=\"wp-block-heading\">Evaluating climate commitment labeling models (i.e., <a href=\"#part-1\">Solution part 1<\/a>)<\/h4>\n\n\n\n<p>As described in Solution part 1, a set of transformer neural network models was trained to label text snippets as various sub-types of climate commitments. 
To evaluate how well these more sophisticated models perform against alternatives, we created a baseline model for comparison, which classifies text snippet observations using simple string-matching (i.e., labels are determined only on the basis of certain terms and sequences of words appearing within a given text snippet). Rather than using a comparatively complex baseline model, such as a stratified classifier based on class distribution, we selected string-matching because it is relatively intuitive and therefore easy to interpret. String-matching terms were selected using N-gram plots (i.e., plots of text features of lengths 1 to n) specific to each commitment type.<\/p>\n\n\n\n<p>A sample of the N-grams most frequently appearing within net zero and carbon neutrality commitments is shown below:<\/p>\n\n\n<section class=\"block block-chart is-image\"><div is=\"chart\/image\" class=\"chart-image\">\n\t\t<script type=\"json\/props\">{\n    \"colors\": []\n}<\/script>\n\n\t\t\n\t\t<div element=\"tabs\"><\/div>\n\n\t\t\t\t\t<a class=\"block-chart--image image--link\" href=\"https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2012\/12\/N-grams.png\" target=\"_blank\"><div class=\"image--wrap\"><img src='https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2012\/12\/N-grams.png' class=\"image\" alt=\"N-grams\" style=\"max-width:100%\" \/><\/div><\/a><!-- image html = <a class=\"block-chart--image image--link\" href=\"https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2012\/12\/N-grams.png\" target=\"_blank\"><div class=\"image--wrap\"><img src='https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2012\/12\/N-grams.png' class=\"image\" alt=\"N-grams\" style=\"max-width:100%\" \/><\/div><\/a>-->\t\t\n\t\t<div element=\"canvas\"><\/div>\n\n\t\t\t\t<group name=\"\">\n\t\t\t<!-- tab -->\t\t\t\n\t\t<\/group>\n\t\t\n\n\t\t\n\t\t\t<\/div><\/section>\n\n\n<p>We use these text features to subset the 
commitments to observations that contain a sequence of \u201cnet\u201d and \u201czero\u201d while excluding features that would indicate a commitment belonging to the separate carbon neutral category. In theory, this method limits the number of misclassified commitments where terms are used interchangeably (e.g., net zero carbon).<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#2d333d;color:#9eadbd\">Python<\/span><span role=\"button\" tabindex=\"0\" data-code=\"def baseline_bools(df):\n    ...\n    # Initialize all boolean columns to 0\n    bool_cols = [..., 'Net Zero', ...]\n    df[bool_cols] = 0\n\n    ...\n    net_zero_pattern = r'\\bnet[\\s-]+zero\\b' # Identify net-zero commitments\n    zero_carbon_pattern = r'\\bzero[\\s-]+carbon\\b' # Ignore mentions of zero-carbon; these are assumed to be carbon neutral commitments\n    df.loc[(df['Text'].str.contains(net_zero_pattern, case=False, regex=True)) &amp; (~df['Text'].str.contains(zero_carbon_pattern, case=False, regex=True)), 'Net Zero'] = 1\n    ...\n\n    return df\" style=\"color:#22272e;display:none;background-color:#adbac7\" aria-label=\"Copy\" data-copied-text=\"Copied!\" data-has-text-button=\"textSimple\" data-inside-header-type=\"simpleString\" aria-live=\"polite\" class=\"code-block-pro-copy-button\"><span class=\"cbp-btn-text\">Copy<\/span><\/span><pre class=\"shiki github-dark-dimmed\" style=\"background-color: #22272e\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #ADBAC7\">def <\/span><span style=\"color: #DCBDFB\">baseline_bools<\/span><span style=\"color: 
(df):">
#ADBAC7\">(df):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F47067\">...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    # Initialize all boolean columns to <\/span><span style=\"color: #6CB6FF\">0<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    bool_cols <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> [<\/span><span style=\"color: #F47067\">...<\/span><span style=\"color: #ADBAC7\">, <\/span><span style=\"color: #96D0FF\">&#39;Net Zero&#39;<\/span><span style=\"color: #ADBAC7\">, <\/span><span style=\"color: #F47067\">...<\/span><span style=\"color: #ADBAC7\">]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    df[bool_cols] <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #6CB6FF\">0<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F47067\">...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    net_zero_pattern <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> r<\/span><span style=\"color: #96D0FF\">&#39;<\/span><span style=\"color: #F47067\">\\b<\/span><span style=\"color: #96D0FF\">net[<\/span><span style=\"color: #F47067\">\\s<\/span><span style=\"color: #96D0FF\">-]+zero<\/span><span style=\"color: #F47067\">\\b<\/span><span style=\"color: #96D0FF\">&#39;<\/span><span style=\"color: #ADBAC7\"> # Identify net<\/span><span style=\"color: #F47067\">-<\/span><span style=\"color: #ADBAC7\">zero commitments<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    zero_carbon_pattern <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> r<\/span><span style=\"color: #96D0FF\">&#39;<\/span><span style=\"color: #F47067\">\\b<\/span><span style=\"color: 
zero[">
#96D0FF\">zero[<\/span><span style=\"color: #F47067\">\\s<\/span><span style=\"color: #96D0FF\">-]+carbon<\/span><span style=\"color: #F47067\">\\b<\/span><span style=\"color: #96D0FF\">&#39;<\/span><span style=\"color: #ADBAC7\"> # Ignore mentions <\/span><span style=\"color: #F47067\">of<\/span><span style=\"color: #ADBAC7\"> zero<\/span><span style=\"color: #F47067\">-<\/span><span style=\"color: #ADBAC7\">carbon; these are assumed to be carbon neutral commitments<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    df.loc[(df[<\/span><span style=\"color: #96D0FF\">&#39;Text&#39;<\/span><span style=\"color: #ADBAC7\">].str.<\/span><span style=\"color: #DCBDFB\">contains<\/span><span style=\"color: #ADBAC7\">(net_zero_pattern, case<\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\">False, regex<\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\">True)) <\/span><span style=\"color: #F47067\">&amp;<\/span><span style=\"color: #ADBAC7\"> (<\/span><span style=\"color: #F47067\">~<\/span><span style=\"color: #ADBAC7\">df[<\/span><span style=\"color: #96D0FF\">&#39;Text&#39;<\/span><span style=\"color: #ADBAC7\">].str.<\/span><span style=\"color: #DCBDFB\">contains<\/span><span style=\"color: #ADBAC7\">(zero_carbon_pattern, case<\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\">False, regex<\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\">True)), <\/span><span style=\"color: #96D0FF\">&#39;Net Zero&#39;<\/span><span style=\"color: #ADBAC7\">] <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #6CB6FF\">1<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F47067\">...<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F47067\">return<\/span><span 
style=\"color: #ADBAC7\"> df<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>When commitments correspond to a narrow and specific set of text features, such as the above \u201cnet zero\u201d example, classifying with string-matching is expected to be more effective. Unlike a neural network, this method avoids training on confounding features that a model can pick up on, thus reducing the probability of overfitting when generalizing the model. However, more verbose (i.e., greater variety of corresponding text features) or vaguely defined commitments will be harder to classify using this simplistic baseline model.<\/p>\n\n\n\n<p>For example, we found that net zero and carbon neutrality commitments tend to be articulated using similar language with overlapping contextual text features, possibly leading to false positive classification by the transformer neural network (i.e., carbon neutrality commitments incorrectly labeled as net zero and vice versa). When evaluating a baseline string-matching model side by side with the transformer neural network, we can identify the prevalence of this and other confounding issues by comparing the respective false positive and false negative rate of each method.<\/p>\n\n\n\n<p>An example of the confusion matrix and classification report results for net zero and climate investment goal classification models are found below. 
In both cases, transformer neural network performance was compared to the string-matching baseline model mentioned previously.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#2d333d;color:#9eadbd\">Python<\/span><span role=\"button\" tabindex=\"0\" data-code=\"def cm(_df, i):\n\t\t\t...\n\t    df_sm = _df.drop(['Paris Aligned', 'Net Zero', 'Carbon Neutral', 'Interim Target',\n\t       'Mitigation', 'Investment', 'Divestment', 'Institutional Strategy'], axis=1)\n\t    df_sm = baseline_bools(df_sm)\n\t    \n\t\t  y_true_sm = df_sm[f'{i}  True']\n\t    y_pred_sm = df_sm[i]\n\t    cm1 = confusion_matrix(y_true_sm, y_pred_sm)\n\t\n\t    y_true_nn = _df[f'{i}  True']\n\t    y_pred_nn = _df[i]\n\t    cm2 = confusion_matrix(y_true_nn, y_pred_nn)\n\t    fig, axes = plt.subplots(1, 2, figsize=(12, 5))\n\t    ...\n\t\t  metrics = {}\n\t\n\t    for model_name, y_true, y_pred, cm in zip(['String Match', 'Neural Network'], \n\t                                                [y_true_sm, y_true_nn], \n\t                                                [y_pred_sm, y_pred_nn], \n\t                                                [cm1, cm2]):\n\t        tn, fp, fn, tp = cm.ravel()\n\t        accuracy = (cm.diagonal().sum()) \/ cm.sum()\n\t        precision = precision_score(y_true, y_pred, average='weighted')\n\t        recall = recall_score(y_true, y_pred, average='weighted')\n\t        f1 = f1_score(y_true, y_pred, average='weighted')\n\t        specificity = tn \/ (tn + fp) if (tn + fp) &gt; 0 else 0\n\t        mcc = matthews_corrcoef(y_true, y_pred)\n\t\n\t        
metrics[model_name] = {'Accuracy': accuracy,\n\t            'Precision': precision,\n\t            'Recall': recall,\n\t            'F1 Score': f1,\n\t            'Specificity': specificity,\n\t            'MCC': mcc}\n\t       ...\n\t      metrics_df = pd.DataFrame(metrics).T  \n\t\t\t  metrics_df.index.name = 'Model'\n\n\t\t    plt.tight_layout()\n\t\t    plt.show()\" style=\"color:#22272e;display:none;background-color:#adbac7\" aria-label=\"Copy\" data-copied-text=\"Copied!\" data-has-text-button=\"textSimple\" data-inside-header-type=\"simpleString\" aria-live=\"polite\" class=\"code-block-pro-copy-button\"><span class=\"cbp-btn-text\">Copy<\/span><\/span><pre class=\"shiki github-dark-dimmed\" style=\"background-color: #22272e\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #ADBAC7\">def <\/span><span style=\"color: #DCBDFB\">cm<\/span><span style=\"color: #ADBAC7\">(_df, i):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t\t\t<\/span><span style=\"color: #F47067\">...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t    df_sm <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> _df.<\/span><span style=\"color: #DCBDFB\">drop<\/span><span style=\"color: #ADBAC7\">([<\/span><span style=\"color: #96D0FF\">&#39;Paris Aligned&#39;<\/span><span style=\"color: #ADBAC7\">, <\/span><span style=\"color: #96D0FF\">&#39;Net Zero&#39;<\/span><span style=\"color: #ADBAC7\">, <\/span><span style=\"color: #96D0FF\">&#39;Carbon Neutral&#39;<\/span><span style=\"color: #ADBAC7\">, <\/span><span style=\"color: #96D0FF\">&#39;Interim Target&#39;<\/span><span style=\"color: #ADBAC7\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t       <\/span><span style=\"color: #96D0FF\">&#39;Mitigation&#39;<\/span><span style=\"color: #ADBAC7\">, <\/span><span style=\"color: #96D0FF\">&#39;Investment&#39;<\/span><span style=\"color: #ADBAC7\">, <\/span><span style=\"color: 
#96D0FF\">&#39;Divestment&#39;<\/span><span style=\"color: #ADBAC7\">, <\/span><span style=\"color: #96D0FF\">&#39;Institutional Strategy&#39;<\/span><span style=\"color: #ADBAC7\">], axis<\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #6CB6FF\">1<\/span><span style=\"color: #ADBAC7\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t    df_sm <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #DCBDFB\">baseline_bools<\/span><span style=\"color: #ADBAC7\">(df_sm)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t    <\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t\t  y_true_sm <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> df_sm[f<\/span><span style=\"color: #96D0FF\">&#39;{i}  True&#39;<\/span><span style=\"color: #ADBAC7\">]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t    y_pred_sm <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> df_sm[i]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t    cm1 <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #DCBDFB\">confusion_matrix<\/span><span style=\"color: #ADBAC7\">(y_true_sm, y_pred_sm)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t    y_true_nn <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> _df[f<\/span><span style=\"color: #96D0FF\">&#39;{i}  True&#39;<\/span><span style=\"color: #ADBAC7\">]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t    y_pred_nn <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> _df[i]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t    cm2 <\/span><span style=\"color: 
#F47067\">=<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #DCBDFB\">confusion_matrix<\/span><span style=\"color: #ADBAC7\">(y_true_nn, y_pred_nn)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t    fig, axes <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> plt.<\/span><span style=\"color: #DCBDFB\">subplots<\/span><span style=\"color: #ADBAC7\">(<\/span><span style=\"color: #6CB6FF\">1<\/span><span style=\"color: #ADBAC7\">, <\/span><span style=\"color: #6CB6FF\">2<\/span><span style=\"color: #ADBAC7\">, figsize<\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\">(<\/span><span style=\"color: #6CB6FF\">12<\/span><span style=\"color: #ADBAC7\">, <\/span><span style=\"color: #6CB6FF\">5<\/span><span style=\"color: #ADBAC7\">))<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t    <\/span><span style=\"color: #F47067\">...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t\t  metrics <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> {}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t    for model_name, y_true, y_pred, cm <\/span><span style=\"color: #F47067\">in<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #DCBDFB\">zip<\/span><span style=\"color: #ADBAC7\">([<\/span><span style=\"color: #96D0FF\">&#39;String Match&#39;<\/span><span style=\"color: #ADBAC7\">, <\/span><span style=\"color: #96D0FF\">&#39;Neural Network&#39;<\/span><span style=\"color: #ADBAC7\">], <\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t                                                [y_true_sm, y_true_nn], <\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t                                                [y_pred_sm, y_pred_nn], <\/span><\/span>\n<span 
class=\"line\"><span style=\"color: #ADBAC7\">\t                                                [cm1, cm2]):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t        tn, fp, fn, tp <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> cm.<\/span><span style=\"color: #DCBDFB\">ravel<\/span><span style=\"color: #ADBAC7\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t        accuracy <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> (cm.<\/span><span style=\"color: #DCBDFB\">diagonal<\/span><span style=\"color: #ADBAC7\">().<\/span><span style=\"color: #DCBDFB\">sum<\/span><span style=\"color: #ADBAC7\">()) <\/span><span style=\"color: #F47067\">\/<\/span><span style=\"color: #ADBAC7\"> cm.<\/span><span style=\"color: #DCBDFB\">sum<\/span><span style=\"color: #ADBAC7\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t        precision <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #DCBDFB\">precision_score<\/span><span style=\"color: #ADBAC7\">(y_true, y_pred, average<\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #96D0FF\">&#39;weighted&#39;<\/span><span style=\"color: #ADBAC7\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t        recall <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #DCBDFB\">recall_score<\/span><span style=\"color: #ADBAC7\">(y_true, y_pred, average<\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #96D0FF\">&#39;weighted&#39;<\/span><span style=\"color: #ADBAC7\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t        f1 <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #DCBDFB\">f1_score<\/span><span style=\"color: #ADBAC7\">(y_true, y_pred, 
average<\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #96D0FF\">&#39;weighted&#39;<\/span><span style=\"color: #ADBAC7\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t        specificity <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> tn <\/span><span style=\"color: #F47067\">\/<\/span><span style=\"color: #ADBAC7\"> (tn <\/span><span style=\"color: #F47067\">+<\/span><span style=\"color: #ADBAC7\"> fp) <\/span><span style=\"color: #F47067\">if<\/span><span style=\"color: #ADBAC7\"> (tn <\/span><span style=\"color: #F47067\">+<\/span><span style=\"color: #ADBAC7\"> fp) <\/span><span style=\"color: #F47067\">&gt;<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #6CB6FF\">0<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #F47067\">else<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #6CB6FF\">0<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t        mcc <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #DCBDFB\">matthews_corrcoef<\/span><span style=\"color: #ADBAC7\">(y_true, y_pred)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t        metrics[model_name] <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> {<\/span><span style=\"color: #96D0FF\">&#39;Accuracy&#39;<\/span><span style=\"color: #ADBAC7\">: accuracy,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t            <\/span><span style=\"color: #96D0FF\">&#39;Precision&#39;<\/span><span style=\"color: #ADBAC7\">: precision,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t            <\/span><span style=\"color: #96D0FF\">&#39;Recall&#39;<\/span><span style=\"color: #ADBAC7\">: recall,<\/span><\/span>\n<span 
class=\"line\"><span style=\"color: #ADBAC7\">\t            <\/span><span style=\"color: #96D0FF\">&#39;F1 Score&#39;<\/span><span style=\"color: #ADBAC7\">: f1,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t            <\/span><span style=\"color: #96D0FF\">&#39;Specificity&#39;<\/span><span style=\"color: #ADBAC7\">: specificity,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t            <\/span><span style=\"color: #96D0FF\">&#39;MCC&#39;<\/span><span style=\"color: #ADBAC7\">: mcc}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t       <\/span><span style=\"color: #F47067\">...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t      metrics_df <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> pd.<\/span><span style=\"color: #DCBDFB\">DataFrame<\/span><span style=\"color: #ADBAC7\">(metrics).<\/span><span style=\"color: #6CB6FF\">T<\/span><span style=\"color: #ADBAC7\">  <\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t\t\t  metrics_df.index.name <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #96D0FF\">&#39;Model&#39;<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t\t    plt.<\/span><span style=\"color: #DCBDFB\">tight_layout<\/span><span style=\"color: #ADBAC7\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t\t    plt.<\/span><span style=\"color: #DCBDFB\">show<\/span><span style=\"color: #ADBAC7\">()<\/span><\/span><\/code><\/pre><\/div>\n\n\n<section class=\"block block-chart is-image\"><div is=\"chart\/image\" class=\"chart-image\">\n\t\t<script type=\"json\/props\">{\n    \"colors\": []\n}<\/script>\n\n\t\t\n\t\t<div element=\"tabs\"><\/div>\n\n\t\t\t\t\t<a class=\"block-chart--image image--link\" 
href=\"https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2012\/12\/images_Net-Zero_plot.png\" target=\"_blank\"><div class=\"image--wrap\"><img src='https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2012\/12\/images_Net-Zero_plot.png' class=\"image\" alt=\"images_Net-Zero_plot\" style=\"max-width:100%\" \/><\/div><\/a><!-- image html = <a class=\"block-chart--image image--link\" href=\"https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2012\/12\/images_Net-Zero_plot.png\" target=\"_blank\"><div class=\"image--wrap\"><img src='https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2012\/12\/images_Net-Zero_plot.png' class=\"image\" alt=\"images_Net-Zero_plot\" style=\"max-width:100%\" \/><\/div><\/a>-->\t\t\n\t\t<div element=\"canvas\"><\/div>\n\n\t\t\t\t<group name=\"\">\n\t\t\t<!-- tab -->\t\t\t\n\t\t<\/group>\n\t\t\n\n\t\t\n\t\t\t<\/div><\/section>\n\n\n<div style=\"overflow-x: auto;\">\n<table style=\"width: 100%; border-collapse: collapse;border: border: 0.5px solid grey;font-family: 'Whitney';font-size: 16px;\">\n     <tr style=\"background-color: #B53C36; color: white;\">\n            <th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Model<\/th>\n            <th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Accuracy<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Precision<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Recall<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">F1 Score<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Specificity<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">MCC<\/th>\n\n        <\/tr>\n        <tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid 
grey;\">String Match<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.943705<\/td>\n  <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.945839<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.943705<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.944643<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.964021<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.708473<\/td>\n        <\/tr>\n<tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">Neural Network<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.849028<\/td>\n  <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.934560<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.849028<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.873396<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.834951<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.568485<\/td>\n        <\/tr>\n   \n    <\/table>\n<\/div>\n\n\n<section class=\"block block-chart is-image\"><div is=\"chart\/image\" class=\"chart-image\">\n\t\t<script type=\"json\/props\">{\n    \"colors\": []\n}<\/script>\n\n\t\t\n\t\t<div element=\"tabs\"><\/div>\n\n\t\t\t\t\t<a class=\"block-chart--image image--link\" href=\"https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2012\/12\/images_Investment_plot.png\" target=\"_blank\"><div class=\"image--wrap\"><img src='https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2012\/12\/images_Investment_plot.png' class=\"image\" alt=\"images_Investment_plot\" style=\"max-width:100%\" \/><\/div><\/a><!-- image html = <a 
class=\"block-chart--image image--link\" href=\"https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2012\/12\/images_Investment_plot.png\" target=\"_blank\"><div class=\"image--wrap\"><img src='https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2012\/12\/images_Investment_plot.png' class=\"image\" alt=\"images_Investment_plot\" style=\"max-width:100%\" \/><\/div><\/a>-->\t\t\n\t\t<div element=\"canvas\"><\/div>\n\n\t\t\t\t<group name=\"\">\n\t\t\t<!-- tab -->\t\t\t\n\t\t<\/group>\n\t\t\n\n\t\t\n\t\t\t<\/div><\/section>\n\n\n<div style=\"overflow-x: auto;\">\n<table style=\"width: 100%; border-collapse: collapse;border: border: 0.5px solid grey;font-family: 'Whitney';font-size: 16px;\">\n     <tr style=\"background-color: #B53C36; color: white;\">\n            <th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Model<\/th>\n            <th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Accuracy<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Precision<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Recall<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">F1 Score<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Specificity<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">MCC<\/th>\n\n        <\/tr>\n        <tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">String Match<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.718526<\/td>\n  <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.911170<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.718526<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 
0.788316">
1px solid grey;\">0.788316<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.724384<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.195209<\/td>\n        <\/tr>\n<tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">Neural Network<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.797339<\/td>\n  <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.941686<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.797339<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.845673<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.789589<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0.396129<\/td>\n        <\/tr>\n   \n    <\/table>\n<\/div>\n\n\n\n<p>Interestingly, performance results were fairly close between the transformer neural network and string-matching classification models, and for some labeling tasks the string-matching model outperformed the transformer neural network. Despite class re-balancing to an even 50-50 distribution, the transformer neural network still tended to over-predict the positive class, while making fewer false negative classifications than the string-matching baseline model. 
Notably, for classifying both net zero and climate investment goal commitments, the transformer neural network achieves F1 and MCC scores that would be considered \u201c<a href=\"https:\/\/www.activeloop.ai\/resources\/glossary\/matthews-correlation-coefficient-mcc\/\" target=\"_blank\" rel=\"noreferrer noopener\">good<\/a>\u201d or even \u201c<a href=\"https:\/\/www.activeloop.ai\/resources\/glossary\/matthews-correlation-coefficient-mcc\/\" target=\"_blank\" rel=\"noreferrer noopener\">strong<\/a>\u201d by rule-of-thumb standards for model performance, while the string-matching model performs excellently on net zero commitments but fairly poorly on climate investment goals (its MCC score, in particular, is quite low).<\/p>\n\n\n\n<p>A plausible explanation is that the simplicity of the text features of net zero commitments allowed string matching to perform better, in that any text snippet containing a combination of \u201cnet\u201d and \u201czero\u201d was very likely to be a net zero commitment. Conversely, the transformer neural network struggled to some degree to identify a clear text feature signal to train on, and likely overfit to some spurious features of the training text snippets.<\/p>\n\n\n\n<p>On the other hand, for more complex labeling tasks such as identifying climate investment goals (\u201cInvestment\u201d above), the transformer neural network performed better than string matching. 
A likely explanation is that climate investment goals can be stated in many different ways, so string matching does not effectively distinguish between positive and negative classes: text snippets containing key features such as \u201cinvestment\u201d or \u201cfinance\u201d do not consistently correspond to either class, which produces a high volume of edge cases.<\/p>\n\n\n\n<p>Based on these results, we recommend first evaluating the relative complexity of a commitment\u2019s text features and then testing a mix of models, to determine whether a simple method such as string matching should be used instead of a more complex method such as a neural network. For this particular project, a transformer neural network model (<a href=\"https:\/\/huggingface.co\/climatebert\/distilroberta-base-climate-commitment\" target=\"_blank\" rel=\"noreferrer noopener\">ClimateBERT<\/a>) was readily available to be re-trained on climate commitments data, providing an accessible solution to improve data gathering. In other scenarios where existing models are not available and a model must be developed without a baseline, starting with simpler models, such as a logistic regression, may be an effective strategy.<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\"><\/div>\n<\/div>\n\n\n\n<h4 class=\"wp-block-heading\">Evaluating metadata extraction with ChatGPT (i.e., <a href=\"#part-2\">Solution part 2<\/a>)<\/h4>\n\n\n\n<p>In addition to evaluating the secondary labeling of climate commitment types, we created an evaluation function to assess the performance of ChatGPT when extracting key metadata fields (Solution part 2). 
Performance was assessed on the following metrics: false negatives, false extractions, hallucinations, and correct extractions:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>In this context, false negatives are observations that ChatGPT has incorrectly identified as <code>Null<\/code> when there is actually a true value included in the text snippet.<\/li>\n\n\n\n<li>False extractions are observations where ChatGPT has mistakenly extracted a different value contained in the text snippet that does not correspond to the true value for the targeted metadata field.<\/li>\n\n\n\n<li>Hallucinations are observations where the LLM returned an invented value when no metadata value was actually present (i.e., the true value is <code>Null<\/code>).<\/li>\n\n\n\n<li>Finally, correct extractions are observations where ChatGPT has extracted the true values from text snippets.<\/li>\n<\/ul>\n\n\n\n<p>Note that all error cases were corrected during manual validation before data was finalized for analysis. This includes out-of-sample false negatives, where text snippets were not passed to ChatGPT for metadata extraction due to mislabeling upstream, but were later discovered during manual validation.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#2d333d;color:#9eadbd\">Python<\/span><span role=\"button\" tabindex=\"0\" data-code=\"def evaluate_gpt_performance(df):\n    category_lookup = {\n        'Carbon Neutral Announced': ['Carbon Neutral'],\n        'Carbon Neutral Target': ['Carbon Neutral'],\n    ...\n    metrics = {\n        'Column': [],\n        'Accuracy': [],\n        'Total_Observations': 
[],\n        'Total_Samples': [],\n     ...\n\t   predictions = df[pred_col].fillna('nan').astype(str)\n\t\t ground_truth = df[true_col].fillna('nan').astype(str)\n\t\t hallucination_sample_rate = (hallucinations \/ total_samples \n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t if total_samples &gt; 0 else 0)\n     false_negative_sample_rate = (false_negatives \/ total_samples \n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t     if total_samples &gt; 0 else 0)\n\t\t\t...\" style=\"color:#22272e;display:none;background-color:#adbac7\" aria-label=\"Copy\" data-copied-text=\"Copied!\" data-has-text-button=\"textSimple\" data-inside-header-type=\"simpleString\" aria-live=\"polite\" class=\"code-block-pro-copy-button\"><span class=\"cbp-btn-text\">Copy<\/span><\/span><pre class=\"shiki github-dark-dimmed\" style=\"background-color: #22272e\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #ADBAC7\">def <\/span><span style=\"color: #DCBDFB\">evaluate_gpt_performance<\/span><span style=\"color: #ADBAC7\">(df):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    category_lookup <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> {<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">        <\/span><span style=\"color: #96D0FF\">&#39;Carbon Neutral Announced&#39;<\/span><span style=\"color: #ADBAC7\">: [<\/span><span style=\"color: #96D0FF\">&#39;Carbon Neutral&#39;<\/span><span style=\"color: #ADBAC7\">],<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">        <\/span><span style=\"color: #96D0FF\">&#39;Carbon Neutral Target&#39;<\/span><span style=\"color: #ADBAC7\">: [<\/span><span style=\"color: #96D0FF\">&#39;Carbon Neutral&#39;<\/span><span style=\"color: #ADBAC7\">],<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    <\/span><span style=\"color: #F47067\">...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">    metrics <\/span><span style=\"color: 
#F47067\">=<\/span><span style=\"color: #ADBAC7\"> {<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">        <\/span><span style=\"color: #96D0FF\">&#39;Column&#39;<\/span><span style=\"color: #ADBAC7\">: [],<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">        <\/span><span style=\"color: #96D0FF\">&#39;Accuracy&#39;<\/span><span style=\"color: #ADBAC7\">: [],<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">        <\/span><span style=\"color: #96D0FF\">&#39;Total_Observations&#39;<\/span><span style=\"color: #ADBAC7\">: [],<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">        <\/span><span style=\"color: #96D0FF\">&#39;Total_Samples&#39;<\/span><span style=\"color: #ADBAC7\">: [],<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">     <\/span><span style=\"color: #F47067\">...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t   predictions <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> df[pred_col].<\/span><span style=\"color: #DCBDFB\">fillna<\/span><span style=\"color: #ADBAC7\">(<\/span><span style=\"color: #96D0FF\">&#39;nan&#39;<\/span><span style=\"color: #ADBAC7\">).<\/span><span style=\"color: #DCBDFB\">astype<\/span><span style=\"color: #ADBAC7\">(str)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t\t ground_truth <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> df[true_col].<\/span><span style=\"color: #DCBDFB\">fillna<\/span><span style=\"color: #ADBAC7\">(<\/span><span style=\"color: #96D0FF\">&#39;nan&#39;<\/span><span style=\"color: #ADBAC7\">).<\/span><span style=\"color: #DCBDFB\">astype<\/span><span style=\"color: #ADBAC7\">(str)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t\t hallucination_sample_rate <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> (hallucinations 
<\/span><span style=\"color: #F47067\">\/<\/span><span style=\"color: #ADBAC7\"> total_samples <\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t if total_samples <\/span><span style=\"color: #F47067\">&gt;<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #6CB6FF\">0<\/span><span style=\"color: #ADBAC7\"> else <\/span><span style=\"color: #6CB6FF\">0<\/span><span style=\"color: #ADBAC7\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">     false_negative_sample_rate <\/span><span style=\"color: #F47067\">=<\/span><span style=\"color: #ADBAC7\"> (false_negatives <\/span><span style=\"color: #F47067\">\/<\/span><span style=\"color: #ADBAC7\"> total_samples <\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t     if total_samples <\/span><span style=\"color: #F47067\">&gt;<\/span><span style=\"color: #ADBAC7\"> <\/span><span style=\"color: #6CB6FF\">0<\/span><span style=\"color: #ADBAC7\"> else <\/span><span style=\"color: #6CB6FF\">0<\/span><span style=\"color: #ADBAC7\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ADBAC7\">\t\t\t<\/span><span style=\"color: #F47067\">...<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>The table below highlights a sample of metadata fields and their evaluations. Samples, in this context, refer to the subset of data fed into ChatGPT. 
For example, the sample parsed for \u201ccarbon neutral announced\u201d values (i.e., the year in which the commitment was made) includes only text snippets where the \u2018mitigation\u2019 and \u2018contains year\u2019 Booleans are both equal to one.<\/p>\n\n\n\n<div style=\"overflow-x: auto;\">\n<table style=\"width: 100%; border-collapse: collapse;border: border: 0.5px solid grey;font-family: 'Whitney';font-size: 16px;\">\n     <tr style=\"background-color: #B53C36; color: white;\">\n            <th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Metadata Field<\/th>\n            <th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Total Sample Observations<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Correct Extractions<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Sample Accuracy Rate<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Sample False Negatives<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Sample False Extractions<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Sample Hallucinations<\/th>\n<th style=\"padding: 8px; text-align: left;border: 1px solid white;font-weight: 550;\">Out-of-Sample False Negatives<\/th>\n        <\/tr>\n        <tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">Carbon Neutral Announced Date<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">549<\/td>\n  <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">481<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">87.61%<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0<\/td>\n            <td style=\"padding: 
8px; text-align: left;border: 1px solid grey;\">0<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">68<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0<\/td>\n        <\/tr>\n<tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">Carbon Neutral Target Date<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">549<\/td>\n  <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">484<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">88.16%<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">23<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">1<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">41<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">2<\/td>\n        <\/tr>\n<tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">Net Zero Announced Date<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">549<\/td>\n  <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">469<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">85.43%<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">80<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0<\/td>\n        <\/tr>\n<tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">Net Zero Target Date<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">549<\/td>\n  <td 
style=\"padding: 8px; text-align: left;border: 1px solid grey;\">479<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">87.25%<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">36<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">5<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">29<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">4<\/td>\n        <\/tr>\n<tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">Carbon Tonnage Reduction<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">779<\/td>\n  <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">710<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">91.14%<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">7<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">5<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">57<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">0<\/td>\n        <\/tr>\n<tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">Carbon Percent Reduction<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">222<\/td>\n  <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">196<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">88.29%<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">14<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">2<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">10<\/td>\n            <td 
style=\"padding: 8px; text-align: left;border: 1px solid grey;\">1<\/td>\n        <\/tr>\n<tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">GHG Tonnage Reduction<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">549<\/td>\n  <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">498<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">90.71%<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">3<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">3<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">45<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">2<\/td>\n        <\/tr>\n   <tr>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">GHG Percent Reduction<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">235<\/td>\n  <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">175<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">74.47%<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">48<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">3<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">9<\/td>\n            <td style=\"padding: 8px; text-align: left;border: 1px solid grey;\">2<\/td>\n        <\/tr>\n    <\/table>\n<\/div>\n\n\n\n<p>Overall, these evaluation results suggest that AI\/ML tools developed for this project perform with relatively high application accuracy (i.e., when applied to data outside of the test\/train set) on the labeling and extraction tasks they were used for. 
However, these results also reveal a few key areas in which the tools can be refined for future iterations of the PDB climate ambition tracking project or other use cases.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Footnotes<\/h2>\n\n\n\n<p>[1] API costs can change over time and models can be deprecated, so we recommend reviewing the available models and pricing for the scale of your tasks: <a href=\"https:\/\/openai.com\/api\/pricing\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/openai.com\/api\/pricing\/<\/a><\/p>\n\n\n\n<p>[2] CPI used Python to interact with the API, but practitioners can refer to the official documentation for their preferred programming language: <a href=\"https:\/\/platform.openai.com\/docs\/api-reference\/authentication\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/platform.openai.com\/docs\/api-reference\/authentication<\/a><\/p>\n\n\n\n<p>[3] CPI suggests following API safety protocols to prevent unauthorized use of API keys: <a href=\"https:\/\/help.openai.com\/en\/articles\/5112595-best-practices-for-api-key-safety\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/help.openai.com\/en\/articles\/5112595-best-practices-for-api-key-safety<\/a><\/p>\n\n\n\n<p>[4] Once a connection is made to ChatGPT, practitioners can refer to the official documentation on how to send prompts using their preferred programming language: <a href=\"https:\/\/platform.openai.com\/docs\/api-reference\/streaming\" target=\"_blank\" rel=\"noreferrer noopener\">OpenAI streaming documentation<\/a>.<\/p>\n\n\n\n<p>[5] Practitioners can use various strategies to create reproducible prompts by following OpenAI&#8217;s best practices guide: <a href=\"https:\/\/cookbook.openai.com\/examples\/structured_outputs_intro\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/cookbook.openai.com\/examples\/structured_outputs_intro<\/a><\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex 
wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\"><\/div>\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Acknowledgements <\/h2>\n\n\n\n<p><em>This methodology blog has been reviewed by CPI colleagues Eddie Dilworth, Jake Connolly, and Christ Grant<\/em>. <\/p>\n\n\n\n<p><em>This project is supported by the Sequoia Climate Foundation.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This blog outlines CPI&#8217;s new methodology utilizing artificial intelligence and machine learning tools to process large primary datasets, leading to deeper analytical insights into PDBs&#8217; climate commitments.<\/p>\n","protected":false},"author":248,"featured_media":84703,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"programs":[],"regions":[],"topics":[],"collaborations":[1906],"class_list":["post-82324","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","collaborations-sequoia-climate-foundation"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Building AI\/ML tools to track public development banks&#039; climate ambition - CPI<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.climatepolicyinitiative.org\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\/\" \/>\n<meta property=\"og:locale\" content=\"id_ID\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Building AI\/ML tools to track public development banks&#039; climate ambition - CPI\" \/>\n<meta property=\"og:description\" content=\"This blog outlines CPI&#039;s new methodology utilizing 
artificial intelligence and machine learning tools to process large primary datasets, leading to deeper analytical insights into PDBs&#039; climate commitments.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.climatepolicyinitiative.org\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\/\" \/>\n<meta property=\"og:site_name\" content=\"CPI\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/ClimatePolicyInitiative\" \/>\n<meta property=\"article:published_time\" content=\"2024-12-18T14:45:45+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-04-21T23:07:24+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2024\/12\/Depositphotos_664814542_XL-scaled.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1707\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Pauline Baudry\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@climatepolicy\" \/>\n<meta name=\"twitter:site\" content=\"@climatepolicy\" \/>\n<meta name=\"twitter:label1\" content=\"Ditulis oleh\" \/>\n\t<meta name=\"twitter:data1\" content=\"Pauline Baudry\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimasi waktu membaca\" \/>\n\t<meta name=\"twitter:data2\" content=\"25 menit\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\\\/\"},\"author\":{\"name\":\"Pauline 
Baudry\",\"@id\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/pt-br\\\/#\\\/schema\\\/person\\\/1d2d2a1a6f2616cb1ac4cb64193f42eb\"},\"headline\":\"Building AI\\\/ML tools to track public development banks&#8217; climate ambition\",\"datePublished\":\"2024-12-18T14:45:45+00:00\",\"dateModified\":\"2026-04-21T23:07:24+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\\\/\"},\"wordCount\":5226,\"publisher\":{\"@id\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/pt-br\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/wp-content\\\/uploads\\\/2024\\\/12\\\/Depositphotos_664814542_XL-scaled.jpg\",\"articleSection\":[\"Uncategorized\"],\"inLanguage\":\"id\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\\\/\",\"url\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\\\/\",\"name\":\"Building AI\\\/ML tools to track public development banks' climate ambition - 
CPI\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/pt-br\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/wp-content\\\/uploads\\\/2024\\\/12\\\/Depositphotos_664814542_XL-scaled.jpg\",\"datePublished\":\"2024-12-18T14:45:45+00:00\",\"dateModified\":\"2026-04-21T23:07:24+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\\\/#breadcrumb\"},\"inLanguage\":\"id\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"id\",\"@id\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/wp-content\\\/uploads\\\/2024\\\/12\\\/Depositphotos_664814542_XL-scaled.jpg\",\"contentUrl\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/wp-content\\\/uploads\\\/2024\\\/12\\\/Depositphotos_664814542_XL-scaled.jpg\",\"width\":2560,\"height\":1707},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/id\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Building AI\\\/ML tools to track public development 
banks&#8217; climate ambition\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/pt-br\\\/#website\",\"url\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/pt-br\\\/\",\"name\":\"CPI\",\"description\":\"Climate Policy Initiative works to improve the most important energy and land use policies around the world, with a particular focus on finance.\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/pt-br\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/pt-br\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"id\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/pt-br\\\/#organization\",\"name\":\"Climate Policy Initiative\",\"url\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/pt-br\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"id\",\"@id\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/pt-br\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/wp-content\\\/uploads\\\/2021\\\/07\\\/CPI_logo_cmyk_transparent.png\",\"contentUrl\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/wp-content\\\/uploads\\\/2021\\\/07\\\/CPI_logo_cmyk_transparent.png\",\"width\":1728,\"height\":720,\"caption\":\"Climate Policy 
Initiative\"},\"image\":{\"@id\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/pt-br\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/ClimatePolicyInitiative\",\"https:\\\/\\\/x.com\\\/climatepolicy\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/climate-policy-initiative\\\/?lipi=urn:li:page:d_flagship3_search_srp_all;GvyQ8DliSYaW9eZhdq8RBQ==\",\"https:\\\/\\\/www.youtube.com\\\/channel\\\/UCE8V0iDgBU8mreZdBegVCcA\",\"https:\\\/\\\/en.wikipedia.org\\\/wiki\\\/Climate_Policy_Initiative\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/pt-br\\\/#\\\/schema\\\/person\\\/1d2d2a1a6f2616cb1ac4cb64193f42eb\",\"name\":\"Pauline Baudry\",\"url\":\"https:\\\/\\\/www.climatepolicyinitiative.org\\\/id\\\/author\\\/pauline-baudry\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Building AI\/ML tools to track public development banks' climate ambition - CPI","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.climatepolicyinitiative.org\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\/","og_locale":"id_ID","og_type":"article","og_title":"Building AI\/ML tools to track public development banks' climate ambition - CPI","og_description":"This blog outlines CPI's new methodology utilizing artificial intelligence and machine learning tools to process large primary datasets, leading to deeper analytical insights into PDBs' climate 
commitments.","og_url":"https:\/\/www.climatepolicyinitiative.org\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\/","og_site_name":"CPI","article_publisher":"https:\/\/www.facebook.com\/ClimatePolicyInitiative","article_published_time":"2024-12-18T14:45:45+00:00","article_modified_time":"2026-04-21T23:07:24+00:00","og_image":[{"width":2560,"height":1707,"url":"https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2024\/12\/Depositphotos_664814542_XL-scaled.jpg","type":"image\/jpeg"}],"author":"Pauline Baudry","twitter_card":"summary_large_image","twitter_creator":"@climatepolicy","twitter_site":"@climatepolicy","twitter_misc":{"Ditulis oleh":"Pauline Baudry","Estimasi waktu membaca":"25 menit"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.climatepolicyinitiative.org\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\/#article","isPartOf":{"@id":"https:\/\/www.climatepolicyinitiative.org\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\/"},"author":{"name":"Pauline Baudry","@id":"https:\/\/www.climatepolicyinitiative.org\/pt-br\/#\/schema\/person\/1d2d2a1a6f2616cb1ac4cb64193f42eb"},"headline":"Building AI\/ML tools to track public development banks&#8217; climate 
ambition","datePublished":"2024-12-18T14:45:45+00:00","dateModified":"2026-04-21T23:07:24+00:00","mainEntityOfPage":{"@id":"https:\/\/www.climatepolicyinitiative.org\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\/"},"wordCount":5226,"publisher":{"@id":"https:\/\/www.climatepolicyinitiative.org\/pt-br\/#organization"},"image":{"@id":"https:\/\/www.climatepolicyinitiative.org\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\/#primaryimage"},"thumbnailUrl":"https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2024\/12\/Depositphotos_664814542_XL-scaled.jpg","articleSection":["Uncategorized"],"inLanguage":"id"},{"@type":"WebPage","@id":"https:\/\/www.climatepolicyinitiative.org\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\/","url":"https:\/\/www.climatepolicyinitiative.org\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\/","name":"Building AI\/ML tools to track public development banks' climate ambition - 
CPI","isPartOf":{"@id":"https:\/\/www.climatepolicyinitiative.org\/pt-br\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.climatepolicyinitiative.org\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\/#primaryimage"},"image":{"@id":"https:\/\/www.climatepolicyinitiative.org\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\/#primaryimage"},"thumbnailUrl":"https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2024\/12\/Depositphotos_664814542_XL-scaled.jpg","datePublished":"2024-12-18T14:45:45+00:00","dateModified":"2026-04-21T23:07:24+00:00","breadcrumb":{"@id":"https:\/\/www.climatepolicyinitiative.org\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\/#breadcrumb"},"inLanguage":"id","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.climatepolicyinitiative.org\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\/"]}]},{"@type":"ImageObject","inLanguage":"id","@id":"https:\/\/www.climatepolicyinitiative.org\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\/#primaryimage","url":"https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2024\/12\/Depositphotos_664814542_XL-scaled.jpg","contentUrl":"https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2024\/12\/Depositphotos_664814542_XL-scaled.jpg","width":2560,"height":1707},{"@type":"BreadcrumbList","@id":"https:\/\/www.climatepolicyinitiative.org\/building-al-ml-tools-to-track-public-development-banks-climate-ambition\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.climatepolicyinitiative.org\/id\/"},{"@type":"ListItem","position":2,"name":"Building AI\/ML tools to track public development banks&#8217; climate ambition"}]},{"@type":"WebSite","@id":"https:\/\/www.climatepolicyinitiative.org\/pt-br\/#website","url":"https:\/\/www.climatepolicyinitiative.org\/pt-br\/","name":"CPI","description":"Climate 
Policy Initiative works to improve the most important energy and land use policies around the world, with a particular focus on finance.","publisher":{"@id":"https:\/\/www.climatepolicyinitiative.org\/pt-br\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.climatepolicyinitiative.org\/pt-br\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"id"},{"@type":"Organization","@id":"https:\/\/www.climatepolicyinitiative.org\/pt-br\/#organization","name":"Climate Policy Initiative","url":"https:\/\/www.climatepolicyinitiative.org\/pt-br\/","logo":{"@type":"ImageObject","inLanguage":"id","@id":"https:\/\/www.climatepolicyinitiative.org\/pt-br\/#\/schema\/logo\/image\/","url":"https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2021\/07\/CPI_logo_cmyk_transparent.png","contentUrl":"https:\/\/www.climatepolicyinitiative.org\/wp-content\/uploads\/2021\/07\/CPI_logo_cmyk_transparent.png","width":1728,"height":720,"caption":"Climate Policy Initiative"},"image":{"@id":"https:\/\/www.climatepolicyinitiative.org\/pt-br\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/ClimatePolicyInitiative","https:\/\/x.com\/climatepolicy","https:\/\/www.linkedin.com\/company\/climate-policy-initiative\/?lipi=urn:li:page:d_flagship3_search_srp_all;GvyQ8DliSYaW9eZhdq8RBQ==","https:\/\/www.youtube.com\/channel\/UCE8V0iDgBU8mreZdBegVCcA","https:\/\/en.wikipedia.org\/wiki\/Climate_Policy_Initiative"]},{"@type":"Person","@id":"https:\/\/www.climatepolicyinitiative.org\/pt-br\/#\/schema\/person\/1d2d2a1a6f2616cb1ac4cb64193f42eb","name":"Pauline 
Baudry","url":"https:\/\/www.climatepolicyinitiative.org\/id\/author\/pauline-baudry\/"}]}},"_links":{"self":[{"href":"https:\/\/www.climatepolicyinitiative.org\/id\/wp-json\/wp\/v2\/posts\/82324","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.climatepolicyinitiative.org\/id\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.climatepolicyinitiative.org\/id\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.climatepolicyinitiative.org\/id\/wp-json\/wp\/v2\/users\/248"}],"replies":[{"embeddable":true,"href":"https:\/\/www.climatepolicyinitiative.org\/id\/wp-json\/wp\/v2\/comments?post=82324"}],"version-history":[{"count":0,"href":"https:\/\/www.climatepolicyinitiative.org\/id\/wp-json\/wp\/v2\/posts\/82324\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.climatepolicyinitiative.org\/id\/wp-json\/wp\/v2\/media\/84703"}],"wp:attachment":[{"href":"https:\/\/www.climatepolicyinitiative.org\/id\/wp-json\/wp\/v2\/media?parent=82324"}],"wp:term":[{"taxonomy":"programs","embeddable":true,"href":"https:\/\/www.climatepolicyinitiative.org\/id\/wp-json\/wp\/v2\/programs?post=82324"},{"taxonomy":"regions","embeddable":true,"href":"https:\/\/www.climatepolicyinitiative.org\/id\/wp-json\/wp\/v2\/regions?post=82324"},{"taxonomy":"topics","embeddable":true,"href":"https:\/\/www.climatepolicyinitiative.org\/id\/wp-json\/wp\/v2\/topics?post=82324"},{"taxonomy":"collaborations","embeddable":true,"href":"https:\/\/www.climatepolicyinitiative.org\/id\/wp-json\/wp\/v2\/collaborations?post=82324"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}