Inside the Georgia Trump Indictments: A Data Dive

Chad M. Topaz
Aug 15, 2023

Key themes and computational insights

Last night, Donald Trump and 18 of his allies were indicted in Georgia, accused of attempting to overturn his 2020 election defeat. Prosecutors employed a statute usually associated with organized crime to label their actions as a “criminal enterprise.” The indictment describes multiple efforts by Trump and his associates to challenge his electoral loss. These efforts include pressuring Georgia’s Secretary of State and an alleged scheme in which a Trump lawyer tried to access voting machines to extract data. Fani Willis, the Fulton County District Attorney spearheading the case, highlighted their purported criminal racketeering strategies. Notable defendants include ex-White House chief of staff Mark Meadows and former NYC Mayor Rudy Giuliani. This indictment is one of several brought since the Jan. 6, 2021 Capitol riot that seek to hold Trump and his inner circle accountable; it frames them as members of a criminal syndicate active across multiple states.

A week ago today, I wrote about using data-driven methods to understand the written opinions of Tanya Chutkan, the presiding judge in the Jan. 6 federal case against Trump. I introduced Natural Language Processing (NLP) — a powerful set of computational techniques designed for linguistic analysis. Today, we’ll harness some of these NLP tools to dissect the 98-page Trump indictment in Georgia, which contains 41 distinct counts.

We’ll take a brief look at word clouds and sentiment analysis. And I truly mean a brief look, because — say it with me — this is a Substack, not a research journal.

Word Cloud (Most Frequent Text)

A word cloud visually represents text data, with frequently appearing words shown larger and more prominently. By emphasizing recurrent words, it gives an immediate sense of a document’s main themes without an exhaustive reading. Below is a word cloud derived from the indictment’s 41 counts, after filtering out common words that carry little contextual meaning, such as “a,” “the,” and “because.”

The word cloud may echo familiar narratives for anyone who follows the news, yet it remains an apt summary. Terms like “presidential,” “election,” “votes,” “electors,” and “electoral” point unambiguously to the electoral process. Names such as “trump,” “donald,” “giuliani,” and “rudolph” highlight the ex-president and his ally Rudy Giuliani. Month references like “december,” “november,” and “january” trace the timeline of the contentious 2020 election and its aftermath.

The legal vocabulary is unmistakable: “conspiracy,” “ocga” (the Official Code of Georgia Annotated), “unlawfully,” “violation,” “jury,” “law,” “unindicted,” and “offense.” Coupled with the electoral references, these words paint a picture of a legal dispute over an election. Expressions like “false,” “statements,” “document,” and “stated” point to questions about the integrity of statements or documents, hinting at allegations of misinformation or deception.
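
For readers curious about the mechanics, the counting behind a word cloud can be sketched in a few lines of Python. This is a minimal illustration, not the code used for this post: the stop-word list is a tiny hand-picked stand-in, the font-size range is arbitrary, and the sample sentence is invented rather than quoted from the indictment.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); real analyses use far longer ones.
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "in", "because",
              "that", "is", "was", "for", "on", "by", "with"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lowercase words, skipping stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stop_words)

def font_sizes(freqs, min_size=10, max_size=60):
    """Scale each word's count linearly so the most frequent word gets max_size."""
    if not freqs:
        return {}
    top = max(freqs.values())
    return {w: min_size + (max_size - min_size) * c / top
            for w, c in freqs.items()}

# Invented sample text, standing in for the real document.
sample = ("The defendants unlawfully conspired to change the outcome of the "
          "election. The false statements concerned the election and the electors.")

freqs = word_frequencies(sample)
sizes = font_sizes(freqs)
print(freqs.most_common(3))
print(sizes["election"], sizes["false"])
```

A word that appears twice as often as another is drawn larger in proportion, which is all a word cloud encodes; the layout and colors are decoration.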

Sentiment Analysis (Positive or Negative Emotion)

Sentiment analysis seeks to discern the emotional undertone behind words. Picture yourself perusing a restaurant review; intuitively, you gauge if the critique is positive or negative based on word choices. Sentiment analysis emulates this intuition but at a large scale, leveraging computational algorithms instead of human cognition.

I performed sentiment analysis on each of the 41 counts within the indictment. Each count receives a score that spans from -1 (intensely negative) to +1 (resoundingly positive). The histogram below showcases the distribution of the 41 scores.

These scores hover close to neutrality, an anticipated trait for legal indictments. It’s crucial for such documents to exude neutrality, devoid of emotional colorings.
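
To make the scoring idea concrete, here is a minimal lexicon-based sketch in Python. It is an illustration under stated assumptions, not the tool used for this post: the six-word lexicon and its polarity values are hypothetical, and the two sample sentences are invented. Averaging over every word, with unknown words counted as neutral, is one simple way to keep scores between -1 and +1.

```python
import re

# Hypothetical toy lexicon: word -> polarity in [-1, 1]. Real tools rely on
# lexicons with thousands of entries or on trained models.
LEXICON = {
    "good": 0.7, "lawful": 0.5, "true": 0.4,
    "false": -0.4, "unlawfully": -0.6, "fraud": -0.8,
}

def polarity(text, lexicon=LEXICON):
    """Average word polarity; words missing from the lexicon count as 0 (neutral)."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    return sum(lexicon.get(w, 0.0) for w in words) / len(words)

# Invented sample sentences, not quotations from the indictment.
legal_style = "The defendant unlawfully made a false statement."
review_style = "The food was good."

print(round(polarity(legal_style), 3))
print(round(polarity(review_style), 3))
```

Under this scheme, text dominated by words with no emotional charge is pulled toward zero, which is one plausible reason formal legal language scores near neutral.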

The scores are so close to zero that it’s not clear the differences between them are meaningful. Nonetheless, we can zoom in and visualize the five most positive and five most negative scores, along with their count numbers.

I am not well-qualified to judge the raw text of these counts. I am eager to hear from legal experts who might be able to tell us whether, for instance, Count 38 really does read more positively than Count 32.
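
Picking out the extremes is a simple sort. The sketch below assumes the scores are held in a dictionary keyed by count number; the function name `extremes` is hypothetical, and the values are invented placeholders, not the results shown in this post.

```python
def extremes(scores, k=5):
    """Return (k most negative, k most positive) as lists of (count, score) pairs."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    most_negative = ranked[:k]
    most_positive = ranked[-k:][::-1]  # highest score first
    return most_negative, most_positive

# Placeholder scores, invented for illustration only.
scores = {1: 0.02, 2: -0.01, 3: 0.00, 4: 0.05, 5: -0.03, 6: 0.01, 7: -0.02}

most_negative, most_positive = extremes(scores, k=2)
print(most_negative)
print(most_positive)
```

With fewer than 2k scores the two lists overlap, so k should stay well below half the number of counts.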

In Conclusion

Tools such as word clouds and sentiment analysis provide a data-driven overview of the 98-page Trump indictment from Georgia. Word clouds distill some of the document’s main themes and figures, while sentiment analysis returns scores consistent with the neutral tone expected of a legal indictment. The uniformity of the scores, all hovering close to zero, fits the notion that legal documents aim for impartiality. Still, the subtle variations, such as the one between Counts 38 and 32, prompt curiosity. Perhaps these fluctuations, though minor, hold significance in the legal sphere; that is a question for experts well-versed in legal terminology and intent. Engaging with this indictment through computational tools shows how Natural Language Processing can help break down complex documents, possibly offering insights that complement a human reading.

Your neighbor,

Chad
