Mean Justice

Chad M. Topaz
Aug 9, 2023

Averages, medians, and quantitative literacy in the courtroom

Today, I’m delving into quantitative reasoning as it applies to law. Let’s start by setting a few things straight. First, while I’ll be touching on numbers, you won’t need a math PhD to follow along (though if you have one, I still think you should read this post). Second, this isn’t about venting at critics but rather about offering insight into why numeracy (the numbers version of literacy) matters. And finally, I’m not out to condemn the entire legal profession. I get it: #notalllawyers and #notalljudges.

The spark for this conversation? The unintentional blockbuster court case of the year: Mean v. Median.

Motivating Backstory

Several weeks ago, my collaborators and I released the report, Cost of Discretion: Judicial Decision-Making, Pretrial Detention, and Public Safety in New York City. At its core, the report dissects the societal impact of pretrial detention and offers insights for the ongoing bail reform discussions in New York. Our findings spotlighted alarming inconsistencies among NYC judges. Some demonstrated a consistent lean toward pretrial detention, even after adjusting for variables like case severity and defendants’ previous run-ins with the law. To put it in numbers, the decisions of merely fourteen judges over a span of 2.5 years resulted in an extra 580 individuals detained, 154 more years of pretrial confinement, and a jaw-dropping additional $77 million burden on NYC taxpayers.

Our revelations have not gone unnoticed. There’s been a surge of reactions, especially from judges reluctant to be put under the magnifying glass. Strangely, one point of contention among critics is our use of averages (means) instead of medians. I did not see this coming. One particularly vivid example comes from a collective response from 12 judicial groups, which said:

Additionally, as the authors acknowledge, the 97-day period was the average-not median-length of pretrial detention for defendants discharged in 2021. The authors do not explain why they assume that every “Estimated Additional [Person] Detained Due to Disproportionate Carcerality” is detained for that average period.

Similarly, a letter from the former president of the New York State Bar Association complains:

As far as the report is concerned, a remanded defendant and a defendant who makes $100 bail at the day of arraignment will both be deemed detained for 97 days pretrial. That is absurd.

Given these remarks, we’re left with two ways to interpret the criticism:

  1. These critiques aim to divert attention from the pressing issue of certain judges’ excessive inclination towards incarceration.
  2. The comments stem from a genuine lack of understanding of statistical reasoning within the legal community.

Neither scenario paints a comforting picture. Regardless, it’s imperative the public isn’t swayed by misunderstandings or misinterpretations. Hence, let’s dive into a bit of a math lesson today.

Means and Medians

Understanding the difference between the mean (often referred to as the average) and the median is fundamental to statistical literacy. Both represent measures of the middle of a dataset, but they capture this middle differently, yielding distinct insights and potential interpretations.

  • The mean is found by summing all the values and dividing by the number of entries.
  • The median, on the other hand, is the central value in an ordered dataset. When there’s an even number of entries, the median is the mean of the two middle numbers.
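The two definitions above can be sketched in a few lines of Python. The helper names `mean_by_hand` and `median_by_hand` and the sample numbers are made up for illustration; Python's standard `statistics` module provides equivalent `mean` and `median` functions.

```python
def mean_by_hand(values):
    """Sum all the values and divide by the number of entries."""
    return sum(values) / len(values)


def median_by_hand(values):
    """Return the central value of the ordered data.

    With an even number of entries, return the mean of the two middle values.
    """
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Made-up sample data, not taken from the report.
odd_count = [7, 1, 4]
even_count = [7, 1, 4, 20]

print(mean_by_hand(odd_count), median_by_hand(odd_count))    # 4.0 4
print(mean_by_hand(even_count), median_by_hand(even_count))  # 8.0 5.5
```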

It’s crucial to note that while the mean can be swayed by outliers, the median is far more resistant to them. For example, in a dataset with several entries, boosting the highest value by 20% wouldn’t budge the median at all, whereas the mean would rise. Thus, in some analyses, relying on the median might be more appropriate. Yet, there are situations where the mean takes precedence, and there are (at least) two reasons for this:

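Summative Comparison: As I’ll illustrate with some examples, the mean is the preferred choice for summative comparisons, that is, for assessing the collective impact of a set of numbers. Multiplying the mean by the number of entries recovers the exact total; the median generally does not. This is the kind of calculation at stake in our research on carceral judges: 580 additional people detained, at an average of 97 days each, comes to roughly 154 years of additional pretrial confinement.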

No Assumption of Uniformity: A common misconception about the mean is that it suggests every value in the dataset equals this average. This misinterpretation is evident in the critiques I mentioned above. Employing the mean doesn’t imply uniformity across values. Instead, it offers a “balance point” where the total disparity between the mean and all lesser numbers matches the total disparity between the mean and all higher numbers.
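Here is a minimal sketch of the "balance point" idea, using made-up numbers rather than anything from the report. The helper `shortfall_and_excess` is a hypothetical name introduced only for this illustration.

```python
from statistics import mean, median


def shortfall_and_excess(values, center):
    """Return (total distance of values below center, total distance of values above center)."""
    below = sum(center - v for v in values if v < center)
    above = sum(v - center for v in values if v > center)
    return below, above


# Made-up data: three small values and one large one. No two values are alike.
data = [1, 2, 3, 14]

below_mean, above_mean = shortfall_and_excess(data, mean(data))
below_median, above_median = shortfall_and_excess(data, median(data))

# Around the mean (5), the two sides balance: 4 + 3 + 2 on one side, 9 on the other.
print(below_mean == above_mean)      # True

# Around the median (2.5), they do not: 2 on one side, 12 on the other.
print(below_median == above_median)  # False
```

None of the four values equals the mean of 5, yet the mean still balances them exactly.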

Examples

Rainfall. City A experienced the following annual rainfall (in inches) over five years: 15, 17, 18, 19, 21. Relatively steady, right? City B’s rainfall, meanwhile, looked like this: 10, 22, 23, 24, 25. A bit more varied.

Average it out, and City A had an annual mean rainfall of 18 inches. City B, with its ups and downs, averaged 20.8 inches a year. We can subtract these averages and multiply by five years to get a total difference of 2.8 inches x 5 = 14 inches. Adding up the original numbers confirms this: City B received 104 inches in total and City A received 90. Importantly, using the averages doesn’t suggest every year had that exact average rainfall.

However, if you consider the median (or the middle value in each list), City A’s remains 18 inches, but City B’s jumps to 23 inches. Using the difference between the medians — 5 inches — would imply an incorrect 25-inch difference over the five years.
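The rainfall arithmetic can be checked with a short script. This is only a sketch: the variable names are my own, and a small tolerance is used when comparing because 20.8 is not stored exactly as a floating-point number.

```python
from statistics import mean, median

# Annual rainfall in inches, copied from the example above.
city_a = [15, 17, 18, 19, 21]
city_b = [10, 22, 23, 24, 25]
years = len(city_a)

# Ground truth: the total difference taken straight from the raw data.
actual_gap = sum(city_b) - sum(city_a)

# Estimates: (difference of the summary statistic) x (number of years).
gap_from_means = (mean(city_b) - mean(city_a)) * years
gap_from_medians = (median(city_b) - median(city_a)) * years

print(actual_gap)                               # 14
print(abs(gap_from_means - actual_gap) < 1e-9)  # True
print(gap_from_medians)                         # 25
```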

Community Service. Suppose in a town, over one month, ten volunteers put in community service hours. Nine of them logged 2, 3, 4, 5, 6, 6, 7, 8, and 9 hours, and one particularly enthusiastic individual contributed 40 hours. The mean number of hours is 9, and the median is 6. If the town wanted to acknowledge the overall contribution, the total using the mean would correctly give 90 hours (9 hrs. average x 10 volunteers). Using the median, however, would lead to an incorrect result of 60 hours (6 hrs. median x 10 volunteers), which underestimates the total effort by 30 hours.
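The same check works for the volunteer hours. Again, this is an illustrative sketch with variable names of my own choosing.

```python
from statistics import mean, median

# Hours from the example: nine volunteers plus one who contributed 40 hours.
hours = [2, 3, 4, 5, 6, 6, 7, 8, 9, 40]
volunteers = len(hours)

actual_total = sum(hours)                       # what the town wants to acknowledge
total_from_mean = mean(hours) * volunteers      # mean x headcount
total_from_median = median(hours) * volunteers  # median x headcount

print(actual_total)                       # 90
print(total_from_mean == actual_total)    # True
print(total_from_median == actual_total)  # False: the median-based total is 60
```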

Once again, notice the two main points: (1) we’re interested in summing things up, and (2) using the mean provides a correct answer without the assumption that all the data points are the same.

Quantitative Reasoning in the Legal System

The sobering example of the Sally Clark case in the UK reminds us of why quantitative reasoning skills matter. Here, a statistical misinterpretation led to Sally Clark’s wrongful conviction for the deaths of her children, who tragically passed from natural causes. The jury was heavily swayed by a misrepresented probability value, touted as 1 in 73 million. Had there been a clear understanding of the statistics, this grave error might have been avoided.

The repercussions of weak quantitative reasoning in the legal arena are vast. Individuals might be unjustly incarcerated, families disrupted, and communities erroneously stigmatized. Imagine skewed crime statistics justifying biased policing, or misrepresented economic data influencing resource allocation decisions in underprivileged areas. Such decisions can reinforce systemic inequities.

In all realms, from evaluating judicial decisions to wider societal contexts, precise and transparent quantitative reasoning is paramount. It acts as a safeguard, ensuring decisions are grounded in accurate and comprehensible data, ultimately fostering a just and equitable society.

Your neighbor,

Chad

Written by Chad M. Topaz

Data Scientist | Social Justice Activist | Professor | Speaker | Nonprofit Leader