U.S. News Best State Rankings
Poppycock, Hogwash, Fiddlefaddle, Bunkum, or Codswallop?
U.S. News, the same folks who put together those rankings of the “best” colleges and universities, is also ranking states. The 50 United States. They have rankings now, these states.
I can’t. I just can’t. I cannot. In fact, wait,
There we go — this is a Code Purple situation, people. And now, let us delve into my current inability to even.
Literally, just why?
Why do we need rankings of states? I mean, I question many other rankings besides those of states, but let’s stick with states. Why do we need to rank them? Is there something actionable here? Are we supposed to move to the states that are more highly ranked? If so, let’s recognize that most people cannot just up and decide what state to live in, because many of us have constraints, be they financial, legal, familial, or otherwise. The number one ranked state is Utah. You know where the laws do not welcome my very queer family? Utah.
In any case, I got so caught up in the stupidity of the entire endeavor that I decided to throw caution to the wind and ask ChatGPT:
Why would US News decide to rank states? What possible value would this have? Eliminate from your response any reasons that assume that the rankings are actually meaningful. We shouldn’t assume that they are meaningful until it is demonstrated that they are.
ChatGPT gave a long list of reasons. For brevity, I’ll truncate to the first five:
Engagement: Rankings, lists, and comparisons typically garner high levels of reader engagement. People are naturally curious to see where their state ranks, how it compares to others, and whether they agree or disagree with the rankings.
Driving Conversations: Rankings can stimulate discussions and debates among readers, policymakers, and the general public. This can increase the virality of the content as people share, discuss, and critique it.
Content Diversity: Introducing state rankings can diversify the content offering of a media outlet, catering to different reader interests and attracting a broader audience.
Ad Revenue and Traffic: Higher engagement, virality, and repeated visits to see updates or comments can increase website traffic. This, in turn, can lead to increased advertising revenue.
Branding: Positioning themselves as a source that provides state rankings can help US News establish or reinforce their brand as an authoritative voice in evaluations and comparisons, even if the criteria might be subjective.
Ah ok, ChatGPT says that U.S. News ranks states primarily for revenue. It’s easy to forget this fact when faced with content that appears objective. Before we delve into the specific shortcomings of the rankings, let’s first understand the ranking methodology.
U.S. News State Ranking Methodology
If you want, you can read the entire methodology here; props to U.S. News for at least some level of transparency. If you’d rather not dive deep, here’s a summary.
The rankings are based on eight main categories: health care, education, economy, infrastructure, opportunity, fiscal stability, crime & corrections, and natural environment. Each category comprises multiple specific metrics. For example, the education category includes metrics such as the preschool enrollment rate and the approximate percentage of high school graduates who have met benchmarks on the SAT, the ACT, or both. Overall, the eight categories comprise 71 metrics.
For each metric, the values for the 50 states are ranked and assigned points, ranging from zero (for the lowest-performing state) to 100 (for the top-performing state). The specifics of point assignments between these extremes aren’t important for our present purposes. Next, the scores of metrics within each category, like those in the education category, are averaged for every state, resulting in a category score which is then ranked. The eight category rankings are subsequently averaged for each state. These final averages are put in order to produce each state’s final ranking.
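To make the pipeline concrete, here’s a minimal sketch in Python. Everything in it is a toy stand-in: three hypothetical states, four made-up metrics in two categories, equal weights, a higher-is-better assumption for every metric, and a simple min-max rule for assigning points between the extremes. None of those choices matches the undisclosed U.S. News details; the sketch only shows the shape of the computation.

```python
import numpy as np

# Toy stand-in: 3 hypothetical states, 4 made-up metrics
# (the real system has 50 states and 71 metrics).
# Assumption: higher raw values are better for every metric.
states = ["State A", "State B", "State C"]
metrics = np.array([
    [10.0, 5.0, 4.0, 7.0],   # State A
    [20.0, 2.0, 2.0, 5.0],   # State B
    [30.0, 4.0, 6.0, 9.0],   # State C
])

# Step 1: map each metric onto a 0-100 point scale, worst state = 0,
# best state = 100. (U.S. News doesn't disclose the in-between rule;
# linear min-max scaling is a guess.)
lo, hi = metrics.min(axis=0), metrics.max(axis=0)
points = 100 * (metrics - lo) / (hi - lo)

def rank_best_first(values):
    """Rank 1 goes to the largest value."""
    order = np.argsort(-values)
    ranks = np.empty(len(values), dtype=int)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks

# Step 2: average metric points within each category (equal weights
# here; the real system uses survey-derived weights), then rank the
# category scores.
categories = {"education": [0, 1], "economy": [2, 3]}
cat_ranks = np.column_stack([
    rank_best_first(points[:, cols].mean(axis=1))
    for cols in categories.values()
])

# Step 3: average the category ranks (lower is better) and rank that
# average to get the final ordering.
final_rank = rank_best_first(-cat_ranks.mean(axis=1))

for r, s in sorted(zip(final_rank, states)):
    print(f"{r}. {s}")
```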
One last thing: the averages described above aren’t simple averages. U.S. News uses weighted averages, counting some metrics and categories as more important than others. These relative importances were determined by surveying 70,000 participants, who were asked to rank the importance of the various factors.
U.S. News doesn’t make public the actual values of the importance weightings. Pinpointing them precisely isn’t possible, since ranking inherently discards some information. For example, we lose track of whether the amount by which Utah is allegedly better than Washington is bigger or smaller than the amount by which Washington is allegedly better than Idaho. However, since U.S. News does share each state’s final ranking and its placement in each of the eight categories, math can reverse engineer reasonable estimates. Using tools from a field called mathematical optimization, here are likely weighting values that I found: health care (16%), education (17%), economy (14%), infrastructure (13%), opportunity (13%), fiscal stability (10%), crime & corrections (8%), and natural environment (9%).
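I won’t walk through my actual optimization setup here, but a stripped-down version of the idea fits in a few lines. To be clear, the sketch below is not what I ran: it invents random category rankings, makes up the “true” weights, and assumes we observed each state’s weighted-average rank directly, in which case ordinary least squares suffices. In reality only the final ordering is public, which is exactly why heavier optimization machinery is needed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the published data: 50 states x 8 category
# rankings, each column an independent shuffle of 1..50.
n_states, n_cats = 50, 8
cat_ranks = np.column_stack([rng.permutation(n_states) + 1
                             for _ in range(n_cats)])

# Hidden "true" weights to recover (made up; they sum to 1).
true_w = np.array([0.16, 0.17, 0.14, 0.13, 0.13, 0.10, 0.08, 0.09])

# Simplifying assumption: pretend we observed each state's
# weighted-average rank directly. (U.S. News only publishes the final
# ordering, which is why the real problem needs more than this.)
avg_rank = cat_ranks @ true_w

# Ordinary least squares recovers the weights from the linear system.
w_hat, *_ = np.linalg.lstsq(cat_ranks, avg_rank, rcond=None)
w_hat /= w_hat.sum()  # normalize so the weights sum to 1

print(np.round(w_hat, 3))
```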
Chew on those weightings. When you interpret the rankings, they carry the assumption that, for instance, the natural environment is only half as important as education. If that weighting doesn’t align with your values, then you should ignore the rankings. But then again, there are so many more reasons you should ignore these rankings, and they are largely related to the curious fact that U.S. News is using 71 pieces of information to rank 50 items.
Overfitting
Let’s talk about the idea of overfitting without diving deep into complex math. If you’ve got a Ph.D. in a quantitative field, don’t get shirty with me; just skip this paragraph. Overfitting happens when a mathematical or statistical model becomes too obsessed with the tiny details of the data it’s built on, rather than recognizing the most important patterns. In the case of the U.S. News state rankings, there are more criteria contributing to the rankings (71 metrics) than there are items to rank (the 50 U.S. states), which increases the risk of overfitting. This means the rankings could be swayed by insignificant features of the data, producing results that might not hold up elsewhere. To see why, imagine an alternate universe with more than 50 U.S. states. If the model were well fitted, applying the same ranking algorithm used on our original 50 states to these additional states would produce sensible results. If the model were overfitted, the rankings would come out looking bizarre. While we lack the details of the U.S. News model and don’t have extra states for a real-world test, the 71-to-50 metric-to-state ratio does hint at a plausible risk of overfitting.
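If you’d like to see the 71-versus-50 problem in action, here’s a small simulation. It’s entirely synthetic: 50 fake “states” with 71 random features, an outcome that actually depends on just one of them, and a least-squares fit. With more features than data points, the model fits the data it was built on essentially perfectly, then stumbles on fresh data.

```python
import numpy as np

rng = np.random.default_rng(42)

# 50 fake "states" with 71 random features; the outcome depends on
# only ONE feature (plus noise). The 50/71 split mirrors the
# U.S. News states-to-metrics ratio; everything else is made up.
n_train, n_features = 50, 71
X_train = rng.normal(size=(n_train, n_features))
y_train = 3.0 * X_train[:, 0] + rng.normal(scale=0.5, size=n_train)

# With more features than data points, least squares can fit the
# training data essentially perfectly...
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
train_err = np.mean((X_train @ coef - y_train) ** 2)

# ...but on fresh data ("new states") the overfit model does worse
# than the simple one-feature model it should have found.
X_test = rng.normal(size=(1000, n_features))
y_test = 3.0 * X_test[:, 0] + rng.normal(scale=0.5, size=1000)
overfit_err = np.mean((X_test @ coef - y_test) ** 2)
honest_err = np.mean((3.0 * X_test[:, 0] - y_test) ** 2)

print(f"training error: {train_err:.2e}")  # essentially zero
print(f"overfit model on new data: {overfit_err:.2f}")
print(f"honest model on new data:  {honest_err:.2f}")
```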
Overfitting can be a challenging topic when you first encounter it, so here’s an example for the uninitiated. Imagine you want to teach a robot chef, named RoboChef, to make the perfect sandwich based on the preferences of the people in your town. You start by inviting ten of your friends over. Each friend tells RoboChef exactly how they like their sandwich, from the type of bread to the exact layering of ingredients.
RoboChef diligently learns from these ten sandwiches. It notices some specifics: Person A likes exactly two lettuce leaves, Person B prefers the mayo on the top slice of bread, Person C likes their sandwich slightly toasted, and so on. Now, RoboChef believes these precise preferences are the ultimate guidelines for the perfect sandwich.
Now you open your sandwich shop. A customer walks in and asks for a sandwich somewhat similar to the one Person B likes, so RoboChef makes the Person B sandwich. The customer is pissed off: why, they ask, is the mayo only on the top slice of bread?
RoboChef over-learned from the limited data it had. Instead of grasping general sandwich-making principles, it became too fixated on the specific details of each friend’s sandwich preference. If it had learned from hundreds or thousands of sandwich orders, it would have a better understanding of general preferences and variations and would make sandwiches that a wider range of people would enjoy.
By the way, this whole scenario is my literal nightmare, not just because of overfitting, but because mayo is absolutely disgusting. Why would you eat that?
Anyway, in summary, overfitting is like studying for a test by memorizing the answers to specific questions rather than understanding the general principles. If the questions change even slightly on the actual test, the person who memorized might fail, while someone who understood the principles can adapt and answer correctly.
Redundancy
With 71 metrics used to rank 50 items, there’s some danger that those metrics are redundant. For instance, the U.S. News ranking includes the metrics
- 2-Year-College Graduation Rate: The share of students attending public institutions who complete a two-year degree program within three years, or 150% of the normal time. (National Center for Education Statistics; 2018 cohort)
- 4-Year-College Graduation Rate: The share of undergraduate students at public institutions who initially pursue a bachelor’s or equivalent four-year degree and receive one within six years, or 150% of the normal time of study. (National Center for Education Statistics; 2015 cohort)
- Population With Degree: The share of people 25 and older in a state who have an associate degree or higher. (U.S. Census Bureau American Community Survey 1-year estimates; 2021)
Do we think these are all independent quantities? I would think that graduation rates at institutions change on a pretty slow time scale, so the share of people 25 and older who have an associate degree or higher is probably pretty closely related to the 2-year-college and 4-year-college graduation rates.
As another example, we have:
- Population Without Health Insurance: The percentage of adults ages 19 to 64 who reported having no health insurance coverage. (U.S. Census Bureau American Community Survey 1-year estimates; 2021)
- Adults Without Wellness Visit: The age-adjusted percentage of adults who reported they had not visited a doctor for a routine checkup within the past year. (Centers for Disease Control and Prevention, Behavioral Risk Factor Surveillance System; 2021)
I would also guess that these are highly correlated. People without health insurance are probably less likely to go for a wellness visit. We can quibble about the degree of correlation, but the point is that with 71 metrics, it’s quite likely that there is a lot of redundancy built in.
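We can’t run this check on the actual U.S. News metric data, but here’s how simple it would be if we could. The numbers below are synthetic: I generate an uninsured rate for 50 states and a missed-wellness-visit rate built to partly depend on it, mimicking the relationship argued for above, then compute the correlation.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic stand-ins for two plausibly redundant metrics across 50
# states. The "no wellness visit" rate is constructed to partly
# depend on the uninsured rate.
uninsured = rng.uniform(0.05, 0.20, size=50)
no_wellness = 0.15 + 0.8 * uninsured + rng.normal(scale=0.01, size=50)

r = np.corrcoef(uninsured, no_wellness)[0, 1]
print(f"correlation between the two metrics: {r:.2f}")
```

A composite score that includes both metrics is, to the extent of that correlation, counting the same underlying fact (access to care) twice.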
The problem with redundancy is that you might be counting the same type of information multiple times, artificially inflating its importance relative to what you intended. Let’s break this down with an illustrative example. Imagine you’re a teacher aiming to evaluate your students based on their comprehensive skills in math, reading, science, and history. You decide to give six tests:
- A math test on basic arithmetic
- A math test on geometry
- A math test that covers word problems involving basic arithmetic and geometry
- A reading comprehension test
- A science test on basic biology facts
- A history test on the Renaissance period
After grading, you notice that Student A excels in the three math tests but performs average on the reading, science, and history tests. At first glance, the three math tests appear distinct: one focuses on arithmetic, another on geometry, and the third integrates the two through word problems. However, the underlying thread connecting them is mathematical reasoning.
If you were to create an aggregate score for your students based on these tests, Student A’s skill in mathematical reasoning would be triple-counted. This means that, out of the six tests, nearly half of their overall assessment is influenced by their strength in math. Consequently, the combined score would suggest that Student A has a well-rounded mastery over math, reading, science, and history, even though, in reality, their proficiency predominantly lies in math. Other subjects, like reading, science, and history, are underrepresented in the assessment, leading to a skewed perception of Student A’s overall academic capabilities.
This redundancy creates an imbalance in the weight given to different subjects. By placing too much emphasis on one subject area (in this case, math), you unintentionally reduce the significance of other crucial areas, failing to capture a holistic view.
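Here’s the triple-counting in plain arithmetic, using made-up scores for Student A: 90 on each math test, 70 on everything else.

```python
# Made-up scores for Student A on the six tests described above.
scores = {
    "math_arithmetic": 90,
    "math_geometry": 90,
    "math_word_problems": 90,
    "reading": 70,
    "science": 70,
    "history": 70,
}

# Naive average over all six tests: math is 3 of the 6 tests, so it
# carries 50% of the grade despite being one subject out of four.
naive = sum(scores.values()) / len(scores)

# Subject-balanced average: collapse the three math tests into a
# single math score first, then weight the four subjects equally.
math = (scores["math_arithmetic"] + scores["math_geometry"]
        + scores["math_word_problems"]) / 3
balanced = (math + scores["reading"] + scores["science"]
            + scores["history"]) / 4

print(naive)     # 80.0, inflated by the redundant math tests
print(balanced)  # 75.0, each subject counted once
```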
Regarding the U.S. News rankings, our lack of access to the detailed data for all 71 metrics means we can’t definitively gauge the extent of redundancy. But unless each metric is uniformly inflated, which is unlikely, the issue of overemphasis remains a pressing concern.
Complexity
When it comes to rankings, especially ones that are touted to the general public, clarity and interpretability are key. U.S. News begins with 71 metrics to determine the rankings of the states and puts those metrics through various stages of weighting, scoring, and averaging. That’s a dense and multi-layered system of evaluation. You might argue that this is the beauty of the system; it takes a large amount of data and distills it into a single, easily digestible ranking. However, this reductionist approach can be a weakness.
Consider a hypothetical: Florida ranks 10th while New York stands at 20th. The immediate question that arises for me is, “Why?” Given the 71 metrics in play, pinpointing the precise reasons for these rankings is essentially impossible. It’s true that one can see a state’s standing in the eight primary categories. But what of the myriad metrics nested within these categories? What specific cocktail of metrics gave Florida the edge over New York?
Feeling frustrated about the complexity of the U.S. News system, I devised my own, more transparent ranking system. Let me be clear: I’m not asserting the superiority or even the utility of my system. Quite the contrary: I hate my system almost as much as the U.S. News system! But its primary relative advantage is its simplicity.
Here’s my methodology. For each state, apply the formula:
190 − 1.7 × (GDP Growth Rate) − 0.00057 × (GDP Per Capita) − 0.26 × (Rate of Migration Into State) + 7.4 × (Unemployment Rate) + 0.023 × (Violent Crime Rate) − 1.8 × (Life Expectancy)
Rank the states based on the resulting values (with the lowest number denoting the “best” state) and voilà, you have your ranking. The specifics of how I derived this formula, along with the exact units of measurement for the variables, are a tale for another day. But I’m open to sharing if prompted. The essence is that my model — while terrible! — offers a more transparent alternative to the U.S. News methodology and yet produces similar results. To validate this claim, consider the so-called Spearman correlation coefficient between my rankings and U.S. News’s: a robust 0.91. For those seeking a different, more intuitive metric: of the 1225 possible pairings from 50 states, my rankings agree with U.S. News’s in terms of relative state positioning 94% of the time.
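Both comparison numbers are easy to compute once you have two rankings side by side. For rankings without ties, the Spearman coefficient has the closed form 1 − 6Σd²/(n(n² − 1)), where d is the difference between an item’s two ranks. The sketch below uses a tiny made-up example of five states rather than the real 50-state data.

```python
from itertools import combinations

def spearman(rank_a, rank_b):
    # Spearman correlation for two rankings with no ties:
    # 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def pairwise_agreement(rank_a, rank_b):
    # Fraction of pairs on whose relative order the two rankings
    # agree (n choose 2 pairs; 1225 of them when n = 50).
    pairs = list(combinations(range(len(rank_a)), 2))
    agree = sum(
        (rank_a[i] < rank_a[j]) == (rank_b[i] < rank_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Toy example: five states ranked by two different systems.
# (Made-up numbers, not the actual U.S. News or formula rankings.)
mine = [1, 2, 3, 4, 5]
theirs = [1, 3, 2, 4, 5]

print(round(spearman(mine, theirs), 3))            # 0.9
print(round(pairwise_agreement(mine, theirs), 3))  # 0.9
```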
To reiterate, I don’t stand behind my ridiculous formula for many reasons, including the fact that like the U.S. News algorithm, it has variables that probably embody some redundancy. But in comparison, my own ridiculous system provides a clearer understanding of why a state is ranked the way it is.
Conclusion
Ranking states is an endeavor that might initially seem objective, quantifiable, and grounded in clear data. But the U.S. News algorithm, which in my opinion is needlessly complicated, inherently undermines and obscures the unique challenges, histories, and cultural identities of each state… all for a click-worthy list. The rankings might inadvertently perpetuate systemic biases, valuing certain state achievements over the distinct, nuanced struggles and successes of others. What’s more, when we consider the lived realities of citizens — the financial, legal, familial, and cultural constraints that influence their lives and mobility — any even implicit notion of urging people to flock to “better-ranked” states becomes problematic. It not only simplifies multifaceted socio-economic challenges but can also perpetuate stereotypes and misperceptions. We must challenge the premise of such rankings and ask ourselves: Is this truly reflective of a state’s value, potential, and the aspirations of its residents?
Your neighbor,
Chad