College Admissions, Legacy Preference, and Race

Chad M. Topaz
8 min read · Sep 7, 2023


A Data Dive

Note: Four amazing Williams College students — Emily Axelrod, Rijul Jain, Lily Napach, and Alessa Somer — conceived this project, gathered the data, and did preliminary analysis as part of their coursework in my Mathematical and Computational Approaches to Social Justice course during spring 2022.

Recently, the Supreme Court ruled against affirmative action at both public and private colleges. The Court found that race-conscious admissions at institutions like Harvard and the University of North Carolina violate the Fourteenth Amendment’s Equal Protection Clause. Separately, in the wake of this ruling, the U.S. Department of Education initiated a civil rights investigation into Harvard’s legacy admissions practice. This move follows a complaint by three Boston-based organizations: the Chica Project, the African Community Economic Development of New England, and the Greater Boston Latino Network. These groups claim Harvard’s legacy admissions system discriminates against Black, Hispanic, Asian, and other non-white applicants in favor of those with ties to donors or alumni.

Today, we’ll explore the potential impact of removing legacy admissions on campus demographics.

Legacy Admissions

Legacy admissions give preference to students with familial ties to alumni, granting them a significant advantage. At 75% of the top 100 universities in the U.S., legacy candidates have a preferential admissions edge. While policies vary across institutions, at the most elite schools, those with legacy status have a 45% higher chance of admission compared to their non-legacy counterparts. At Harvard, 43% of legacy admits are white, whereas merely 16% are underrepresented minorities such as African American, Asian American, and Hispanic students.

For sound reasons of confidentiality, we can’t see why any individual applicant was admitted. We can, however, explore the relationship between a school’s stated admissions priorities, including the weight it puts on legacy status, and the racial demographics of its student body.

Yet Another Quest for Data

If you are a regular reader of this Medium blog, you’re familiar with the data quests my students and I often go on. Here is yet another one!

Our targets of study are the top 25 universities and top 25 colleges according to the US News and World Report rankings from 2022. Ideally, we would like to obtain information about admissions priorities and student body racial demographics. Fortunately, this data is captured in the Common Data Set (CDS). The CDS, a joint initiative between colleges and major publishers like the College Board, seeks to streamline and enhance the transparency of information about institutions of higher learning.

While the “Common” in its name suggests easy accessibility, acquiring this data is anything but easy. The dataset is standardized, but frustratingly, there’s no centralized repository. Many institutions post their own CDS on their websites, but it’s often tucked away in cumbersome PDFs, hindering direct analysis (here’s an example). The student researchers wrote painstaking and ingenious code to parse these PDFs into structured data. Of the 50 schools we targeted, some appeared not to participate in the CDS process, or at least not to post their CDS documents online. In the end, we had data for 40 schools, gathered for the years 2016–2019. We excluded data from 2020 and later in case COVID introduced anomalies. Of the 40 schools, we had data for all four years for 35 of them and for two or three years for the other 5. We treat each school/year combination as a separate observation, for 152 observations overall.

Each observation encompassed two key CDS-derived elements: admissions priorities and racial demographics. Admissions priorities are rated on a four-tiered scale from Not Considered to Very Important for 19 factors:

  • Rigor of secondary school record
  • Class rank
  • Academic GPA
  • Standardized test scores
  • Application essay
  • Recommendations
  • Interview
  • Extracurricular activities
  • Talent/ability
  • Character/personal qualities
  • First generation
  • Alumni/ae relation (essentially, legacy)
  • Geographical residence
  • State residency
  • Religious affiliation/commitment
  • Racial/ethnic status (a nod to pre-Supreme Court ruling on affirmative action)
  • Volunteer work
  • Work experience
  • Level of applicant’s interest

To standardize data across schools, we assigned each rating a numerical value (0 for Not Considered to 3 for Very Important). We then normalized these values to ensure each school’s total importance points summed to one.
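Here is a minimal sketch of that scoring step. It is an illustration, not the students’ actual code: the post only pins down the endpoints of the scale, so the two middle labels and their scores are assumptions, and the example school and its ratings are made up.

```python
# Numeric scores for the four rating levels. Only the endpoints (0 and 3)
# are stated in the post; the two middle labels and scores are assumed.
RATING_SCORES = {
    "Not Considered": 0,
    "Considered": 1,
    "Important": 2,
    "Very Important": 3,
}


def normalize_priorities(ratings):
    """Convert {factor: rating label} into weights that sum to one."""
    scores = {factor: RATING_SCORES[label] for factor, label in ratings.items()}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("every factor is rated Not Considered")
    return {factor: score / total for factor, score in scores.items()}


# Hypothetical school with only four factors, for illustration.
example = {
    "Academic GPA": "Very Important",
    "Application essay": "Important",
    "Alumni/ae relation": "Considered",
    "State residency": "Not Considered",
}
weights = normalize_priorities(example)
print(weights)
```

One consequence of normalizing this way is that a weight captures relative emphasis only: a school that rates every factor Very Important ends up with the same weights as a school that rates every factor Considered.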

On the demographic front, our primary concern was domestic underrepresented minority (URM) students, classified as Black, Native American, Native Hawaiian/Alaska Native, and Latinx. While this mirrors definitions by entities like the National Science Foundation, it’s crucial to recognize that Asian American students still face marginalization and oppression.

With this data in hand, our mission is: describe the proportion of URM students on a campus as a function of the 19 admissions variables.

I used a statistical modeling framework called linear regression to create this description. Read the rest of this paragraph only if you are a statistician! If you are a statistician, maybe you want to know that the r-squared value is 0.64 and the adjusted r-squared is 0.58, indicating some, but not severe, overfitting. Maybe you want to know that I used Stata-type (HC1) robust standard errors in my calculations. Maybe you want to argue about whether beta regression would instead be the right tool, since our outcome variable is a proportion. Maybe you want to know that I actually did the beta regression as well, and the model diagnostics do look better, but because the linear regression results are more straightforward to interpret, we’re going with those. In any case, I don’t want to quibble about these things because — say it with me people — this is a Medium post, not a research journal.
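For the curious non-statisticians who read that paragraph anyway, here is a rough sketch of how adjusted R-squared relates to plain R-squared, using the standard textbook formula for a linear model with an intercept. The R-squared of 0.64 and the 152 observations come from this post; the post does not say how many terms the fitted model had, so the predictor counts below are placeholders.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared for a linear model with an intercept."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# r_squared and n_obs are from the post; the predictor counts are hypothetical.
for k in (5, 20, 40):
    print(k, round(adjusted_r_squared(0.64, 152, k), 3))
```

The more predictors a model uses relative to the number of observations, the further the adjusted value falls below the raw one, which is why a gap between the two is read as a sign of overfitting.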

Descriptive vs. Predictive Analytics

There’s a subtle but crucial difference between descriptive and predictive analysis, and we’ll do a little of each. Our descriptive analysis aims to capture and explain patterns and relationships in the data we have, showing how variables relate to one another. In contrast, predictive analysis uses past data to forecast future outcomes. However, these predictions have limitations. The future is not just an extension of the past, and external factors or changes in the education landscape can influence outcomes in ways our model might not capture. Thus, while our predictions provide some sense of what might happen, they should be seen as one of many possible futures and interpreted alongside other contextual information.

Which Admissions Factors Are Significant

In our model, given the data at hand and the assumptions of our modeling framework, five admissions factors have a significant association with a higher presence of underrepresented minority (URM) students. The first two factors that stand out are:

  • Academic GPA
  • Application essay

The positive association of these factors with increased URM enrollment could indicate that when institutions prioritize these metrics, they see a more diverse student body. This could highlight that many URM students excel in these areas, potentially offsetting other traditional indicators of privilege, such as legacy connections or a breadth of extracurricular experiences.

The remaining three factors that stand out are:

  • Talent/ability
  • First generation
  • Religious affiliation/commitment

Acknowledging a student’s distinct talents and abilities might create avenues for URM students who shine in areas that might be non-traditional or not widely recognized. Placing a premium on first-generation students could directly benefit URM students, some of whom might be trailblazers in their families in the pursuit of higher education. Lastly, emphasizing religious affiliation or commitment could inadvertently favor URM groups, especially if these institutions have historical or community connections to specific religious or ethnic communities.

It’s pivotal, however, to understand that correlation does not necessarily mean causation. While these factors are associated with higher URM representation, they may not be the direct catalysts. Other underlying factors or institutional policies could also be influential.

On the flip side, there are three factors that show up as being significantly negatively associated with URM enrollment:

  • Rigor of secondary school record
  • State residency
  • Alumni/ae relation (legacy)

The first item could imply that when institutions place a heavy emphasis on the prestige or rigor of a student’s previous schooling, they might inadvertently disadvantage URM students, who may not have had equal opportunities to attend such schools. The second item could arise if specific states have lower URM populations, or if there are historical or systemic factors within certain states that reduce college attendance among URM groups.

Finally, the negative association with legacy considerations may reflect historical disparities in college attendance: earlier generations of minority students may have faced barriers to attending these institutions, resulting in fewer URM legacies today.

What Might Happen Without Legacy

We use our statistical model to probe into counterfactual scenarios. Specifically, what would our model predict for each school’s URM student proportion if legacy considerations in admissions were eliminated? The graph below illustrates the results for the year 2019.

The actual URM proportions for undergraduate degree-seeking students at each institution are represented by black dots. The green bars, on the other hand, visualize the model’s predicted URM proportions under the hypothetical scenario where legacy is not a factor in admissions.

When interpreting the results from our predictive model, it’s vital to approach them with caution. Model predictions are derived from patterns in historical data, and while they can provide insights, they inherently come with a degree of uncertainty. Instead of a definitive outcome, we offer a prediction interval: a range within which the actual value is likely to fall. This interval is captured by the length of the green bars, which show the range within which the URM proportion is expected to lie if legacy considerations were absent. It’s essential to note that these predictions assume that the relationships between variables remain constant in the future. Moreover, this approach doesn’t account for unforeseen future changes. Statistical modeling can be a powerful approach, but it’s not a foolproof crystal ball.
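To make the idea of a prediction interval concrete, here is a deliberately simplified sketch. It is not the model from this post: it uses a single predictor rather than the full set of admissions factors, ordinary rather than robust standard errors, and a normal quantile as a stand-in for the t quantile. The data points are invented.

```python
from math import sqrt
from statistics import NormalDist, mean


def prediction_interval(xs, ys, x0, level=0.95):
    """Approximate prediction interval at x0 for a one-predictor least-squares fit."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    # Residual standard error with n - 2 degrees of freedom.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s = sqrt(sum(r * r for r in residuals) / (n - 2))
    # Normal quantile used as an approximation to the t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    y_hat = intercept + slope * x0
    half_width = z * s * sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat - half_width, y_hat, y_hat + half_width


# Invented data: normalized legacy weight vs. URM share for six schools.
legacy_weight = [0.00, 0.02, 0.04, 0.05, 0.07, 0.09]
urm_share = [0.24, 0.22, 0.21, 0.19, 0.18, 0.15]

# Counterfactual: what does the toy model predict at a legacy weight of zero?
low, center, high = prediction_interval(legacy_weight, urm_share, 0.0)
print(f"predicted {center:.3f}, interval ({low:.3f}, {high:.3f})")
```

The interval widens as the point being predicted moves away from the bulk of the data, which is part of why a counterfactual such as zero legacy weight carries more uncertainty than a prediction near typical values.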

The black, vertical dashed line serves as a threshold. For schools to its right, the green bars intersect the black dots, indicating that our model can’t assert with confidence that the actual URM proportion would differ in the absence of legacy considerations. Conversely, for the 10 schools to the left, our model suggests that eliminating legacy could potentially have yielded a URM student body percentage higher than what was actually observed. In ascending order of their actual URM proportions, these schools are Carnegie Mellon, Colgate, Bates, Middlebury, Vassar, Haverford, Notre Dame, Georgetown, Smith, and Princeton.
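The reading rule for the graph can be sketched as a small comparison between each school’s observed URM proportion and its no-legacy prediction interval. The function name, school names, and numbers below are all hypothetical.

```python
def legacy_signal(actual, lower, upper):
    """Compare an observed URM proportion with a no-legacy prediction interval."""
    if actual < lower:
        return "predicted higher without legacy"
    if actual > upper:
        return "predicted lower without legacy"
    return "no clear difference"


# Hypothetical schools: (observed proportion, interval lower, interval upper).
schools = {
    "School A": (0.15, 0.17, 0.26),
    "School B": (0.22, 0.18, 0.27),
}
for name, (actual, lower, upper) in schools.items():
    print(name, "->", legacy_signal(actual, lower, upper))
```

In this toy example, School A plays the role of the schools to the left of the dashed line, and School B plays the role of those to its right.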

Conclusion

In the aftermath of the recent Supreme Court ruling and the Department of Education’s investigation into Harvard’s legacy admissions practice, the discourse around college admissions has renewed vigor. While legacy considerations undoubtedly provide certain applicants with an advantage, our analysis delves deeper into how these considerations might impact campus racial demographics. Our data-driven journey shows that overall, legacy admissions do seem to have a negative association with URM enrollment.

We also peeked into a hypothetical world without legacy admissions. For some institutions, the removal of legacy preferences might lead to an uptick in URM student percentages. However, for others, the status quo might remain largely unchanged.

As we unpack the tangled web of college admissions criteria, it’s crucial to reflect on larger societal issues: equity, access, and opportunity. If educational institutions want to foster diverse, vibrant learning environments, then re-evaluating entrenched admissions practices is a necessary step. But bringing students to campus is just the start of the story. Once a school puts together a diverse student body, it owes it to the students to foster an equitable and inclusive campus environment that values and supports them.

While data can guide our insights, the responsibility to act justly and drive meaningful change lies with educational leaders, policymakers, and society at large.

Your neighbor,

Chad
