Can Big Data Be Racist?

The Bold Italic · Mar 31, 2014

By Cecilia Esther Rabess

Illustration by Brian Standeford

Big Data powers the predictive models that can tell Target you’re pregnant before your physician even knows. It enables fine-grained segmentation of huge datasets, like the algorithm that created the 76,897 micro-genres keeping us hooked on Netflix. And it has given us the ability to crowdsource real-time insights, like the groundbreaking revelation that Americans are as keen to #deportbieber as Canadians are for us to #keepbieber.

In other words, Big Data is currently the best method we have for making sense of an increasingly complex world. But it’s also imperfect, because neither the data nor the decisions we make on the basis of it are ever completely objective. Consider the case of St. George’s Hospital Medical School, which built a Big Data algorithm to automate its admissions process. The idea was to reduce variability and increase objectivity, but instead the school inadvertently institutionalized bias against women and minorities, because the algorithm was trained on historical admissions data that unduly favored white male candidates. The model did exactly what it was built to do, and exactly the opposite of what was intended.
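
To see the mechanism in miniature, here is a sketch in Python. The data is synthetic and the feature names are hypothetical; this is not St. George’s actual system. Fit a standard model to historical decisions that docked a group of applicants, and the penalty resurfaces as a learned coefficient:

```python
# Minimal sketch: a model fit to biased historical decisions learns to
# reproduce the bias. Synthetic data, hypothetical features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

exam_score = rng.normal(0, 1, n)   # genuine signal of ability
minority = rng.integers(0, 2, n)   # demographic flag; no bearing on ability

# Historical decisions: reviewers weighed scores but also docked minority
# applicants. That prejudice is baked into the labels the model learns from.
logit = 1.5 * exam_score - 1.0 * minority
admitted = rng.random(n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([exam_score, minority])
model = LogisticRegression().fit(X, admitted)
print(model.coef_)  # the coefficient on the demographic flag comes out
                    # negative: the old bias is now "in the model"
```

Nobody has to program the prejudice in; the labels carry it.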

When we translate cultural clichés and stereotypes into empirically verifiable datasets, we introduce subjectivity into a discipline that strives for objectivity. When we imbue our Big Data insights with our race-based biases, we project our prejudices onto subsequent observations. It’s inevitable. So is Big Data racist? The answer is complicated.

Take, for example, Harvard professor Latanya Sweeney’s discovery that searches for racially associated names were disproportionately triggering targeted ads for criminal background checks and arrest records. The algorithm was both exposing racial bias (the offensive ads were more likely to reappear if people continued to click on them) and exacerbating it (the more people saw ads that suggested black names were connected to criminal activity, the more existing racial prejudice was reinforced).
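
That feedback loop is easy to reproduce in miniature. Below is a hypothetical, stripped-down ad server in Python, with synthetic click rates and the gap exaggerated for clarity; it is not Google’s actual delivery system. An algorithm that simply favors whichever ad gets clicked turns a small initial skew into near-total dominance:

```python
# Toy ad server: serve whichever ad has the better observed click-through
# rate, with a little random exploration. Click rates are synthetic.
import random

random.seed(0)
ads = ["arrest_record_ad", "neutral_ad"]
true_ctr = {"arrest_record_ad": 0.15, "neutral_ad": 0.05}  # gap exaggerated
shown = dict.fromkeys(ads, 1)    # impression counts, smoothed to avoid /0
clicked = dict.fromkeys(ads, 1)  # click counts, smoothed

for _ in range(100_000):
    if random.random() < 0.1:
        ad = random.choice(ads)                             # explore at random
    else:
        ad = max(ads, key=lambda a: clicked[a] / shown[a])  # exploit best CTR
    shown[ad] += 1
    if random.random() < true_ctr[ad]:
        clicked[ad] += 1

share = shown["arrest_record_ad"] / sum(shown.values())
print(f"arrest-record ad served {share:.0%} of the time")  # roughly 95%
```

The server never looks at anyone’s race. It only chases clicks, and the clicks carry the prejudice.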

Imagine now a Big Data algorithm that systematically denies credit to people of color on the basis of their Facebook likes, a practice that several credit rating agencies are beginning to embrace. A predictive model suggesting that minority populations are a higher credit risk could simply be reflecting the bias already in our society, as was the case with St. George’s and with Sweeney’s study. But as marketers keep hunting for shorthand ways to identify population segments, the profiles they build of black customers can be put to purposes as benign as selling basketballs and Justin Timberlake CDs, or as nefarious as denying access to credit or to basic civil rights and services.

Companies now have access to huge quantities of unvolunteered demographic data that could be put to such use. One widely reported example is that of Atlanta resident Kevin Johnson, who, in spite of his near-perfect FICO score, had his credit limit reduced from $10,800 to $3,800 without warning when American Express determined that “other customers who have used their card at establishments where you recently shopped have a poor repayment history with American Express.”

While AmEx and other companies like it would never claim a racial bias, their data holds one. OkCupid knows your race on the basis of whether you like bonfires, the Red Sox, and Tom Clancy (white) or the Bible, PlayStation, and Law & Order (black). So does Facebook. And most likely so does every other company on the planet that has jumped on the Big Data bandwagon.
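
How little it takes to make that guess is worth seeing. Here is a toy naive Bayes sketch in Python, with invented probabilities and a uniform prior; it is not OkCupid’s actual model or numbers. Two likes are enough for a confident racial inference:

```python
# Toy naive Bayes: infer group membership from a handful of page likes.
# All probabilities are invented for illustration.
likes = {"the_bible", "playstation"}

# Hypothetical P(user likes page | group):
p_like = {
    "white": {"bonfires": 0.6, "red_sox": 0.5, "the_bible": 0.3, "playstation": 0.3},
    "black": {"bonfires": 0.3, "red_sox": 0.2, "the_bible": 0.6, "playstation": 0.6},
}

score = {}
for group, probs in p_like.items():
    s = 1.0  # uniform prior over groups
    for page, p in probs.items():
        s *= p if page in likes else (1 - p)
    score[group] = s

total = sum(score.values())
for group, s in score.items():
    print(group, round(s / total, 2))  # e.g. black 0.92, white 0.08
```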

This is why some businesses have come under fire for price discrimination, as when Staples.com showed higher prices on its website to Internet users in poorer areas. It’s why, as websites increasingly serve personally tailored content, the Internet can literally look different depending on the color of your skin. And it’s why, as a black woman, I’m sick of seeing ads for payday loans and hair relaxer (though I will never tire of Justin Timberlake ads).

Big Data gives us the power and the platform to reduce people to their quantified selves in a way that hasn’t been possible until now. It extends the reach of discrimination. I’m sure Oprah would be thrilled to know that she doesn’t even have to leave her home in order to be racially profiled.

But racial profiling isn’t the only issue. What about the people whom the Big Data machine fails to account for at all? Data subordination, a concept introduced by researcher Jonas Lerman, describes the inability of Big Data to capture those whose data footprint is small or nonexistent: people without smartphones or Facebook accounts or credit cards, who are disproportionately black. People already marginalized by society will continue to be marginalized by the models that ignore them. During Hurricane Sandy, for example, aid workers were able to use real-time tweets to deploy resources to the areas with the highest volume of distressed Twitter users. Missing from the equation, however, were those most affected: the people who couldn’t tweet because they lacked mobile Internet access, power, or both.

In the end, Big Data not only reflects the racism in our society but also perpetuates it. So while it’s not inherently racist, Big Data makes it that much easier for the people using it to be.

The numbers give us an excuse to stop asking questions and permission to reinforce stereotypes. When we reduce people to the sum of their statistical and probabilistic selves, we turn them into cultural caricatures that reflect the shortcomings of our society more than they reflect anyone’s real values, behaviors, or preferences.

The irony is that Big Data is a big step forward, a way to use today’s powerful technology to understand phenomena that we would have completely missed yesterday. But instead we may be doing just the opposite: using these powerful predictive models to perpetuate racial bias.
