Algorithms. They’re everywhere. Whether it’s curating your Explore page on Instagram or simply showing you the results of a Google search, we interact with algorithms on a daily basis. Algorithms analyze our every move: every double tap, click, and online purchase determines future ad placements, autocorrect suggestions, or page layouts. And though we voluntarily provide much of this data, many algorithms also draw on information we never knowingly shared. As software becomes prevalent across industries, the data feeding these systems often fails to reflect the world as it is today, with detrimental and far-reaching consequences.
Increasingly, these algorithms have reinforced societal and internalized biases. According to Dr. Ashwin Rao, an adjunct faculty member at Stanford University’s Institute for Computational and Mathematical Engineering and vice president of Artificial Intelligence at Target, the issue lies in the data sets themselves. “It’s more of a convenience issue that has happened — we’ve got some data out there that has been collected in the past, and a lot of people are just tapping into the same data set,” he reports. This lack of conscious choice feeds into a larger problem: while algorithms are accurate with respect to the data they’ve been given, the biases baked into that data can have far-reaching implications. Joy Buolamwini, founder of the Algorithmic Justice League, conducted a study of facial recognition software from leading providers to demonstrate how our existing biases have infiltrated algorithms. According to the research, lighter-skinned males were misgendered 1 percent of the time; darker-skinned females were misgendered 35 percent of the time.
While this may seem like a small issue, misidentification can have major consequences in law enforcement today. Facial recognition has become an extension of the law, used to identify suspects and break open cases, yet it has been shown to produce inconsistent results across racial groups. A study revealed that a common data set used for facial recognition software was more than 75 percent male and 80 percent white, yet the U.S. Census Bureau reports the U.S. population was 49.2 percent male and 76.3 percent white as of 2019. These data sets are not representative of the U.S. population, and relying on them sustains the risk of misrepresentation, with far-reaching consequences. There are also issues with overrepresentation of minorities within certain law enforcement databases. Disproportionate arrest rates among the Black population mean mugshot databases are likely to overrepresent Black people, further perpetuating a biased view of the population through data. Given that the NAACP reports 65 percent of Black adults have felt targeted because of their race, such skewed data sets can reinforce systemic biases and feed into a dangerous cycle for misrepresented communities.
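To see how far a data set can drift from the population it is meant to represent, here is a minimal sketch of a demographic audit; the labels, group names, and counts below are hypothetical stand-ins, not figures from the studies cited above.

```python
# Illustrative sketch: compare a data set's demographic makeup to
# population benchmarks. The labels and numbers are hypothetical.
from collections import Counter

def audit_representation(labels, benchmarks):
    """Print each group's share of the data set next to its population share."""
    counts = Counter(labels)
    total = sum(counts.values())
    for group, benchmark in benchmarks.items():
        share = counts.get(group, 0) / total
        gap = share - benchmark
        print(f"{group:>8}: dataset {share:6.1%} | population {benchmark:6.1%} | gap {gap:+6.1%}")

# Hypothetical labels for a small face data set skewed toward men
labels = ["male"] * 76 + ["female"] * 24
benchmarks = {"male": 0.492, "female": 0.508}  # census-style reference shares
audit_representation(labels, benchmarks)
```

A check like this only surfaces the imbalance; fixing it still requires collecting or sampling more representative data.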
A government-funded study conducted by the U.S. National Institute of Standards and Technology (NIST) showed how using these homogenous, unrepresentative data sets can have a profound impact on society. NIST examined four data sets used by the U.S. government, which included mugshots and photos from various immigration applications. The agency ran almost 200 facial recognition trials on these sets using two algorithmic methods, “one-to-one” matching and “one-to-many” matching. In “one-to-one” matching, the algorithm compares one photo of an individual to another photo of the same individual to confirm they match. In “one-to-many” matching, the algorithm determines whether a given photo of an individual has a match anywhere in the database. One-to-one matching underpins tasks such as checking passports and unlocking phones, while one-to-many matching is used in law enforcement searches. The study showed that algorithms developed in the United States performed poorly across the board when matching Asian, African American, and Native American faces. This wasn’t isolated to a single algorithm but was a consistent result. In “one-to-many” matching, systems were most likely to incorrectly indicate a database match when the subject was a Black woman. In “one-to-one” matching, most systems had much higher false positive rates for Asian and Black faces than for Caucasian faces.
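To make the distinction concrete, here is a minimal sketch of the two matching tasks, assuming each photo has already been reduced to a fixed-length numeric embedding by some model. This is not the vendor software NIST evaluated; the embeddings, names, and threshold are illustrative assumptions.

```python
# Simplified sketch of "one-to-one" and "one-to-many" face matching
# over precomputed embedding vectors.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def one_to_one_match(probe, reference, threshold=0.8):
    """Verification: does this photo show the same person as the reference photo?"""
    return cosine_similarity(probe, reference) >= threshold

def one_to_many_match(probe, gallery, threshold=0.8):
    """Identification: does this photo match anyone in the database?"""
    scores = {name: cosine_similarity(probe, emb) for name, emb in gallery.items()}
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] >= threshold else (None, None)

# Hypothetical gallery of two enrolled people and one probe photo
rng = np.random.default_rng(0)
gallery = {name: rng.normal(size=128) for name in ["person_a", "person_b"]}
probe = rng.normal(size=128)
print(one_to_many_match(probe, gallery))
```

In this framing, the false positives NIST measured correspond to the one-to-many search returning a name for someone who is not actually in the gallery.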
In the real world, NIST’s results mean that, of all the demographic groups tested, African American women are the most likely to be wrongly accused of a crime. These false positives pose a threat to minority communities at large, showing how unrepresentative data can create and reinforce systems with inherent biases.
Beyond facial recognition, we see this issue in voice recognition as well. Voice and text recognition rely on Natural Language Processing (NLP), which “aims to do a statistical inference for the field of natural language.” NLP enables software to automatically manipulate speech or text, using algorithms to determine and evaluate language rules so language data can be understood by computers. Whenever you speak to an Amazon Alexa, use autocomplete in an email, or run spell check on an essay, NLP is working in the background.
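As a rough illustration of that statistical flavor (a toy sketch, not how Alexa or any commercial spell checker actually works), a spell checker can rank correction candidates by how often they appear in a reference corpus:

```python
# Toy spell-correction sketch: suggest the most frequent close match
# from a reference vocabulary. Real NLP systems are far more sophisticated,
# but the underlying idea of inferring from observed language data is similar.
import difflib
from collections import Counter

# Hypothetical corpus standing in for the large text collections real systems learn from
corpus = "the quick brown fox jumps over the lazy dog the fox runs".split()
word_counts = Counter(corpus)

def suggest(word, vocabulary=word_counts, n=3):
    """Return close matches ranked by corpus frequency (a crude language model)."""
    candidates = difflib.get_close_matches(word, vocabulary.keys(), n=n, cutoff=0.6)
    return sorted(candidates, key=lambda w: vocabulary[w], reverse=True)

print(suggest("teh"))   # likely ['the']
print(suggest("foxx"))  # likely ['fox']
```

Notice that the system can only suggest words it has seen; language it was never trained on simply falls through the cracks, which is the crux of the dialect problem below.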
Research published by the North American Chapter of the Association for Computational Linguistics set out to evaluate “the accuracy of YouTube’s automatically-generated captions across two genders and five dialects of English.” The study found differences in speech recognition accuracy between dialects, despite the fact that, from a linguistics perspective, no dialect is inherently harder to understand than another. The NLP algorithms had not gained enough exposure to some dialects to reduce the Word Error Rate (WER) and improve the accuracy of Automatic Speech Recognition (ASR), leading the authors to point to “imbalances in the training dataset” as a reason for the differences between dialects. These imbalances can cause NLP and ASR systems to perform less accurately for minority groups, maintaining or even exacerbating existing inequalities.
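Word Error Rate, the metric behind these comparisons, counts the word-level insertions, deletions, and substitutions needed to turn a system’s transcript into the reference transcript, divided by the length of the reference. Here is a minimal sketch; the transcripts are made up for illustration.

```python
# Minimal Word Error Rate (WER) sketch: edit distance over words,
# divided by the number of words in the reference transcript.
def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table of edit distances between word prefixes
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# Made-up reference transcript and ASR output
reference = "she had her dark suit in greasy wash water all year"
hypothesis = "she had a dark suit in greasy wash water all here"
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")  # 2 errors over 11 words
```

A higher WER for one dialect than another, on speech that human listeners understand equally well, is exactly the kind of gap the captioning study measured.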
How can this problem be fixed? With AI and algorithms being implemented at breakneck speed, it may seem impossible to regulate the data sets being used. Rao holds developers responsible for ensuring their algorithms’ results are fair and representative, but also notes that software firms understand the impact of these data sets, stating, “There’s a lot of movement right now at companies to aim for diversity. I think companies will probably be at the forefront of creating diverse datasets.” In January 2019, IBM built a new database of around one million faces with an emphasis on diversity. Microsoft aimed to do the same, but was recently the target of a lawsuit alleging the company profited from adding images to its database without the consent of the individuals pictured. Creating diverse datasets can be difficult, complicated, and expensive, but that reform is necessary to move in a positive direction. Artificial intelligence is the future of computing, but as a population, we must push for algorithms that promote equitable ideals. As esteemed IBM programmer and instructor George Fuechsel used to say, “Garbage in, garbage out”: a great algorithm is only as good as the dataset used to train it.