The average person is so accustomed to seeing the results of artificial intelligence algorithms—like Amazon purchasing recommendations and Facebook ads—that it is easy to overlook the potential harm of these systems. In this episode, APU professor Cynthia Gentile talks to professor and author Dr. Robin Zebrowski about how inherent bias is built into artificial intelligence algorithms. Learn about the complexities of trying to correct or retrain these massive data sets, the discriminatory dangers of facial recognition technology, the invasive nature of speech recognition software, and why the government must heavily regulate Big Tech to protect how data is used.
Read the Transcript:
Cynthia Gentile: Welcome to the podcast. I’m your host, Cynthia Gentile, and I’m excited to introduce my guest today, Dr. Robin L. Zebrowski, and to dig into the complex and confusing study of bias in artificial intelligence systems and algorithms.
Dr. Zebrowski is an Associate Professor and Chair of the program in Cognitive Science with joint appointments in philosophy, psychology and computer science at Beloit College in Wisconsin. Her work is primarily on questions of embodiment and artificial intelligence, including how humanoid forms contribute to concept formation and how that relates to AI design and implementation.
She also works on theoretical systems in human-robot interactions and human-computer interfaces. Her new book with Eric Dietrich, Chris Fields, John Sullins and Bram van Heuveln called “The Great Philosophical Objections to Artificial Intelligence: The History and Legacy of the AI Wars” is out now from Bloomsbury. Robin, thank you for joining me today.
Dr. Robin Zebrowski: Thanks for having me.
Cynthia Gentile: I’ve been so looking forward to this conversation all week. I am intrigued and really, to be honest, more than a little overwhelmed by the concept of inherent bias in the algorithms we interact with either knowingly or unknowingly. I know it’s a tall order, but can you give us a 1,000-foot view of how algorithms affect our lives, from the information we see online to the medical care we receive?
Dr. Robin Zebrowski: Yeah. I mean, the big picture is, it’s everywhere. Lots of things you might not realize are using algorithms. And a lot of it is also designed to make you think it’s helpful, right? Like your Amazon recommendations or Google search: you don’t even notice that you get different results than other people.
So a lot of it is sold to you as just the way the product works, and you might not realize that it’s learning your preferences and things like that. But it goes deeper than Google and Amazon. It’s facial recognition, it’s your healthcare. And some of those are more damaging than you might think.
Cynthia Gentile: Right. And this is all so unregulated and unchecked. So what are the primary risks as you see them of this unchecked nature of AI systems?
Dr. Robin Zebrowski: For the most part, the people who already tend to be marginalized in a given society are going to see that marginalization amplified. They’re going to be further marginalized. These algorithms tend to reinscribe the biases that we already see in a society.
There’s actually a really great argument that the scholar John Flowers made. I saw him give a paper, and he argued that these algorithms are actually working as intended; they’re not failures. They’re successes in showing exactly how biased various societies are and in what ways.
The risks are really outrageous. These things are deciding who gets longer sentences in jail. They’re deciding who gets housing. They’re deciding what your search terms give you when you search YouTube. And we can draw a direct line from several mass shootings back to information that Google surfaced from white supremacist websites and the like.
Cynthia Gentile: Right. And we’re hearing more and more about that on the news every day, about how we get into these rabbit holes of information. But my question, and maybe it’s because you have to explain it to me like I’m five, is even more basic. How does the bias get built into the algorithm?
Dr. Robin Zebrowski: So, often it’s the training set that an algorithm is given. The way the algorithms tend to work is you feed them a training set and they learn from that training set. And there’s usually a teacher program, another program that tells it when it’s right and when it’s wrong.
And so it starts to get smarter and smarter because it’s corrected when it’s wrong. But in order to do that training, you give it a bunch of data and then tell it what the answers are. And that data, no matter what it is, tends to be biased in ways that people hadn’t really predicted. Although I shouldn’t say that, because sociologists and philosophers were talking about the ways this would happen well before the algorithms were actually rolled out. So they kind of were predicted; it’s just that nobody listened.
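An editorial aside: the “teacher” setup described above can be sketched as a toy perceptron. This is a minimal illustration under stated assumptions, not how GPT-3 or any product mentioned in this episode works. The applicant records, the features `skill` and `group_b`, and the function names are all hypothetical. The historical label plays the teacher, and the weights change only when the model’s guess disagrees with it. Because this invented history never hired qualified applicants from group B, the learner ends up with a negative weight on group membership and reproduces the pattern.

```python
# Toy sketch (hypothetical data): a perceptron that is "corrected when it's
# wrong" by historical labels encoding a biased hiring history.

# Each record: ((skill, group_b), hired). skill=1 means qualified;
# group_b=1 marks membership in a hypothetical group B.
HISTORY = [
    ((1, 0), 1),  # qualified, group A: hired
    ((1, 1), 0),  # qualified, group B: rejected (the biased label)
    ((0, 0), 0),  # unqualified, group A: rejected
    ((0, 1), 0),  # unqualified, group B: rejected
]


def predict(weights, skill, group_b):
    w_skill, w_group, bias = weights
    return 1 if w_skill * skill + w_group * group_b + bias > 0 else 0


def train_perceptron(examples, epochs=20, lr=1.0):
    weights = (0.0, 0.0, 0.0)
    for _ in range(epochs):
        for (skill, group_b), label in examples:
            # The "teacher": error is 0 when the guess matches the label,
            # +1 or -1 when it doesn't, and only then do the weights move.
            error = label - predict(weights, skill, group_b)
            w_skill, w_group, bias = weights
            weights = (
                w_skill + lr * error * skill,
                w_group + lr * error * group_b,
                bias + lr * error,
            )
    return weights


weights = train_perceptron(HISTORY)
print(weights)                 # → (1.0, -2.0, 0.0): group_b weight is negative
print(predict(weights, 1, 0))  # qualified, group A → 1
print(predict(weights, 1, 1))  # qualified, group B → 0
```

Nothing in the code mentions group B as a reason to reject anyone; the model simply fits the labels it is given, which is the point about biased data made above.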
Cynthia Gentile: So is it that the data being used is already biased, reflecting the biases of the individuals culling the data?
Dr. Robin Zebrowski: Generally, yes. For example, there’s a fairly new natural language processing algorithm from OpenAI called GPT-3. And it’s making big waves because it seems to be extremely good at producing apparently coherent natural language. It could write essays for our classes, and they would be pretty passable.
And it’s coming out, and in fact it was known and still no one acted on it, that there’s a whole bunch of anti-Muslim bias built into the data set GPT-3 was trained on. GPT-3 was trained on the equivalent of, I can’t remember the exact number, something like 1 to 3 million books, far more than any human could read in a lifetime. That means nobody can actually check the entire data set. So no one actually knows everything GPT-3 has been trained on.
Cynthia Gentile: So it’s acting sort of on its own, then, if it’s a collection of information that no one person can check.
Dr. Robin Zebrowski: Yes. I mean, that’s what gives it such powerful results. But even the programmers, when they tested it, could see that there were some not-okay results, and they released the product anyway.
Cynthia Gentile: Okay. So what steps can be taken at that data set input point to shift the information to be less biased?
Dr. Robin Zebrowski: That’s a good question, because not everyone realizes how the data set is biased until we see these outcomes. There was a famous case a few years ago where Amazon tried to automate its resume screening, the first round of looking at resumes.
Because they realized that when humans were doing it, they kept picking the same exact kind of people over and over again to get hired at Amazon. And they were like, “Okay, we need to diversify. Let’s feed it all of the data that we have of who we’ve hired, and then let it figure out what it is that we’re looking for and we’ll let the algorithm do the work.”
And, of course, it accidentally learned from the resumes of all the people who had worked at Amazon that they were all men. So that got baked right into the algorithm: a resume that mentioned a women’s group or a women’s college, or anything that alluded to being a woman, was ranked lower simply because of the history of who Amazon had hired.
So they were trying to do a good thing by taking bias out of the human screeners, and instead they made it worse. They fiddled with the algorithm once they knew this was happening, and they still couldn’t fix it; they couldn’t figure out exactly what was picking up on these differences. So they had to scrap it, in 2017, I think.
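A deliberately simplified sketch of how this can happen even when a model is never shown a gender field, a pattern often called proxy discrimination. This is not a reconstruction of Amazon’s tool. The resumes, the hire labels, and the word-scoring rule (each word’s historical hire rate minus the overall hire rate) are hypothetical stand-ins chosen only to show the mechanism.

```python
from collections import defaultdict

# Hypothetical historical resumes and outcomes (1 = hired). Note there is
# no gender field anywhere in the data.
PAST_RESUMES = [
    ("captain chess club", 1),
    ("software engineer chess club", 1),
    ("software engineer", 1),
    ("captain women's chess club", 0),
    ("software engineer women's college", 0),
]


def token_scores(resumes):
    """Score each word by how far its hire rate sits from the overall rate."""
    overall = sum(hired for _, hired in resumes) / len(resumes)
    seen = defaultdict(int)
    hired_with = defaultdict(int)
    for text, hired in resumes:
        for token in set(text.lower().split()):
            seen[token] += 1
            hired_with[token] += hired
    return {tok: hired_with[tok] / seen[tok] - overall for tok in seen}


def score_resume(scores, text):
    # Unseen words contribute 0; known words add their learned shift.
    return sum(scores.get(tok, 0.0) for tok in set(text.lower().split()))


scores = token_scores(PAST_RESUMES)
print(round(scores["women's"], 2))  # → -0.6
print(round(scores["college"], 2))  # → -0.6
print(score_resume(scores, "software engineer women's chess club")
      < score_resume(scores, "software engineer chess club"))  # → True
```

Here `college` is penalized too, only because it appeared alongside `women's` in this tiny invented history. Blocking a single word would leave others carrying the same signal, which is one plausible reason fiddling with a system like this fails to fix it.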
Cynthia Gentile: Wow. So the intention was that the algorithm would make it more of an arm’s-length transaction, less influenced by people’s own personal biases, but really it just amplified a whole bunch of people’s biases.
Dr. Robin Zebrowski: Yup. Things they didn’t realize they had been doing.
Cynthia Gentile: So here we are in a reality where, for the most part, we ask Big Tech to self-regulate in all of these efforts. And I think I know what your answer might be, but can Big Tech regulate this?
Dr. Robin Zebrowski: No. God, no, no. Not even a little bit. The few places that have made any attempt to self-regulate have done a terrible, terrible job. The firing of Timnit Gebru from Google has been all over the news. She was their AI ethics lead and she was amazing. She is amazing at what she does. And she was fired from Google because she was about to release an extremely well-sourced, well-documented paper that showed that Google was doing something that they shouldn’t be doing. And that their product was biased in a number of different ways. And they didn’t like it. So they fired her.
That was literally her job, by the way. She was the one who was supposed to regulate this and they fired her instead.
Cynthia Gentile: So she was brought in to uncover places where they were failing, uncovered such places, and then... was the report ever released?
Dr. Robin Zebrowski: So, yeah. You can find it now; I think all of the usual tech news sites have copies of the paper. I haven’t read the entire paper that she wrote. And it was not just her, by the way. It was this giant swath of co-authors, people in academia, other people who are working on tech regulation and stuff. It was an amazing report.
I read most of it, and then I said, “Oh my God, I can’t believe Google fired her for this,” because it wasn’t wildly different from other things she had said in the past. But, yeah, you can find it. I think The Verge or something would probably have copies.
Cynthia Gentile: That’s so interesting. And now I will add that to the next rabbit hole that I go down. So my next question: we’ve mentioned things like resume tracking or resume review systems. And then you mentioned your Amazon searches and the predictive nature of what Amazon might show you, or a sponsored ad on Facebook, or things like that.
So looking at those things that I think most people perceive to be fairly innocuous, can you explain the risk there? How are we being culled into specific caves where we only see certain information?
Dr. Robin Zebrowski: Yeah. So there was a really great deep dive done on the YouTube version of this. I think it was published in The New York Times: Zeynep Tufekci did a big piece called “YouTube, the Great Radicalizer.” She talked about how researchers would start at something fairly innocuous, and if they let YouTube recommend the next video to watch, it kept recommending more and more radical videos.
I think they did some with vaccines. They started by looking for vaccine information, and a few steps away they got anti-vaxxer propaganda, and not far after that they were getting right-wing, white supremacist propaganda. Some of this is because YouTube’s algorithm is designed for engagement, not truth. It’s designed to get you to stay on YouTube and keep clicking, because that’s how they make money.
So the more radical the thing they show you, the more likely you are to watch it and think, “Oh my goodness, is this true?” That’s how normal people just end up down this rabbit hole on YouTube, and it’s really damaging.
I’m laughing that you gave the Facebook ad example. I hate that I’m not off Facebook, but it’s well accepted and understood that Facebook is honestly responsible for genocide in Myanmar. A bunch of military officials used fake names and fake identities and posted lots and lots of propaganda about a minority group [the Rohingya]. It led to a well-recognized genocide of that group, and Facebook finally had to take down all of these accounts that weren’t real people.
So Facebook is far less innocuous than annoying ads that follow you around the internet. It’s well understood to be actually damaging. And, of course, we also know about the difference between what Democrats and Republicans saw during the 2016 election, and again leading up to the recent election.
Cynthia Gentile: So the things that I or others may find to be innocuous are not so innocuous. They’re not really so simple. But I want to explore a little the ways algorithms impact sentencing and who’s released on probation and who isn’t, as well as medical care and the ways biases creep into it.
Dr. Robin Zebrowski: So the sentencing stuff is one of the earliest cases I heard about. It must have been at least four or five years ago, maybe even longer, and it’s now well documented. First of all, the algorithm is proprietary. The company that sold the algorithm to courthouses around the country doesn’t let anyone look at the code. So nobody knows what’s in the training set or what kinds of biases might be baked into the code itself.
But you’d get two people who are in for the exact same crime and have the exact same background, et cetera, and one is white and one is Black. The white one will be let off on probation, and the Black one will get a fairly harsh sentence just because the algorithm says he’s more likely to re-offend.
Cynthia Gentile: So obviously the data that’s input is just furthering the inequality that we see in that system. Is that using a predictive analysis?
Dr. Robin Zebrowski: Yeah. So it’s taking data from the past. And again, it’s not these two individuals’ data, it’s data about some demographic about them. And then it’s trying to predict what will happen in the future, which is why you see these people getting harsher sentences, because the algorithm says they’re likely to re-offend.
And of course, when you actually look at what later happens to these people, quite often it’s the opposite: the person who was let out on probation re-offends, and the other person comes out and does not.
Cynthia Gentile: So I’ll ask the same question. From your position as an ethicist, are there ways to remove this bias from the predictive nature of the analysis? Or is it inherent in a system that looks at data pulled from many people?
Dr. Robin Zebrowski: Yeah. I mean, as far as I’m concerned, and I don’t know that this is the accepted view of everyone who does this work, most of these systems are inherently biased, and it’s almost impossible to make a system like this that isn’t.
And, again, that’s partly because our societies are biased in these ways. So there’s no way for us to weed out those biases in advance, because we live in a place that has them. It’s the air around us. Even if we’re aware of some of them, we’re never going to be aware of all of them.
Cynthia Gentile: And when you add up everyone into one pot.
Dr. Robin Zebrowski: Right. The massive size of these data sets is part of the problem.
Cynthia Gentile: I’m really enjoying this conversation, Robin. So how about facial recognition, and the software used for facial recognition? How does it work in terms of amplifying patterns of discrimination and bias?
Dr. Robin Zebrowski: As far as facial recognition is concerned, it’s a mess, honestly. It’s a big old mess. We know that Amazon wrote a program called Rekognition, Rekognition with a K, and sold it to a bunch of police departments around the country. And Amazon and the police departments all refused to say who was using it.
And I think somebody at one police department admitted that they were using it when they weren’t supposed to, which is how we know some of what we know. Joy Buolamwini is the one who did that work, I think. She discovered that the facial recognition software was, I forget the exact percentages, somewhere around 96% or 97% accurate for white men, and for other demographic groups it dropped radically.
She was at MIT and was supposed to log into something using a camera, and it didn’t see her face because she’s a Black woman. So she held up a white mask in front of her face, and suddenly the facial recognition algorithm kicked in and started trying to figure out who she was. So she did a big research project showing that facial recognition was only really accurate for white men and got less and less accurate for other groups. For Black women, it was in the 40% range.
One of the immediate problems you can see with this is that facial recognition technology is deployed on communities of color and minorities more often, because they’re over-policed. So Rekognition would trigger matches for a tremendous number of Black men, of course, who, you know, “fit the description,” because they’re Black men.
And the algorithm was extraordinarily bad at identifying Black men. They knew that; they knew the accuracy was low. So lots and lots of people get pulled in and put in lineups and things, and they’re not involved in any sort of crime whatsoever, but Rekognition is mis-triggering on them. And there was actually one man who I think was arrested and served some time based on one of these facial recognition algorithms gone awry.
It’s pretty bad, and I’ve been pretty swayed by some of the other ethicists. I think Evan Selinger is one of the people who’ve made this argument really convincingly, that there is no acceptable use case of facial recognition tech, and it should just be outlawed. Like there should be regulation that disallows any kind of facial recognition tech.
Cynthia Gentile: Interesting, because here’s my example: I have twin 13-year-old daughters who use iPhones. They are fraternal. They do not look alike per se; you would certainly be able to tell them apart. However, they are both able to unlock each other’s iPhones.
Without either of them having done anything to store the other’s face data, their face shapes and bone structure, I presume, are similar enough that the phone will unlock for either of them. They find it hilarious; I find it disturbing. So extrapolating that out to using this on any type of criminal data set seems like kind of a fool’s errand, if you can fool it with a 13-year-old’s face.
Dr. Robin Zebrowski: Right. It’s not good, and it’s used in a lot of places already. I mean, there were a bunch of news reports about China deploying this kind of facial recognition technology at train stations and bus stops, and just hauling people in for all kinds of infractions. Like people that were wanted for whatever reason.
But we also know about China’s social credit system and the radical inequalities there as well. And there’s evidence from human rights groups that China is also engaging in a kind of ethnic cleansing of a minority group [the Uyghurs], and facial recognition technology has helped them do that.
Cynthia Gentile: Wow. So shifting off of the facial recognition piece, but because we were talking about Amazon, my curiosity was piqued to think about the virtual assistants like Alexa, Siri, Google Home, all of those that are voice triggered. Are there racial biases in those systems? And if so, can you explain how that happens and how it manifests?
Dr. Robin Zebrowski: Yes. So I don’t use those systems, for obvious reasons. It’s funny: I’ve worked in AI for 20-something years, and I would never have one of those devices in my house, for lots of different reasons. But as far as their biases go, they learn your speech and things like that. And there’s evidence that they’re kind of always listening, because they tell you they’re always listening. It has to always be on to hear the trigger word, like “Hey Siri” or “Alexa” or whatever.
Again, I haven’t seen any studies. I would be very unsurprised if they existed, but I haven’t seen any. But you can imagine pretty easily that it’s probably much better at recognizing sort of standard American English than, say, African American Vernacular English. It probably would just fail to decipher the sounds as well. But again, I haven’t seen data on that particular one. I think Siri and Alexa and all those are bad for other reasons, even before we get to that part.
Cynthia Gentile: So that’s more around the actively surveilling sort of aspect of it?
Dr. Robin Zebrowski: Well, that’s definitely part of it. It was widely reported, by The Washington Post and all of the big networks, that someone’s Alexa, an Amazon Echo, recorded a man and his wife, I think, talking in their kitchen. Then it saved the audio of that conversation and emailed it to someone in his contact list. Horrifying. And I think it was his boss or something.
Cynthia Gentile: Yes. It was certainly an employment-related issue. I recall it; I don’t remember the timeframe, but I do know that that happened.
Dr. Robin Zebrowski: Just a horror show. I can’t even imagine. But, again, it doesn’t take a huge imagination to see other ways that that’s a bad thing to have in your house, right? At the very least, you have to assume Amazon is data mining some of it. No matter how much they claim they’re not, they almost certainly are. They data mine everything.
So some of it is a surveillance capitalism question: they want to gather that data so that they can sell you stuff. There’s really good science fiction that plays on this as well. But it’s problematic in just so many ways like this. And, I don’t know, I have to assume this was probably true at some point for Alexa, but it was definitely true of Amazon’s Ring cameras. You know the Ring surveillance system, right? You put them on your doorbell; you can also have them inside your house.
It turned out that Amazon had hired Ukrainian contractors to train the AI. So they claim that the reason they’re good at telling if it’s a car driving by in front of your house, if it’s a person walking up to your door, is that they’ve got this really great AI that can recognize these different objects. It’s really bad AI.
What they actually had was Ukrainian contractors looking at all this camera footage and explicitly telling the algorithm, “This is a face, this is a car, this is that.” So they gave them unfettered access to people’s cameras inside, as well as outside, their homes so that they could train those algorithms.
Cynthia Gentile: So the training was occurring with live data?
Dr. Robin Zebrowski: Yup. Yup.
Cynthia Gentile: So, as evidenced by my own surprise, my next question is about our own individual agency as consumers in allowing these products into our homes. Do you think there is a way to get affirmative consent from a consumer? Or are people just unaware of what the data collection looks like?
Dr. Robin Zebrowski: So I’ve thought for a long time, in medical ethics, that informed consent in that context is pretty much impossible. I was recently on another podcast, Rose Eveleth’s Flash Forward, and she asked me the same question about data. She very quickly convinced me that informed consent in that context is also not possible.
So for all I know, the software company that’s recording us right now is going to data mine our conversation to train its speech recognition, et cetera, which it will then sell on in all of the dangerous and problematic ways we’ve been talking about. So, no, I don’t think most people are going to be able to give informed consent about this stuff. And even if there were any chance of that happening, most of us can’t function right now without some of these technologies. We all ended up in our basements on Zoom when the pandemic hit.
Cynthia Gentile: And that’s exactly my next question. In the current world, in this weird, pandemic-fueled reality we are living in, where we’ve shifted to virtual offices, learning from home, and working from home, how has that impacted our reliance not just on technology per se, but on these AI systems and the evil algorithms specifically?
Dr. Robin Zebrowski: It’s bad. When this all happened, we knew we were going to be teaching virtually. Normally I’m at a liberal arts college where we’re 20 people in a classroom sitting in a circle chatting, and I had to somehow figure out how to do this online.
We all knew that Zoom was going to be our best bet. And I very quickly did a search on their privacy practices. It was a privacy nightmare. Agreeing to use Zoom at all meant that Zoom employees, whoever their quality control people were, could just be watching your Zoom without letting you know anyone was observing. If you saved your recordings on their servers, they had full access to all of that. And it was known that they were using all of these things. It was just awful for privacy.
And I was left with no choice, really, but to do it. So, again, as informed as I am about this stuff, I was there on Zoom, and I’m still there on Zoom, right? We’re a year later and we’re still doing it. And they improved some of their privacy practices, I think, but not all of them.
And that’s, again, before we even get into all of the other questions. There’s a way of thinking about some of these inequalities; I learned the language from Virginia Eubanks, who wrote a great book called “Automating Inequality.” She calls it digital redlining.
I think about the fact that when we went online, we had no problem. We already had broadband: I could be on Zoom, my partner could be in his meetings, and my six-year-old could also be on Zoom doing his schoolwork. But lots and lots of people don’t have access to broadband, because companies don’t think it’s profitable to put broadband in poor neighborhoods. So there’s this other way that all of these groups get locked out of access to these technologies before we even talk about the algorithms.
And I was just talking to a friend recently whose kid is taking the SAT from home, I think. And I had to introduce them to Proctorio, software that is basically facial recognition software that lots of universities are using for test taking. Thank God my college hasn’t and would never suggest it, but a ton of universities are saying, “Look, if kids are going to be taking tests at home, they have to use this monitoring software.”
And not only is the software basically staring at them the entire time, it is also racially biased. It doesn’t pick up Black faces very well. There was a story of one young girl taking some test or other who had to have a spotlight shining an inch away from her face; otherwise the software kept telling her she was cheating, because it seemed like she was looking away, just because it was not trained well enough to pick up her face. Yeah. Horrible nightmare. The schools that are using some of these, I have lots of questions for them.
Cynthia Gentile: So there are so many obvious problems. There are problems that are unavoidable, even for someone as well-trained and well-intentioned as yourself. Are there people advocating for better privacy policies, for better-informed consumer choice? Who are some people we can look to as a place to help advocate for that?
Dr. Robin Zebrowski: Yeah. So there are lots of people, lots of researchers, as individuals and in groups, doing just amazing work on this kind of stuff. I would say Timnit Gebru, who was doing amazing work to try to keep Google in line until they fired her. And again, Joy Buolamwini is also doing great work, and she founded a group called the Algorithmic Justice League, which is amazing.
And there’s a group called, I want to say, [All Tech Is Human], I think is the name of it. [David] Ryan Polgar’s group, I think. There are independent academic groups. There are academic researchers. Lots and lots of people in groups trying at pretty much every level to do something about these known problems.
Trying to get tech companies to self-regulate, which they’re not doing. Trying to convince lawmakers to regulate, which they’re not doing. And also just publishing academic work proving, over and over again, that these systems are doing these things, hoping that someone is going to be able to take action.
Cynthia Gentile: What do you see as the role of government or lawmakers in this?
Dr. Robin Zebrowski: I really, really believe that most of this tech needs to be heavily regulated. Again, I’m against facial recognition in almost any circumstance, but most of the technology needs to be regulated, and it needs to be regulation with teeth, too. You can’t fine Google $500,000 or a million dollars when they are found to be in violation of this stuff.
ProPublica did a huge exposé showing how Facebook was making it very easy for landlords to advertise their housing only to certain kinds of people and to be racist. And when this was discovered, Facebook was like, “Oh yes, we’ll fix that.” They went in and fiddled with something a tiny bit, and ProPublica showed how they were able to go in again and again and keep doing it.
So even when there are known violations of laws like the Fair Housing Act, it doesn’t matter. These companies are too big; they can’t be easily regulated. There are arguments that companies like Facebook and Twitter should be broken up and regulated as public services, and I think there’s an interesting argument for that.
Cynthia Gentile: So do you see the biggest impediment to that regulation as the size of the companies, the too-big-to-fail aspect? Or is it the complicated nature of understanding the problem?
Dr. Robin Zebrowski: I think it’s probably in good part their size. Again, the fact that Facebook could be responsible for genocide, and Twitter could be, I mean, why did we have the last four years of the leadership we had? Twitter. So there’s that: the fact that they’re so big and so powerful.
But they’ve also got so much money, and there are lobbyists, and it’s not so much that it’s hard to understand. We’ve been talking for, what, a half hour? I think all of your listeners are going to understand a lot of this. It’s not that hard to understand. I think there are just compelling reasons not to regulate.
And it’s the same with big businesses all the time, right? There are lobbyists, and there are reasons the government doesn’t want to step in and say, “Facebook, you’re too big. We’ve got to break you up, like a utility.”
Cynthia Gentile: Right, and there are certainly legal arguments to be made on both sides of that equation, which could be another podcast series. Robin, I had such a nice time talking with you today. This has just been so interesting, and I really do feel like we barely scratched the surface. So thank you. Thank you.
Dr. Robin Zebrowski: Thank you so much for having me.
Cynthia Gentile: And thank you to our listeners for joining us. Be well and be safe.