Timnit Gebru on Algorithmic Bias & Data Mining Ethics

LDV Capital invests in people building businesses powered by visual technologies. We thrive on collaborating with deep tech teams leveraging computer vision, machine learning, and artificial intelligence to analyze visual data. We are the only venture capital firm with this thesis. We regularly host Vision events – check out when the next one is scheduled.

Our Women Leading Visual Tech series showcases the leading women whose work in visual tech is reshaping business and society. Our first interview in this series is with Dr. Timnit Gebru. She was interviewed by Abigail Hunter-Syed, a former Partner at LDV Capital. (Note: After five years with LDV Capital, Abby decided to leave LDV to take a corporate role with fewer responsibilities that will allow her to have more time to focus on her young kids during these crazy times.)

The following is an edited version of their discussion, and the unedited video can be found below.

Timnit Gebru won the computer vision competition at our LDV Vision Summit 2017 © Robert Wright

Timnit is a pioneer in algorithmic bias and data mining. Her research has uncovered deep inaccuracies in computer vision algorithms when identifying women and people of color. Timnit holds a PhD from Stanford and is currently the technical co-lead of the Ethical Artificial Intelligence team at Google. Previously, she held roles at Microsoft Research and at Apple. She is also the co-founder of the organization Black in AI, and plays a leading role in the Women in Computer Vision and Women in Machine Learning groups.

Timnit Gebru’s presentation “Predicting Demographics Using 50 Million Images” at LDV Vision Summit

Abby: We first met in 2017 when you won our LDV Vision Summit competition with your Ph.D. research project, which used Google Street View images to estimate the demographic makeup of the US. It has since been covered extensively by the BBC, The New York Times, and many others. I remember that when I first looked through your deck, I thought, "Well yeah, of course, Republicans drive trucks, everybody knows that." But I know there's a huge jump between common assumptions and empirical truth. What made you decide to dedicate your Ph.D. research to this topic, and why did you think it was so important?

Timnit: At the time I started my Ph.D., a lot of people had been mining text to gain insights, but nobody had really mined large-scale, publicly available images and visual data in that way. So that's what we wanted to do, and we didn't even know if any of it would work. We had a whole bunch of questions: could we recognize all of the cars in the US? Could we find any sort of association between people and the cars they drive from the visual data? Even though it was a simple question, it took almost my entire Ph.D., and my co-author's Ph.D. too – he worked on this project full-time for at least two years, alongside me.

But the interesting thing is, I started to realize, this approach can be used for many things that can be good or bad. For example, there are places where, if it's very difficult to get survey data, you could try to use visual data to understand poverty rates in certain places, or health outcomes, or how many schools there are in certain locations, and more. 

On the other hand, I saw an MIT Tech Review article along the lines of: "In 2017, a team of researchers showed this with cars in Google Street View; now a team of researchers at Stanford is associating the home you live in, via Google Street View, with your likelihood of being in a car accident" – and insurance companies could use this. That's actually one of the reasons I started moving into my current areas of focus: trying to understand algorithmic bias – bias in models and datasets – and not just bias in the training data, but ethics in general. What's okay to do, what's not okay to do, the power dynamics of who has data, who doesn't have data, who has access to certain kinds of models, and who doesn't.

Abby: Absolutely. Considering a lot of the questions we have right now about the US census and how accurate it will be – especially when there are so many immigrants who are afraid of answering – do you think this could actually be a powerful tool there?

Timnit: It's very interesting. I was actually thinking about a conversation I had with Dr. Danah Boyd when I was at Microsoft Research. She is one of the most knowledgeable people I know when it comes to the census. I think it might be really great if you had a body that is not associated with the government – more of an independent institution – that does its own census using publicly available data.

Abby: It sounds like between Dr. Fei-Fei Li and Dr. Boyd, you've had some very strong female mentors in the space. Outside of the two of them, whether it was in your lab at Stanford or in the FATE group at Microsoft Research, how many other women were you working with?

Timnit: Actually, at Microsoft Research I worked with a lot of senior, vocal women, and I didn't realize how unusual that was, because a lot of the time I wasn't the loud one, or the problematic one, or the opinionated one, or whatever. I remember I was leading some sessions and I said to my boss at the time, Jennifer Chayes, "Hey, I know you're my boss, but I'm going to start," and I think she said, "The less you respect authority, the more I like you."


Abby: That's funny! I saw that you've got two sisters who are both engineers as well – you come from a long line of engineers. Was there a specific point when you decided this was the route you were going to take?

Timnit: Ever since I was a little kid, when people asked me what I wanted to be, I would say I wanted to be a scientist. I'm not exactly sure why. I think I always loved learning. I was such a nerd. When I was in kindergarten, if they didn't give me homework, I would cry. So they would give me my own special homework.

Abby: Has there been a moment in your career so far that you would consider your most defining event?

Timnit: The most defining event was NeurIPS 2016 in Barcelona – that was my second time at NeurIPS. The first time I went, in 2015, I didn't have a good experience either; I felt very isolated, like an outsider. In 2016 it was much larger – there were around 5,500 people – but barely any women, Africans, African-Americans, or anybody who's black. It's not just about color or race, it's also culture.

I came back with a feeling of panic and wrote a Facebook post where I described the situation and concluded with, "When you have drones, who's going to be the person who's considered a terrorist versus not?"

People talk about diversity, but they talk about it like it's a charity, and they pay it lip service. AI is a system, and the people who are creating it are part of that system; if they are not considered part of it, we're going to create technology that harms a lot of people.

Abby: If you could sum it up, what is the biggest takeaway for society from your "Gender Shades" research, done in collaboration with Joy Buolamwini?

Timnit: That research opened people's eyes to AI: first, how much AI is being used in everyday products that you're paying for, and second, how much disparity there is across groups in terms of which groups these systems work best for. It made people question their data and how we're using these kinds of systems in high-stakes scenarios.
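(An aside for technically minded readers: the disparity Timnit describes is surfaced by disaggregated evaluation – reporting error rates per subgroup instead of a single aggregate number. Below is a minimal sketch of that idea in Python; the labels, predictions, and group names are hypothetical placeholders, not the Gender Shades data.)

```python
# Minimal sketch of disaggregated evaluation: report error rates per
# subgroup rather than one aggregate accuracy. All data below is a toy
# placeholder, purely for illustration.
from collections import defaultdict

def disaggregated_error_rates(y_true, y_pred, groups):
    """Compute a classifier's error rate separately for each subgroup."""
    errors = defaultdict(int)
    counts = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        counts[group] += 1
        errors[group] += int(truth != pred)
    return {g: errors[g] / counts[g] for g in counts}

# Toy example: a decent-looking overall accuracy (70%) hides a large gap.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 0, 1, 1, 1, 1]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
print(disaggregated_error_rates(y_true, y_pred, groups))
# {'A': 0.0, 'B': 0.6} – group B sees a 60% error rate despite 70% overall accuracy
```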

A lot of people are focused on diversifying datasets, and not enough on questioning the task itself, or the way datasets are gathered.


Ruha Benjamin’s “Race After Technology” is such a great book on this topic! She uses the term “predatory inclusion”, which means you're trying to "include" marginalized communities for your own benefit, but not for their benefit. You don't include them in the whole process from the very beginning. You include them because you need to diversify your dataset, and you might do it in a way that's actually predatory and not ethical.

Abby: Do you think that this is one of the ways in which we can correct AI bias now in order to make sure that the structural biases of our society today are left out of these algorithms that are going to define our tomorrow?

Timnit: There are so many things that need to happen. One is understanding the lay of the land: where all of this stuff is being used, where the data is coming from, who owns the data, and who doesn't.

For Gender Shades, we used the task of automatic gender recognition as a testbed to show some of these issues, but a lot of people have written about the fact that automatic gender recognition tools should not exist in the first place. That's already a conversation on its own – it opens up such a complex discussion.

I wrote a paper with a history student in which we talk about data collection and some parallels that we can draw from archival histories, how they were collected, the issues faced, and the things that they were trying to do to address them. Some of these examples are things that people in the machine-learning community are talking about right now. 

One example is data consortiums. For example, Google has a lot of data on a lot of people, and you don't have that much data – so there's a power imbalance there. Small startups don't have a lot of data. Government bodies might have data, but nonprofits don't. So how can you pool resources to form data consortiums? Maybe it could be for the public good, or just for a bunch of smaller businesses.

Another idea is disclosure and documentation of what is in your dataset: How did you gather it? What are its characteristics?
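(As an aside: this kind of disclosure can be made concrete and machine-readable. Below is a minimal, hypothetical sketch of datasheet-style dataset documentation in Python – every field name and value is an illustrative assumption, not a standard schema or any real dataset.)

```python
# A minimal, hypothetical sketch of datasheet-style dataset documentation.
# The fields are illustrative assumptions, not a standard schema.
from dataclasses import dataclass, field

@dataclass
class DatasetDatasheet:
    name: str
    motivation: str          # why the dataset was created, and for whom
    collection_method: str   # how the data was gathered (scraped, surveyed, ...)
    consent_obtained: bool   # whether subjects consented to this use
    known_gaps: list = field(default_factory=list)       # under-represented cases
    intended_uses: list = field(default_factory=list)
    prohibited_uses: list = field(default_factory=list)

# Hypothetical example entry for an imaginary dataset.
sheet = DatasetDatasheet(
    name="street-scenes-v1",
    motivation="Benchmark for vehicle detection in urban imagery.",
    collection_method="Public street-level photos, 2015-2017, US cities only.",
    consent_obtained=False,
    known_gaps=["rural areas", "non-US roads", "night-time images"],
    intended_uses=["academic benchmarking"],
    prohibited_uses=["surveillance", "individual identification"],
)
print(sheet)
```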

Abby: If you think about medical papers, for example, they have to be very explicit about who was in their test set and what they were testing for – so why wouldn't you apply those same practices to machine learning, right?

Timnit: Exactly.

Timnit Gebru at LDV Vision Summit 2017 © Robert Wright

Abby: We talk a lot about synthetic data, and how we think synthetic data is a way to cross the data moat that these big tech companies have built up. I also personally think there's a lot of opportunity within synthetic data to help make sure it is unbiased, because you're essentially creating all of it yourself. Is that something you've looked at?

Timnit: I think about synthetic data, data augmentation, and related practices. Some of my ideas go way, way back. Emily Denton, who is very well known for work on generative models and who is also on our team now at Google, wrote a paper with a bunch of us on using generated images, and augmentations of them, to test for bias. There are huge issues with synthetic datasets right now that we're dealing with. I do think synthetic data can be used in certain specific scenarios, but I would be very careful not to think of it as the solution to the problem.
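(For readers curious what such a test can look like: one common pattern is counterfactual probing – generate matched pairs of inputs that differ only in a sensitive attribute and measure how often the model's prediction flips. The sketch below is a toy illustration with stand-in stubs for both the generator and the model; it is not the method from the paper Timnit mentions.)

```python
# Toy sketch of counterfactual probing with generated inputs: vary one
# sensitive attribute, hold everything else fixed, and count prediction
# flips. The generator and classifier are stand-in stubs, not real systems.
import random

def generate_pair(seed, attribute_value):
    """Stand-in for a generative model: returns a fake 'image' (a feature
    vector) whose content is fixed by `seed` and whose sensitive attribute
    is set by `attribute_value`."""
    rng = random.Random(seed)
    features = [rng.random() for _ in range(8)]
    return features + [attribute_value]

def classifier(image):
    """Stand-in model under test. This toy model leaks the sensitive
    attribute (the last feature), so the probe below should flag it."""
    return int(sum(image[:8]) / 8 + 0.3 * image[-1] > 0.5)

def attribute_flip_rate(n_pairs=1000):
    """Fraction of matched pairs whose prediction changes when only the
    sensitive attribute changes – ideally near zero for an unbiased model."""
    flips = 0
    for seed in range(n_pairs):
        pred_a = classifier(generate_pair(seed, attribute_value=0.0))
        pred_b = classifier(generate_pair(seed, attribute_value=1.0))
        flips += int(pred_a != pred_b)
    return flips / n_pairs

print(f"prediction flip rate across attribute: {attribute_flip_rate():.2%}")
```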

Abby: Is there one application of computer vision or deep learning at the moment that gets you really giddy about its potential?

Timnit: I'm very interested in low-resource settings. Ernest Mwebaze, a researcher in Accra, has been working with others on crop disease identification and on gathering data that can help small-scale farmers. I also know colleagues who want to track bird migration patterns, because that helps you understand climate change and what's going on.

To be honest, some of the use cases of vision I've been seeing are very concerning to me. Anything to do with the electronic border wall or with face recognition is very worrisome to me right now because of the manner in which it's used. So when I see things that instead place power in the hands of ordinary people, that's what makes me happy.

Abby: If you were going to give one sentence of advice to yourself as a 12-year-old girl getting ready to step into this world of deep tech, what would it be?

Timnit: My advice would be to keep the spirit I had when I was 12. At that age, I did not care what other people thought. I remember I went to an audition for jazz players at a very famous jazz school in Berkeley that I really wanted to attend. I auditioned even though I didn't know anything about jazz – I had only learned classical music. The audition went terribly, but I took it in stride and moved on. Now, I think I would care a lot more.

So just to keep that spirit of pursuing what you enjoy regardless of fear of failure or what others think.


Watch the full version of this interview below:

We can’t wait to introduce our next guest in the Women Leading Visual Tech series! Stay tuned and be well.