Google wages war against biased data
Google is creating more ‘representative’ algorithms to mitigate outdated gender, racial and other stereotypes
Internet giant Google is waging a war against the inherent biases, most notably along gender and racial lines, that plague its vast sets of data.
As one of the biggest and most powerful companies in the world — parent company Alphabet is worth about R10-trillion — Google’s decisions about how to sort and present its deep pools of data can have far-reaching consequences for society.
The group recently listed "Avoid creating or reinforcing unfair bias" as one of its seven "principles" for artificial intelligence (AI). Modern AI systems are built on machine learning — a process of feeding data into a system so it can "learn" to identify patterns.
The problem is, says Jen Gennai, Google’s head of ethical machine learning, trust and safety, "data is biased", particularly when it comes to gender and race.
"Using the gender example, most of our texts have been written by men and reflect the lives of men," says Gennai, whose team is tasked with creating more "representative" data sets to move away from outdated stereotypes.
A simple Google image search for engineers, for instance, yields mostly pictures of men, while nurses and teachers are generally portrayed as women.
"That’s what we have recorded in our history, so that’s what is reflected," she says.
Racial stereotypes are also in Google’s crosshairs, particularly after the company found its systems were assigning certain job types according to race and ethnicity. Until as recently as last year, a Google search for "CEO" brought up a set of pictures almost entirely comprising white men. One of the few exceptions was an image of a Barbie doll — practically the only female to make the list.
"That’s obviously not representative of the world, so we had to ask ourselves what fairness actually means in this situation.
"But it doesn’t mean gender parity — putting up 50% women and 50% men — because that’s not what the real world looks like."
In this case, fairness, or a true representation of the world today, is probably a search result reflecting that 13% of CEOs are women, Gennai says.
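Gennai's proportional notion of fairness can be illustrated with a simple re-sampling sketch (a hypothetical toy example, not Google's actual ranking code): given a pool of candidate results and a real-world demographic share, select results so the mix reflects that statistic rather than the skew in the historical data.

```python
import random

def proportional_sample(candidates, group_of, target_share, group, k, seed=0):
    """Pick k results so roughly target_share of them belong to `group`.

    candidates:   list of result items
    group_of:     function mapping an item to its demographic group
    target_share: real-world proportion to reflect (e.g. 0.13 for women CEOs)

    Illustrative sketch only; real search ranking is far more involved.
    """
    rng = random.Random(seed)
    in_group = [c for c in candidates if group_of(c) == group]
    out_group = [c for c in candidates if group_of(c) != group]
    # Cap at what is actually available in the candidate pool.
    n_in = min(round(k * target_share), len(in_group))
    picked = rng.sample(in_group, n_in) + rng.sample(out_group, k - n_in)
    rng.shuffle(picked)
    return picked
```

With `target_share=0.13`, a page of ten "CEO" results would surface one woman on average — proportional representation rather than strict parity, which is the distinction Gennai draws.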
But it’s not only search results that promote historical biases, she says. New products, such as Google’s "smart reply" function, which suggests quick responses for e-mails, also have to be rejigged to shed their prejudices.
Before that product was launched, Gennai’s division was roped in to adjust the algorithms running in the background and to blacklist certain "bad results".
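Blacklisting "bad results" of the kind Gennai describes can be pictured as a post-filter on the model's suggestions (a hypothetical sketch with placeholder entries; Google's actual pipeline is not public): candidate replies are checked against a curated list of disallowed phrases before being shown to the user.

```python
# Placeholder blocklist entries for illustration only — not Google's list.
BLOCKLIST = {"sent from my iphone"}

def filter_replies(candidates, blocklist=BLOCKLIST):
    """Drop any suggested reply containing a blocked phrase (case-insensitive)."""
    safe = []
    for reply in candidates:
        lowered = reply.lower()
        if not any(bad in lowered for bad in blocklist):
            safe.append(reply)
    return safe
```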
To align them with modern values, the company is tweaking old data sets and sometimes creating entirely new ones.
For example, Google recently asked users around the world to submit images on certain subjects.
Searches for weddings, for instance, still tend to show white women in white wedding dresses. The campaign aims to counter this by sourcing new images — of Indian women in colourful dresses or Chinese women in red dresses, for example, says Gennai.
"So we’re getting a much more diverse data set by asking the world what it looks like now, not what it used to be."
Gennai’s team is also working on more futuristic products, including AI-guided self-driving cars.
The number of accidents involving these vehicles has raised questions about their safety. Uber made headlines in March when one of its vehicles, which was operating in autonomous mode, killed a pedestrian.
"Safety is definitely a problem, but it’s not a reason to not do it," says Gennai.
"We know it will save lives, so it’s more about providing a narrative so people can understand that mistakes will be made — this will not be a 100% solution, but it will have less of a negative impact than human drivers."
Citing irrational fears about air travel, Gennai says the public will not be swayed by data that shows autonomous cars are safer than their manual counterparts.
"We’ll have the data that will prove that self-driving cars are safer than normal cars, but it’s not going to work, so it’s got to be about the narrative."
However, she acknowledges that errors will provoke severe backlashes. "The minute it makes a mistake, it will be the most serious mistake a car has ever made."
The autonomous vehicle industry is also grappling with a deep ethical dilemma: when an accident is unavoidable, should a car be programmed to kill those inside it, or pedestrians?
As it turns out, different countries have different preferences.
A global study by the Massachusetts Institute of Technology revealed that Chinese citizens are the most likely to spare drivers. People from Japan and Norway placed greater emphasis on sparing pedestrians.