Amazon.com’s machine learning specialists uncovered a big problem: their new recruiting engine did not like women.

The team had been building computer programs since 2014 to review job applicants’ resumes with the aim of mechanising the search for top talent, said five people familiar with the effort. Automation has been key to Amazon’s e-commerce dominance, be it inside warehouses or driving pricing decisions.

The company’s experimental hiring tool used artificial intelligence (AI) to give job candidates scores ranging from one to five stars, much like shoppers rate products on Amazon, some of the people said.

“Everyone wanted this holy grail,” one of the people said. “They literally wanted it to be an engine where I’m going to give you 100 resumes, it will spit out the top five, and we’ll hire those.”

But by 2015, the company realised its new system was not rating candidates for software developer jobs and other technical posts in a gender-neutral way. That is because Amazon’s computer models were trained to vet applicants by observing patterns in resumes submitted to the company over a 10-year period.

Most came from men, a reflection of male dominance across the tech industry. In effect, Amazon’s system taught itself that male candidates were preferable. It penalised resumes that included the word “women’s,” as in “women’s chess club captain”. And it downgraded graduates of two all-women colleges, according to informed sources. They did not specify the names of the schools.
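How a system can "teach itself" such a preference is easier to see in miniature. The sketch below is not Amazon's model, whose algorithm has not been disclosed. It is a deliberately simple term-weighting scheme (smoothed log-odds of a word appearing in accepted versus rejected resumes), and the `HISTORY` records are invented toy data chosen to mimic a skewed hiring record. The function names `term_log_odds` and `score` are hypothetical.

```python
import math
from collections import Counter

def term_log_odds(resumes, smoothing=1.0):
    """Weight each term by the smoothed log-odds of it appearing in
    accepted versus rejected resumes (an assumed, simplified scheme)."""
    accepted, rejected = Counter(), Counter()
    n_accepted = n_rejected = 0
    for text, was_accepted in resumes:
        terms = set(text.lower().split())
        if was_accepted:
            accepted.update(terms)
            n_accepted += 1
        else:
            rejected.update(terms)
            n_rejected += 1
    weights = {}
    for term in set(accepted) | set(rejected):
        p_acc = (accepted[term] + smoothing) / (n_accepted + 2 * smoothing)
        p_rej = (rejected[term] + smoothing) / (n_rejected + 2 * smoothing)
        weights[term] = math.log(p_acc / p_rej)
    return weights

def score(text, weights):
    """Sum the learned weights of the distinct terms in a resume."""
    return sum(weights.get(term, 0.0) for term in set(text.lower().split()))

# Invented toy history: past decisions happen to go against resumes
# that mention "women's", regardless of the listed skills.
HISTORY = [
    ("python chess club captain", True),
    ("java chess club captain", True),
    ("python robotics team lead", True),
    ("java robotics team lead", True),
    ("python women's chess club captain", False),
    ("java women's robotics team lead", False),
    ("python debate team", False),
]

weights = term_log_odds(HISTORY)
print(weights["women's"])
print(score("python chess club captain", weights))
print(score("python women's chess club captain", weights))
```

Nothing in the code mentions gender. The negative weight on “women's” comes entirely from the pattern in the historical decisions, so two otherwise identical resumes receive different scores.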

Amazon edited the programs to make them neutral to these particular terms. But that was no guarantee that the machines would not devise other ways of sorting candidates that could prove discriminatory, the people said.

The Seattle company disbanded the team by the beginning of last year because executives lost hope for the project, according to the people, who spoke on condition of anonymity.

Amazon’s recruiters looked at the recommendations generated by the tool when searching for new hires but never relied solely on those rankings, they said.

Amazon declined to comment on the recruiting engine or its challenges, but the company says it is committed to workplace diversity and equality. The company’s experiment, which Reuters is first to report, offers a case study in the limitations of machine learning. It also serves as a lesson to the growing list of large companies, including Hilton Worldwide Holdings and Goldman Sachs Group, looking to automate portions of the hiring process.

About 55% of US human resources managers said AI would be a regular part of their work within the next five years, according to a 2017 survey by talent software firm CareerBuilder. Employers have long dreamed of harnessing technology to widen the hiring net and reduce reliance on subjective opinions of human recruiters.

But computer scientists such as Nihar Shah, who teaches machine learning at Carnegie Mellon University, say there is still much work to do. “How to ensure that the algorithm is fair, how to make sure the algorithm is really interpretable and explainable — that’s still quite far off,” he said.

Amazon’s experiment began at a pivotal moment for the world’s largest online retailer. Machine learning was gaining traction in the technology world, thanks to a surge in low-cost computing power. And Amazon’s human resources department was about to embark on a hiring spree. Since June 2015, the company’s global headcount has more than tripled to 575,700 workers, regulatory filings show. So it set up a team in Amazon’s Edinburgh engineering hub that grew to about a dozen people.

Their goal was to develop AI that could rapidly crawl the web and spot candidates worth recruiting, the people familiar with the matter said. The group created 500 computer models focused on specific job functions and locations. They taught each to recognise about 50,000 terms that showed up on past candidates’ resumes. The algorithms learned to assign little significance to skills that were common across IT applicants, such as the ability to write various computer codes, the people said. Instead, the technology favoured candidates who described themselves using verbs more commonly found on male engineers’ resumes, such as “executed” and “captured”, one person said.
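One standard way a model ends up assigning little significance to terms shared by nearly every applicant is inverse document frequency (IDF) weighting. The people familiar with the project did not say which weighting Amazon's models used, so the following is only an illustration of the effect under that assumption. The sample resumes and the function name `inverse_document_frequency` are invented.

```python
import math

def inverse_document_frequency(resumes):
    """Return log(N / df) per term, where df is the number of resumes
    containing the term. Terms found in every resume get weight 0."""
    doc_terms = [set(text.lower().split()) for text in resumes]
    n_docs = len(doc_terms)
    vocabulary = set().union(*doc_terms)
    return {
        term: math.log(n_docs / sum(term in terms for terms in doc_terms))
        for term in vocabulary
    }

# Invented toy resumes: every applicant lists the same languages,
# but only some use particular verbs.
SAMPLE_RESUMES = [
    "python java executed migration",
    "python java captured requirements",
    "python java maintained servers",
    "python java executed rollout",
]

idf = inverse_document_frequency(SAMPLE_RESUMES)
print(idf["python"], idf["executed"], idf["captured"])
```

Because every toy resume lists the same programming languages, those terms carry no weight, and the remaining signal comes from word choice such as “executed” or “captured”.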

Gender bias was not the only issue. Problems with the data that underpinned the models’ judgments meant that unqualified candidates were often recommended for all manner of jobs, the people said. With the technology returning results almost at random, Amazon shut down the project, they said.

Other companies are forging ahead, underscoring the eagerness of employers to harness AI for hiring. Kevin Parker, the CEO of HireVue, a start-up near Salt Lake City, said automation was helping firms look beyond the same recruiting networks upon which they had long relied. His firm analyses candidates’ speech and facial expressions in video interviews to reduce reliance on resumes. “You weren’t going back to the same old places; you weren’t going back to just Ivy League schools,” Parker said. His company’s customers include Unilever and Hilton.

Goldman Sachs has created its own resume analysis tool that tries to match candidates with the division where they would be the “best fit”, the company said.

Microsoft’s LinkedIn, the world’s largest professional network, has gone further. It offers employers algorithmic rankings of candidates based on their fit for job postings on its site. Still, John Jersin, vice-president of LinkedIn Talent Solutions, said the service was not a replacement for traditional recruiters.

“I certainly would not trust any AI system today to make a hiring decision on its own,” he said. “The technology is just not ready yet.”

Some activists say they are concerned about transparency in AI. The American Civil Liberties Union is challenging a law that allows criminal prosecution of researchers and journalists who test hiring websites’ algorithms for discrimination. “We are increasingly focusing on algorithmic fairness as an issue,” said Rachel Goodman, a staff attorney with the Racial Justice Program at the ACLU.

Still, Goodman and other critics of AI acknowledged it could be exceedingly difficult to sue an employer over automated hiring: job candidates might never know it was being used.

As for Amazon, the company managed to salvage some of what it learned from its failed AI experiment. It now uses a “much-watered down version” of the recruiting engine to help with some rudimentary chores, including culling duplicate candidate profiles from databases, one of the people familiar with the project said. Another said a new team in Edinburgh had been formed to give automated employment screening another try, this time with a focus on diversity.