The 11 official languages on display at the Constitutional Court. Picture: SABC

Lost in the flurry of geek excitement about Google’s recent announcement of a “cloud region” in South Africa was a footnote that will arguably have a much greater effect on the lives of ordinary people in the country.

Google linguist Sandy Ritchie attended the Google for Africa online gathering to announce that the “Gboard” — the Google keyboard, available in a Play Store near you — can now be used for voice typing in eight South African languages, in addition to the English, Afrikaans and Zulu already available.

This is just the start, though. Elsewhere in the Google ecosystem — the YouTube search bar, for example, as well as auto-transcription on videos, the Google Docs dictation feature and, probably most importantly, Google Translate — all South Africans will soon be able to talk in their first language, confident that the tech will listen, understand and do its thing.

Many of us, especially English speakers who have had access to technology for decades, have become blasé about such matters. “We’ve grown up with tech, we have a set of digital skills and we’re used to it,” says Ritchie, a Scot based in New York. 

“But for people who speak smaller languages, and whose literacy levels are not as high, speech-to-text can be a transformative change in their lives. Enabling voice recognition can be the difference that finally gives people access to the web,” Ritchie says.

Google has the resources to throw at this, and if you read Ritchie’s paper describing his team’s efforts, you can see why they’re needed.

For just one of Africa’s 2,000-plus languages, Kinyarwanda, which is spoken by 99% of Rwanda’s 13-million people, Google started with 2,000 hours of recordings. For all 15 languages in Ritchie’s latest project, the team even scoured YouTube and transcribed the examples they found.

Once they had gathered 4,200 hours of recordings containing 3.8-million utterances, they wheeled out their weapon of choice: machine learning. This is where the science becomes impenetrable, but the big picture is simple: the Google team learnt what works best for each language, and that knowledge will help them improve their results.

“In all the datasets, there are issues with spelling variation, and variation in the use of diacritics and special characters,” says Ritchie’s paper.

The use of high, mid, low and falling tones also challenged the Google servers’ ability to develop their own intelligence about the languages, as did clicks and the consonants classified as dentals, implosives and glottal stops.  

Tswana’s “marginal clicks” did not stop that language from becoming the one that scored best when Ritchie’s team evaluated the word error rates of four different machine-learning systems. Bottom of the league, with an error rate about 40 times higher, was Yoruboid, a 14-strong family of languages spoken by about 21-million people in Nigeria, Togo, Ghana and Benin.
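
For readers who want to see what a word error rate measures, here is a minimal sketch in Python. It counts the fewest word substitutions, insertions and deletions needed to turn the system's transcript into the correct one, then divides by the number of words in the correct one. The function name and the example sentences are made up for illustration; this is not the code or the data used by Ritchie's team.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only: one wrong word out of six.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"{wer:.3f}")
```

A lower score is better: a perfect transcript scores 0, and a score can exceed 1 if the system inserts many extra words.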

The score for Tswana is made more remarkable by Ritchie’s team having had only 53 hours of speech and 56,000 utterances to work with. About 80% of this was used to train the machine learning systems; the rest was for testing.
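
To put those figures in concrete terms, the sketch below splits 56,000 utterances into a training set and a test set at an 80/20 ratio. The article gives only the approximate ratio, so the random shuffle, the fixed seed and the function name are assumptions made for illustration; the team may well have divided its data differently, for example by speaker.

```python
import random


def split_utterances(utterances, train_fraction=0.8, seed=0):
    """Shuffle the items and split them into (train, test) lists."""
    shuffled = list(utterances)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in IDs for the roughly 56,000 Tswana utterances mentioned above.
utterance_ids = range(56_000)
train, test = split_utterances(utterance_ids)
print(len(train), len(test))  # 44800 11200
```

Keeping the test utterances out of training is what makes the resulting error rate a fair measure of how the system handles speech it has not heard before.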

In common with the other South African languages in the study, this data came from the South African Centre for Digital Language Resources (SADiLaR), based on North-West University’s Potchefstroom campus. SADiLaR’s recordings include 56 hours of high-quality utterances in each South African language that were gathered a decade ago by the National Centre for Human Language Technology.

SADiLaR director Langa Khumalo says the centre’s job is to achieve a “parity of esteem” among South Africa’s 11 official languages, and its work with Google has taken it a step closer to this target. It is also an important case study, he says, that shows the importance of investment in language resources.

“On a very practical note it can allow for more accessible interfaces for people living with disabilities, or expand what is possible using a web service and a smartphone,” Khumalo says.

SADiLaR’s node at the Council for Scientific & Industrial Research (CSIR) in Pretoria is making its own automatic speech recognition breakthroughs and has an Android text-to-speech app, Qfrency, with male and female voices in all 11 official languages.
