My everyday gripes with Google and Amazon voice assistants
Some technologies making their way into everyday life seem magical. But what is the point if they cannot reliably help you get things done? "Conversational computing" using smartphones, speakers and other devices is all the rage in tech circles and is finding its way into consumer gadgets. However, I found it a recipe for frustration ... until, that is, I needed it.
Having one arm in a sling (it is taking post-operative downtime after losing a battle with the California surf) can lead to a new appreciation of voice computing. Suddenly I am blessed with the patience to talk to computers.
I have been living with Google’s voice-response Assistant in a Pixel smartphone and Amazon’s Alexa in an Echo speaker. Like precocious children, they can shock you with preternatural awareness but then rapidly let you down. The trick is not to expect too much.
The good news is that two big problems with voice computing are likely to be solved soon. The first is getting a computer to understand you, which requires a combination of speech recognition (translating the sounds you make into words) and natural language understanding (divining the meaning of your utterance).
Watching Assistant type as you talk is a good way to appreciate how far the technology has advanced. Start speaking and it will probably misunderstand some words, but as the sentence progresses, it works out the meaning and shuffles the words to fit.
Say, "Show me the way to San Jose," for instance, and the software begins to type "do you know anything to sign…". Then it hears the final word and, almost miraculously, the sentence resets. (Assistant's response: "I can get you directions", followed, with a nod to the Dionne Warwick song, by the cute: "Hopefully you'll find peace of mind.")
Another benefit is being able to say the same thing in many ways and still be understood, rather than hunting for a formulation the software will accept. After a while, this induces a psychological breakthrough: you relax and talk naturally; you no longer try to shoehorn your thoughts into a linguistic equivalent of the Fortran programming language.
The second problem besetting voice computing has been understanding context. Computers may find it easy to answer "Who is the president?" but would be lost with a follow-up question such as "How old is he?" The latest systems are overcoming this limitation.
Alexa is less adept at this than Assistant. The Amazon device is effective at one-shot responses to the most common phrases you are likely to say, and is similar to a search engine that returns a precise answer.
Assistant, on the other hand, keeps a sense of context. After "Where is Oakland zoo?" you can ask "When does it close?" Answer: "It’s closed right now but will be open tomorrow from 10am to 4pm". "How much does it cost?" returns ticket prices from a zoo website.
Other "intelligent assistants" are enjoying success, too, including Hound, the product of 10 years of research into language understanding. All the assistants are developing constantly.
In my tests, Google handled the widest range of questions correctly, though my testing was far from comprehensive. Not everything turns out well. Like a toddler that has missed its nap, the software can sometimes seem to take perverse pleasure in refusing to understand you. When you say "telephone" to Assistant, for instance, why would the correct word appear and then mysteriously turn into "elephant"? And why, when it asks you to pick from a list and you choose "the second one", would it think you said "psycho mom"?
With only one working arm, however, I find it easy to forgive such glitches. Trying two or three times, sometimes with different words, usually gets the meaning across.
If the spoken word is crossing a threshold in the understanding between man and machine, how useful is it for getting things done, and connecting to apps and services?
Alexa is the gold standard for the voice-only world and companies are racing to develop apps (Amazon calls them "skills") that work with it. With the Uber skill you can summon a car; with the Starbucks skill you can order a drink.
All too often, though, the linkages are not seamless. As an example, Google is trying to turn Assistant into an easy way to access apps on its Pixel phone. Ask for "photos of Ella" and it says it cannot find them in Google Photos. At the same time it offers a button to connect to the app: tap that and just the pictures of Ella appear, identified by facial recognition software.
Alexa also has teething troubles. Connect your Starbucks account to the device, say "order me a Starbucks", and the software takes a curious turn. Instead of connecting to the "skill" and ordering your favourite drink from the nearest store, it offers to put an unspecified product from Starbucks in your Amazon cart (I said "yes", and was told 12 cans of doubleshot espresso had been added).
These snags all feel temporary. For now, though, they seriously limit the usefulness of voice services unless you are happy to make the same request several times, using different words, until you get the right response.
The other great use for voice is to replace the need for a keyboard, meaning you can easily dictate text messages, e-mails and longer written documents. The news here is also mixed. Short messages or informal e-mails — ones where punctuation is unimportant and the odd wrong word may be overlooked — are quick and easy.
Anything more considered is a different matter. Incorrect words and poor punctuation litter the final product. If the text were easy to edit, this would not matter too much. But a small screen is a difficult place to correct words, and most people's fingers are too fat to perform surgery on punctuation.
Frustrations like these are guaranteed to turn the able-bodied back to keyboards and touchscreens. With only one functional arm, my calculation is different — but I still cannot wait to shed the sling and return to man-and-machine interfaces that leave no room for misunderstanding.
© The Financial Times Limited 2017