Siri, wake up

Software

Let me begin by clearing something up: I’m not saying Siri is a useless interface. It isn’t, and it has great potential. But after many attempts at finding Siri a proper place in my day-to-day, and after much frustration, I just sort of forgot about it.

I’m not a user interaction guru, though I have done my share of studying and analysing the subject. But from mere observation, I can say that when choosing an interface, people tend to follow the principle of ‘maximum output with minimum effort’, or the path of least resistance, if you will. In my attempts at using Siri in a meaningful way, I rarely found it quicker than doing the same operation through the Multi-touch interface.

A digression on speech recognition

The way I see it, speech recognition still has some way to go. The first thing to improve is the reliability and accuracy of translating dictated speech into written text. In my experience, the software available today for English speakers seems to offer better results than what is available for other languages.

But the problems related to speech recognition aren’t limited to this. We should also take into account its practical use in the field. Speech recognition is not really usable in public, because of the obvious noise pollution and the acoustic chaos that would ensue, not to mention the rudeness: we already have to endure those with the habit of talking loudly on their mobile phones, inflicting their business and nonsense on the unfortunates nearby; imagine if everyone who carries around a computer or portable device in public places started dictating stuff aloud. Then there’s the issue of data privacy. There are professions that require strict confidentiality when dealing with customer information. Not to mention trade secrets and, in general, a whole range of sensitive information that must not be seen or, in this case, overheard.

This aspect alone severely limits the use of speech recognition (at least for business). But even under the best conditions of use, while I’m not denying that there’s a certain beauty in being able to dictate text to the computer and seeing it recognised on the screen, I can’t stress enough one other aspect that speech recognition advocates sometimes forget: that a dictated and recognised block of text is still far from being a complete, finished document.

I doubt very strongly that, once dictated, a text is free of typographical errors and does not require any further correction or change, unless the author doesn’t care about the final quality of their writing. When people say that this kind of dictation is a timesaver, I think that’s more an impression than the reality. It is not enough that the computer recognises the dictation (and the dictated punctuation) and inserts capital letters where needed. The text must subsequently be adapted and corrected to ensure that it is in effect a written text and not the transcription of a soliloquy.

It isn’t a difficult test to do, even without voice recognition software: think of an email or a short-to-medium-length text you would like to write, and dictate it to a tape recorder, then transcribe what you said as you hear it. First, you will need to delete redundant conjunctions and pauses; then, since we are not robots, you’ll have to adjust the syntax and the connections between sentences. When speaking, it’s easy to produce syntactically suspended passages, anacolutha, expressions that are accepted in spoken language but improper in a written text. As you can see, a mere transcription is not enough: we have to produce an appropriately written document. With such a ‘post-production’ operation, I can’t really see where the productivity gain is, nor any time saved over typing the document (at least considering the current state of speech recognition technology).

Unless of course you want to end up writing like you talk, which I don’t think is a nice prospect. My plumber uses a transcription service to send text messages with his phone, so he can dictate and send a text without using his hands. His text messages are intelligible, but syntactically rough, and with the odd mistranscribed word. Here the use of speech recognition makes sense, because the scope is limited and what matters is the result, not the means to achieve it. An email or any longer text delivered with the same technique would be unacceptable.

Perhaps I’ll soon be proven wrong by some breakthrough in the industry that’s just around the corner, but in the meantime I’m really under the impression that speech recognition will continue to work only in specific, limited situations. This technology is certainly useful (just think of the valuable aid it represents for disabled people), but in my opinion there’s still a long way to go before it can make a significant impact in our everyday life as an efficient input alternative to the keyboard, let alone a replacement.

Back to Siri

On paper, Siri is an interesting tool. The ads Apple created sell this feature rather well. When I updated my iPad to iOS 6, I couldn’t wait to try it, and I started playing with it as soon as the updating process was finished. ‘Playing’ being the key word here. I tried all kinds of stuff with it. The silly questions. The weather for the weekend. Setting up timers and reminders. Asking for directions. Finding information. The result was a mix of fun, unexpected results, and frustration. Being multilingual myself, I tried Siri in Italian, English, and Spanish, but even with my mother tongue, Italian, Siri’s understanding of what was said or dictated was too erratic to be considered a reliable interface. The most useful way to use Siri would be in situations where your hands aren’t free or you can’t otherwise use the Multi-touch interface comfortably. 

In an ideal world you should be able to trust Siri to understand the basic commands you’re speaking, without having to constantly check the device to see whether such commands have been interpreted correctly or not. In an ideal world, how quickly or slowly you talk, and your distance from the iPhone or iPad’s microphone, shouldn’t matter much, as long as speed and distance are within reasonable parameters. As it is, however, a successful interaction with Siri, no matter how simple the request, still depends on too many variables.

When OS X Lion was introduced, I tried to familiarise myself with certain new trackpad gestures, but failed to find them quicker or more efficient than good old keyboard shortcuts. Take the ‘Show Desktop’ gesture: spread with thumb and three fingers. I have the F11 key assigned to do exactly that, and when I’m typing it’s just quicker to press one key than to move my hand away from the keyboard and perform the trackpad gesture. Similarly, I tried to find a way to use Siri as it was meant to be used, but the only instance where it has actually been the fastest solution for me, interaction-wise, is setting up timers while I cook.

(I also tried iOS’s dictation feature, only available in English, and while it has been less disappointing than expected — something even more remarkable if you consider that I’m not a native English speaker — I can’t really picture myself using it for anything longer than a tweet or a very brief email message.)

At the moment, Siri also looks a bit left to its own devices (excuse the pun). It was introduced more than a year ago, and at this point one should expect improvements in reliability and scope of application at the very least. Instead, in the freshly released iOS 6.1 we get this new ‘feature’ that lets you order movie tickets through Fandango, which in reality (from what I’ve heard) isn’t quite the seamless futuristic process one imagines. You have to download the Fandango app, and what Siri does is basically link you to the requested movie in Fandango (provided Siri parses your spoken information correctly, that is).

I really hope Apple takes a decisive step and settles on what to do with it. There’s really great potential here, and Siri could bring speech recognition to a whole new level in human-machine interaction. I’m not expecting a perfect interface as it’s often been portrayed in science fiction movies and series, where sophisticated mainframes were queried and instructed simply through speech commands. What I expect is to see Siri finally leaving this seemingly eternal beta status; what I expect at this point is to see some vision behind it, something more than ‘let’s add this gimmicky feature and see what happens’.

The Author

Writer. Translator. Mac consultant. Enthusiast photographer.