At the end of “Fuzzy User Interfaces”, Nick Heer writes:
I implore you to not misread this; this is not a condemnation of Siri, Google Now, or any other contextually-sensitive or “personal assistant”-type software. It’s far better than it ever has been. But it will take continued patience from us and regular, noticeable improvements from the teams building this software for us to feel confident in its abilities.
I have criticised Siri in the past (here, for example), and while I certainly think it’s a useful tool — it may be of great assistance to disabled people, as I have witnessed myself — I have lost all my patience with it. I am multilingual, and I have repeatedly tried to use it in English, Spanish and Italian. The interactions have been mostly disappointing in all three languages. I’ve given Siri another chance every time I read or heard someone point out that ‘Siri has improved’, but it has always been a hit-or-miss scenario.
I am somewhat reminded of the infamous handwriting recognition of the Newton. There was fuzziness in that, too. Handwriting recognition improved drastically in NewtonOS 2.x compared with the truly hit-or-miss recognition of NewtonOS 1.x devices. Even so, with that interface I was more willing to be patient and to adapt my handwriting to facilitate recognition, because the whole process, despite the bouts of frustration, had less friction overall. Correcting the Newton as you write on it with the stylus isn’t that much different from correcting yourself when you mis-write or misspell a word with pen on paper.
Both the Newton’s handwriting recognition and the Graffiti system in Palm OS involve a bit of training. Graffiti is the simpler method: you learn the shorthand for each letter, number and symbol, and you can be reasonably certain that the Palm device will understand you. After minimal training, I can write on my IBM WorkPad with virtually no errors. With the Newton, things are more complicated, but over time I’ve learnt a few tricks in how I trace the letters that significantly reduce recognition errors. Today, 98% of whatever I write (in English) on my Newton MessagePad 2100 is correctly recognised. The fuzziness, the slight unpredictability of this pen-based interface, is tolerable. At least it is for me. The advantage is that you reach a point where you’re writing on a Newton device at a speedy pace and in a natural way, and what you write is transcribed and digitised quickly enough.
But with Siri it’s different. Siri’s fuzziness, as an interface, is unacceptable. Siri’s raison d’être is to assist, to be helpful. And indeed, Siri is the kind of interface where, when everything works, there’s a complete lack of friction. But when it doesn’t work, the friction rapidly increases: you have to repeat or rephrase the whole request (sometimes more than once), or pick up the device and correct the written transcription. Both actions are tedious, and both defeat the purpose. It’s like having a flesh-and-blood assistant with hearing problems. Furthermore, whatever you do to correct Siri, you’re never quite sure whether your correction will have any impact on similar interactions in the future (in my experience, it doesn’t seem to). Then there’s what I usually consider the crux of the matter when interacting with Siri: the moment my voice request is misunderstood, it’s typically faster for me to carry out the action myself via the device’s Multi-Touch interface than to repeat or rephrase the request and hope for the best.
I don’t know the details of how Siri works behind the scenes. What would be great, in my opinion, is some kind of initial training, just like with handwriting recognition. In iOS 9, there’s a modicum of training when setting up ‘Hey Siri’. I think it would be interesting to develop a more extended training stage where, for example, you get to repeat certain phrases containing key phonemes, or the most common requests, so that Siri can better ‘understand’ how you talk by associating the sample words and phrases with the speech samples you provide.
The question I keep returning to when thinking about the current state of Siri, however, is this: Is it worth all the effort on the user’s part? Nick writes:
One of the biggest challenges that the software must overcome in order to become better — where by better I mean can be used with confidence that they will not confuse “two” and “too” in a dictated text message — is that we need to keep using them despite their immaturity. And that’s a big request when they do, indeed, keep confusing “two” and “too”. The amount of times that Siri has butchered everything from text messages to reminders to even the simplest of web searches has noticeably eroded my trust in it.
Siri’s scope is still rather limited. What is the reward for my continued use of this technology despite its immaturity? That sometime in the future it’ll be able to properly write a text message or a reminder? Time is too precious a resource for me to keep trying to make Siri understand simple requests. Not only does the friction in interacting with this particular fuzzy interface have to disappear, but the scope, applications and usefulness of Siri must expand as well: it has to offer enough flexibility and reliability to engage the user. It has to offer more, to provide an advantage over performing the same tasks manually. Otherwise, I think it’s difficult to expect users to invest time and energy in something that still feels non-essential.