A few thoughts on Apple Vision Pro

Apple’s WWDC23 keynote presentation was going so well. I wasn’t loving everything, mind you. The new 15-inch MacBook Air didn’t impress me and I still believe it’s an unnecessary addition that further crowds the MacBook line. And the new Mac Pro feels like a product Apple had to release more than a product Apple wanted to release. But I’m happy there was a refresh of the Mac Studio, and I’m okay with the new features in the upcoming iOS 17, iPadOS 17, and Mac OS Sonoma. Nothing in these platforms excites me anymore, but at least they’re not getting worse (than expected).

Then came the One More Thing. Apple Vision Pro. My reaction was viscerally negative. Before my eyes, rather than windows floating in a mixed reality space, were flashing bits of dystopian novels and films. As the presentation went on, where the presenter spoke about human connection, I thought isolation; where they spoke about immersion, I thought intrusion. When they showed examples of a Vision Pro wearer interacting with friends and family as if it was the most normal thing to do, the words that came to my mind were “weird” and “creepy”.

In the online debate that followed the keynote, it appears that we Vision Pro sceptics, already worried about the personal and societal impact of this device, have been chastised by the technologists and fanboys for being the usual buzzkills and party poopers. And while I passed no judgment whatsoever on those who felt excited and energised by the new headset, some were quick to send me private messages calling me an idiot for not liking the Vision Pro. Those, like me, who were instantly worried about this device bringing more isolation, self-centeredness, and people burying themselves even more into their artificial bubbles, were told that we can’t possibly know this is what’s going to happen, that this is just a 1.0 device, that these are early days, that the Vision Pro is clearly not a device you’re going to wear 24 hours a day, and so forth.

Perhaps. But our worries aren’t completely unfounded or unwarranted. When the iPhone was a 1.0 device, it offered the cleanest smartphone interface and experience at the time, and while it was the coolest smartphone, it was essentially used in the same ways as the competition’s smartphones. But in just a matter of a few years its presence and usage transformed completely, and while I won’t deny its usefulness as a tool, when you go out and look around you, and see 95% of the people in the streets buried in their smartphones, it’s not a pretty sight. If Vision Pro turns out to be even half as successful as the iPhone, somehow it’s hard for me to imagine that things are going to get better from a social standpoint.

Let’s focus on more immediate matters

All of the above stems from my initial, visceral reaction. And even though it can be viewed as wild speculation surrounding a product that won’t even be released before 2024, I think it’s worth discussing nonetheless.

But as the Vision Pro presentation progressed, and I had finally managed to control the impulsive cringing, I started wondering about more technical, practical, and user-experience aspects of the headset.

User interface and interaction

If I had to use just one phrase to sum up my impressions, it probably would be, Sophisticated and limited at the same time. There’s visual elegance and polish, that’s undeniable. All those who have actually tried the headset unanimously praise the eye-tracking technology, saying that it essentially has no latency. Good, because any visual lag in such an interface would break it immediately. Eye-tracking is the first of five ways to interact with objects in visionOS. You highlight an object or UI element by looking at it. Then you have the pinching with your thumb and index finger to select the object. Then you have pinching then flicking to scroll through content. Then you have dictation. Then you have Siri to rely on when you want to perform certain actions (good luck with that, by the way). That’s it.

First concern: Since Apple is trying to position Vision Pro as a productivity device, more than just another VR-oriented headset aimed at pure entertainment, I struggle to see how productive one can really be with such a rudimentary interaction model. It’s simultaneously fun and alarming to watch what Apple considers productive activities in their glossy marketing material. Some light web browsing, some quick emailing, lots of videoconferencing, reading a PDF, maybe jotting down a note, little else. On social media, I quipped that this looks more like ‘productivity for CEOs’. You look, you read, you check, you select. You don’t really make/create. It feels like a company executive’s wet dream: sitting in their minimalistic office, using nothing more than their goggles. Effortless supervision.

Second concern: Feedback. Or lack thereof. It’s merely visual, from what I can tell. Maybe in part auditory as well. But it’s worse than multi-touch. At least with multi-touch, even if we are not exactly touching the object we’re manipulating, we’re touching something: the glass pane of an iPhone or iPad, or a laptop screen. At least there’s a haptic engine that can give a pretty good tactile illusion. In the abstract world of Vision Pro, you move projections, ethereal objects you can’t even feel you’re really touching. There is even a projected keyboard you’re supposed to type on. Even if you have never tried the headset, you can do this quick exercise: imagine a keyboard in front of you, and just type on it. Your fingers move in the air, without touching anything. How does it feel? Could you even type like this for 10 minutes straight? Even if you see the projected keyboard as a touchable object that visually reacts to your air-typing (by highlighting the air-pressed air-key), it can’t be a relaxing experience for your hands. And typing is a large part of so many people’s productivity.

Sure, it seems you can use a Bluetooth keyboard/mouse/gamepad as input methods, but now things get awkward, as you constantly move between a real object and a projected window/interface. Of all the written pieces and video essays on Vision Pro I’ve checked, Quinn Nelson’s has been the most interesting to me and the one I felt most in agreement with, because he expresses concerns similar to mine when it comes to user interface and use cases for the headset. On this matter of using traditional input devices such as keyboard, mouse, gamepad, Quinn rightly wonders:

How does a mouse/cursor work in 3D space? Does it jump from window pane to window pane? Can you move the cursor outside of your field of view? If you move your head, does it re-snap where your vision is centered?

I’ll be quoting Quinn more in my article, as he has some interesting insights.

Third concern: Pure and simple fatigue. “Spatial computing” is a nice-sounding expression. And as cool and immersive as browsing stuff and fiddling with 2D and 3D objects in an AR environment is, I wonder how long it takes before it becomes overwhelming, distracting, sensory-overloading, fatiguing. Having to scan a page or an AR window intently, because your eyes are now the pointer, is I imagine more tiring than doing the same with a mouse or similar input device in a traditional, non-AR/VR environment.

The misguided idea of simplifying by subtracting

A few days ago I wrote on Mastodon:

The trend with UI in every Apple platform, including especially visionOS, is to simplify the OS environment instead of the process (the human process, i.e. activity, workflow). On the contrary, this fixation on simplifying the interface actually hinders the process, because you constantly hunt for UI affordances that used to be there and now are hard to discover or memorise.

I admit, maintaining a good balance between how a user interface looks and how it works isn’t easy. Cluttered and complex is just as bad as terse and basic. But it can be done. The proof is in many past versions of Mac OS, and even the first iOS versions before iOS 7. How you handle intuition is key. In the past I had the opportunity to help an acquaintance conduct some UI and usability tests with regular, non-tech people. I still remember one of the answers to the question “What makes an interface intuitive for you?”. The answer was, When, after looking at it, I instantly have a pretty good idea of what to do with it. Which means:

Buttons that look like buttons;
Icons that are self-explanatory;
Visual clues that help you understand how an element can be manipulated (this window can be resized by dragging here; if I click/tap this button, a drop-down menu will appear; this menu command is dimmer than the others, so it won’t do anything in this context; etc.);
Feedback that informs you about the result of your action (an alert sound, a dialog box with a warning, an icon that bounces back to its original position, etc.);
Consistency, which is essential because it begets predictability. It’s the basis for the user to understand patterns and behaviours in the OS environment, to then build on them to create his/her ‘process’, his/her workflow.

Another intriguing answer from that test was about tutorials. One of the participants wrote that, in their opinion, a tutorial was a “double-edged sword”: On the one hand, it’s great because it walks you through an unfamiliar application. On the other, when the tutorial gets too long-winded, I start questioning the whole application design and think they could have done a better job when creating it.

This little excursion serves to illustrate a point: Apple’s obsession with providing clean, sleek, good-looking user interfaces has brought a worrying amount of subtraction in the user interface design. By subtraction I don’t necessarily mean the removal of a feature (though that has happened as well), but rather the visual disappearance of elements and affordances that helped to make the interface more intuitive and usable. So we have:

Buttons that sometimes don’t look like buttons;
UI elements that appear only when hovered over;
(Similar to the previous point) Information that remains hidden until some kind of interaction happens;
Icons and UI elements that aren’t immediately decipherable and understandable;
Inconsistent feedback, and general inconsistency in the OS environment: you do the same action within System App 1 and System App 2, and the results are different. Unpredictability brings confusion, users make more mistakes and their flow is constantly interrupted because the environment gets in the way.

Going from Mac OS to iOS/iPadOS to visionOS, the OS environment has become progressively more ‘subtractive’ and abstracted. The ways the user has to interact with the system have become simpler and simpler, and yet somehow Apple thinks people can fully utilise visionOS and Vision Pro as productively as a Mac. Imagine for a moment trying out Vision Pro for the first time without having paid much attention to the marketing materials and explanatory pages on Apple’s website. Is the OS environment intuitive? Do you, after looking at it, have a pretty good idea of what to do with it? My impression is that it’s going to feel like the first swimming lesson: you’re thrown into the water and you start moving your limbs in panic and gasping for air. Immersion and intuition can go hand in hand, but from what I’ve seen, it doesn’t seem to be the case in Vision Pro. But it’s a new platform, of course you need a tutorial!, I can hear you protest. I saw regular people trying the iPhone when it was first publicly available. I saw regular people trying the iPad when it was first publicly available. I saw regular people trying the Apple Watch when it was first publicly available. They didn’t need a guided tour. Maybe a little guidance for the less evident features, but not for the basics or for finding their way around the system.

What for? Why should I use this?

Back to Quinn Nelson’s video: at a certain point he starts wondering about the Vision Pro’s big picture, much in the same way I’ve been wondering about it myself:

The problem is that, with no new experiences beyond “Can you imagine??”, Apple is leaving the use cases for this headset to developers to figure out.

Look, you might say, “Hold on! Watching 3D video in a virtual movie theatre is cool! Using the device as an external display for your Mac is great! Browsing the Web with the flick of a finger is neat! And meditating through the included Mindfulness app is serene”. If these things sound awesome, you’re right. And congratulations, you’re a nerd like me, and you could have been enjoying using VR for, like, the last five years doing these same things but just a little bit worse.

There wasn’t a single application shown throughout the entirety of the keynote — not one — that hasn’t been demoed and released in one iteration or another on previous AR/VR headsets.

VR isn’t dying because hand tracking isn’t quite good enough. No. The problem with these devices is that they require intentionality and there’s no viable use case for them. It’s not like the iPhone, that you can just pick up for seconds or minutes at a time.

Maybe the SDKs and frameworks that Apple is providing to developers will enable them to create an app store so compelling that their work sells the device for Apple, much like the App Store did for the iPhone. But hardware has not been the problem with VR. It hasn’t been for years. It has been the software.

I expected to see a Black Swan, a suite of apps and games that made me think, “Duh! Why has nobody thought of this before!? This is what AR needs”. But there really wasn’t much of anything other than AR apps that I already have on my iPhone and my Mac, and that I can use without strapping on a headset to my face making myself look like a dick and spending $3,500 in the process! I hope this is the next iPhone, but right now I’m not as sure as I thought I’d be.

Apologies for the long quote, but I couldn’t have driven the point home any better than this. As Quinn was saying this, it felt like we had worked together on his script, really. The only detail on which I’m not in agreement with Quinn is that I hope Vision Pro won’t be the next iPhone. A lot of people seem to buy into the idea that AR is the future of computing. I’m still very sceptical about it. In my previous piece, in the section about AR, I wrote:

I am indeed curious to see how Apple is going to introduce their AR goggles and what kind of picture they’re going to paint to pique people’s interest. I’m very sceptical overall. While I don’t entirely exclude the possibility of purchasing an AR or VR set in the future, I know it’s going to be for very delimited, specific applications. VR gaming is making decent progress finally, and that’s something I’m interested in exploring. But what Facebook/Meta and Apple (judging from the rumours, at least) seem interested in is to promote use cases that are more embedded in the day to day.

As effortless as Apple went to great lengths to make it look, I still see a great deal of friction and awkwardness in using this headset as part of the day-to-day routine. And I don’t mean the ‘looking like a dork’ aspect. I mean from a mere utility standpoint. To be the future of computing, this ‘spatial computing’ has to be better than traditional computing. And if you remove the ‘shock & awe’ and ‘immersion’ factors, I don’t see these great advantages in using Apple’s headset versus a Mac or an iPad. It doesn’t feel faster, it doesn’t feel lighter, it doesn’t feel more practical, or more productive. It looks cool. It looks pretty. It makes you go ‘wow’. It’s shallow, and exactly in tune with the general direction of Apple’s UI and software these days.

Another surprisingly refreshing take on the Vision Pro came from Jon Prosser. In his YouTube video, This is NOT the future — Apple Vision Pro, Jon speaks of his disappointment towards the headset, and makes some thought-provoking points in the process. Here are some relevant quotes (emphasis mine):

First impressions really matter, especially for an entirely new product category. It is Apple’s responsibility to tell us why and how this matters. Tech demos, cool things, shiny new thing aside, that is their actual job. Apple isn’t technology. Apple is marketing. And that’s what separates them from the other guys. When we take the leap into not only an entirely new product category but a foreign product category at that, it’s Apple’s responsibility to make the first impression positive for regular people.

VR and AR is already such a small niche little market. Comparing AR/VR products against Apple’s Vision Pro is nearly pointless because the market is so small that they might as well be first to their user base. It’s not about comparing Vision Pro versus something like the Meta Quest, because if you compare them of course it’s not even close. Apple dumped an obscene amount of resources into this project for so many years and are willing to put a price tag on it that Zuckerberg wouldn’t dare try. Apple needed to go on stage and not just introduce people to a mixed reality product. Apple needed to go on stage and introduce those people to mixed reality, period.

I think for once in a very long time — especially with a product announcement or announcement at all — Apple came across as… confused; wildly disconnected and disassociated from their users. People. The way Apple announced Vision Pro, the way they announce any product, is by showing us how they see us using it. And what did they show us for this? We saw people alone in their rooms watching movies, alone in their rooms working. It’s almost like they were, like, Hey, you know that stuff you do every day? Yeah, you still get to do that but we’re gonna add a step to it. You’re welcome! Oh but it’s big now! It looks so big! If this was any other product at any other price tag from any other company, sure, those are cool gimmicks, I’ll take them. Apple doing them? I’m sure they’re gonna be with an incredible quality. Wow, amazing. But… is that really life-changing?

I want to make this clear: I do not doubt, even a little bit, that Vision Pro is revolutionary. It’s looking to be objectively the best, highest-fidelity, AR and VR experience available on the entire planet. This is completely over-engineered to hell. It is technologically one of the most impressive things I have ever seen. But are we really at the point where we’re just gonna reward [Apple] for just… making the thing? […] It doesn’t matter how hard you work on a thing. That is not enough if it doesn’t fit into other people’s lives. Apple has always been about the marriage of taking technology and making it more human, letting boundaries fade away, and connecting people to the experience of using those devices, bridging gaps between people by using technology. And with Vision Pro… it feels like Apple made something that is entirely Tech first, Human last.

It’s not the idea that matters. It’s the implementation. The idea will only ever be as good as the implementation. […] If this mixed reality vision is truly Apple’s end goal, and the things they showed us on stage are the things that they want us to focus on — if those things are all that this first-gen product was mainly ever meant to do, then they put this in way too big of a package.

If this was more of a larger focus on VR and gaming and putting you someplace else, like the Quest products, then yeah, I’m fine with wearing this massive thing on my face. But they demoed a concept that works way better with a much smaller wearable, like glasses maybe. First-gen product, again, yeah I know. But also, again, first impressions matter. They introduced the future of Apple, the company after the iPhone, with this dystopian, foreign, disconnected product. […] They expect you — according to all this — to live in this thing. Countless videos of people just… actually living in it. […] This is a technological masterpiece, but this isn’t our iPhone moment. This isn’t our Apple Watch moment.

Another interesting aspect Prosser emphasises (a detail I too noticed during the keynote but didn’t think much of at the time) is that you don’t see any Apple executive wearing the headset. Again, this could be just a coincidence, but it could also be a bit of a Freudian slip, a little subliminal hint that reveals they actually want to distance themselves from this product. Almost like with the Apple silicon Mac Pro, Vision Pro feels like a product Apple had to release more than a product Apple wanted to release. Make of this detail what you want, but let me tell you: if Vision Pro had been Steve Jobs’s idea and pet project, you can bet your arse he himself would have demoed it on stage.

Again, apologies for the massive quoting above, but I couldn’t refrain from sharing Quinn Nelson and Jon Prosser’s insights because they’re so much on the same page as my whole stance on this device, it hurts.

I’ll add that, product direction-wise, I see a lot of similarities between Vision Pro and the iPad. Both are something Apple produces without a clear plan, a clear… vision. In both cases, Apple introduces some device propelled by an initial idea, a sort of answer to the question, What if we made a device that did this and this?, but then the whole thing loses momentum because the burden of figuring out what to do with such a device, how to fit it in daily life, is immediately shifted to developers and end users. One of the favourite phrases of Cook’s Apple is, We can’t wait to see what you’ll do with it! It sounds like a benevolent encouragement, like you’re being invited into the project of making this thing great. But what I see behind those trite words is a more banal lack of ideas and inspiration on Apple’s part. And it’s painful to see just how many techies keep cutting Apple so much slack about this. It’s painful to see so many techies stop at how technologically impressive the new headset is, while very few seem interested in discussing whether the idea, the vision behind it, is equally impressive. People in the tech world are so constantly hungry for anything resembling ‘progress’ and ‘future’ that they’ll eat whatever well-presented plate they’re given.

“AR is the future” — but why?

I see AR and VR as interesting developments for specific activities and forms of entertainment. Places you go for a limited amount of time for leisure. From a user interface standpoint, I can’t see how a person would want to engage in hours-long working sessions in a mixed-reality environment. The interaction model is rudimentary; the interface looks pretty, but pretty is not enough if there’s less intuitiveness and more fatigue than using a Mac or an iPad or an iPhone. Everything that Apple has shown you can do with Apple Vision Pro, every use case they proposed, is something I can do faster and more efficiently on any other device. I don’t think that replicating the interface of iOS, iPadOS and Mac OS by projecting it on a virtual 3D space is the best implementation for an AR/VR device. It makes for a cool demo. It makes you look like you finally made real something we used to see in sci-fi shows and films. But in day-to-day sustained use, is it actually a viable, practical solution? Or is it more like a gimmick?

I see the potential in AR/VR for specific things you can really enjoy being fully immersed in, like gaming and entertainment, and even for some kind of creative 3D project making. But why should ‘being inside an operating system’ be the future of computing? What’s appealing about it? Perhaps my perspective is biased due to the fact that I’m from a generation that knows life before the Web, but I always considered technological devices as tools you use, and the online sphere a place you go to when you so choose. So I fail to see the appeal of virtually being inside a tool, inside an ecosystem, or being constantly online and connected into the Matrix. An operating system in AR, like visionOS, still feels like the next unnecessary reinvention of the wheel. You’re still doing the same things you’ve been doing for years on your computer, tablet, smartphone — not as quickly, not as efficiently. But hey, it looks cool. It makes you feel you’re really there. It’s so immersive.

And that’s it. What a future awaits us.

Riccardo Mori

Writer & Translator