June 2023

Building for the Vision Pro

Six considerations for getting ahead with spatial computing

I am hugely excited about the Vision Pro and the potential for new products and business opportunities it presents. I believe we are at a key point in the history of spatial computing: it reminds me of the early days of the iPhone, when a ground-breaking new computing device opened up a huge range of new possibilities. Of course, it took developers like me time to nail down the use cases and take advantage of those opportunities.

To get ahead this time, I have compiled my thoughts and understanding from hours of WWDC footage on how Apple expects us to build apps for visionOS.

This post will be a useful starting point for anyone who wants a head start in building for the Apple Vision Pro, either creating a new product or optimising an existing app for spatial computing.

The good news is you won’t need to start from zero. In a lot of ways visionOS feels like a fork of iPadOS. It supports iPad apps out of the box without any code changes (similar to Mac Catalyst), it uses familiar design patterns such as tab-based navigation and "list & detail" layouts, and its software keyboard supports command menus.

Unfortunately, it's not just a matter of adding xrOS (as visionOS is referred to in code) as a run destination in Xcode, building and archiving, publishing to the App Store and calling it a day. You can do that, but you would not be taking full advantage of what the platform has to offer. To really make your app sing on visionOS, you need to elevate it for spatial computing!

Written with developers and technical product people in mind, here are six key considerations for building spatial apps on the Apple Vision Pro. I hope it gets your creative juices flowing.

1. Building with a third spatial dimension means considering how users will interact with your app early

The most obvious consideration is the fact you now have a third spatial dimension to work with, so you need to figure out ways your app (or "vision" for an app - pun unintended) will best make use of this to create a great experience for the user.

There are 3 supported scene types on visionOS:

Window - The standard 2D view you are used to from macOS and iOS, with a couple of extra features: it is pseudo-3D, so UI elements on the vertical 2D plane can pop out towards you along the z-axis. Good for displaying menus and controls to the user as well as textual content.

Volume - Essentially the 3D equivalent of the Window: you can display 3D objects on a resizable horizontal 2D plane. Useful for showcasing objects such as e-commerce items, or for getting a sense of scale for (say) an engine you've been working on in CAD software.

Space - This is when your app takes over the user's full field of view, hiding any other apps. Because it is fully immersive, Apple recommends using this scene type only when it absolutely makes sense, and giving the user clear options for entering and leaving your app's space. Great for media such as films and games where you want the user to be fully immersed in the content.

All 3 are SwiftUI Scenes, so your app can use a combination of Windows and Volumes at once while also offering a Space to "fully immerse" the user. It is important to understand when and where your app should use each one to provide the best possible experience for your users.
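To make the three scene types concrete, here is a minimal sketch of how they might be declared side by side in one app. It follows the SwiftUI scene API Apple has shown for visionOS (WindowGroup, the volumetric window style, and ImmersiveSpace); the app name, view names, and scene identifiers are made up for illustration, and details may differ in your project.

```swift
import SwiftUI
import RealityKit

@main
struct ShowroomApp: App { // hypothetical app name
    var body: some Scene {
        // Window: a conventional 2D scene for menus, controls, and text.
        WindowGroup(id: "catalogue") {
            CatalogueView()
        }

        // Volume: a bounded 3D scene, requested with the volumetric window style.
        WindowGroup(id: "product-volume") {
            RealityView { content in
                // Placeholder geometry standing in for a real product model.
                content.add(ModelEntity(mesh: .generateBox(size: 0.2)))
            }
        }
        .windowStyle(.volumetric)
        .defaultSize(width: 0.6, height: 0.6, depth: 0.6, in: .meters)

        // Space: a fully immersive scene that the user opens explicitly.
        ImmersiveSpace(id: "showroom") {
            RealityView { content in
                content.add(ModelEntity(mesh: .generateSphere(radius: 0.5)))
            }
        }
    }
}

struct CatalogueView: View {
    @Environment(\.openWindow) private var openWindow
    @Environment(\.openImmersiveSpace) private var openImmersiveSpace
    @Environment(\.dismissImmersiveSpace) private var dismissImmersiveSpace
    @State private var isImmersed = false

    var body: some View {
        VStack {
            Button("Show product in 3D") {
                openWindow(id: "product-volume")
            }
            // Give the user a clear way in and out of the Space.
            Button(isImmersed ? "Leave showroom" : "Enter showroom") {
                Task {
                    if isImmersed {
                        await dismissImmersiveSpace()
                        isImmersed = false
                    } else {
                        let result = await openImmersiveSpace(id: "showroom")
                        if case .opened = result { isImmersed = true }
                    }
                }
            }
        }
        .padding()
    }
}
```

The point of the sketch is the shape of the app: the Window stays the home for controls, while the Volume and the Space are opened on demand from it.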

2. Selecting the right building blocks


visionOS has been built from the ground up with SwiftUI, which provides all the familiar UI elements and layouts. Apple has been pretty insistent in encouraging developers to adopt it as their UI framework, making it the framework of choice over the older UIKit.

UIKit is still supported on visionOS, but it will be treated as a second-class citizen. You will lose some behaviours, because placing 3D content in your view means embedding a RealityView from SwiftUI. On top of that, tools like storyboards have been completely deprecated, and you will most likely be locked out of certain features, especially in future updates, and have to bridge to SwiftUI anyway.

This is a reasonable move by Apple as SwiftUI has already proven its worth as a mature and reliable choice for building production quality apps since iOS 14. So, if you want to stay ahead of the curve, futureproof your app's codebase and provide your users with the best possible experience, it is strongly recommended you build for visionOS with SwiftUI.

This is great news if you have an existing app written in SwiftUI: because SwiftUI is designed to be used across all of Apple's platforms, you can design for visionOS with minimal code changes. It also means that if you develop an app for visionOS, you will have an easy time bringing it to other Apple devices.

RealityKit and ARKit

On top of SwiftUI, there are two key frameworks you need to use for building a visionOS app: ARKit and RealityKit. Both have been around for years, but unless you were one of the early adopters who has already made an AR app for iOS, you would have had little reason to touch either one. For visionOS, both are essential to use and understand.

What's the difference between the two and why is each one important?

RealityKit is the rendering engine for the 3D objects and content you want to use in your environment. You will be able to build custom 3D content with Reality Composer Pro to import into your app, and use an industry standard format called MaterialX for shaders and materials (the only good thing to come out of the making of the new Star Wars trilogy).

ARKit handles all the heavy lifting when it comes to world mapping, persistence, matting, segmentation, and environment lighting. It handles the blending of the real and virtual worlds.
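As a rough illustration of where RealityKit sits, the sketch below embeds a RealityView in a SwiftUI view and adds a single generated sphere to it. The view name is hypothetical, and in a real app you would more likely add content authored in Reality Composer Pro than a generated mesh.

```swift
import SwiftUI
import RealityKit

struct SphereView: View { // hypothetical view name
    var body: some View {
        // RealityView is the SwiftUI entry point for RealityKit content.
        RealityView { content in
            // A 10 cm sphere with a simple, non-metallic blue material.
            let sphere = ModelEntity(
                mesh: .generateSphere(radius: 0.1),
                materials: [SimpleMaterial(color: .blue, isMetallic: false)]
            )
            content.add(sphere)
        }
    }
}
```

SwiftUI owns the layout here and RealityKit owns the rendering of the entity, which is the division of labour described above.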

It's definitely worth understanding, at least at a high level, what these frameworks are and what each is responsible for. It reduces friction when communicating with engineers and allows for more rapid prototyping and development.

3. A whole new world (of opportunities) for collaboration

One of the concerns the tech media drew from last week's keynote was that the headset is quite isolating; the user is mostly alone. However, Apple has thought hard about how to make visionOS apps collaborative, for example by offering "spatial personas" (the moving image of yourself that others see while in SharePlay).

I believe the opportunities in this area will be rich, as one of the big selling points of AR/VR is the sense of presence with a group of people when you are not physically together. Meta has gone all in on this aspect of VR, and for good reason; unlike Meta, however, Apple is treating this as just one use of the Vision Pro among many others.

SharePlay for spatial apps comes with a few templates (side-by-side, surround, and conversational) that you can use to maintain spatial consistency for your content, and the system will handle this for you.

This is cool because different users on different devices can have a single coordinate system, with shared size, position, and orientation of your app's scene (called “shared context”) and each instance of your app can display visuals, UI, and audio differently per device. For example, everyone can consume the same film, at the same time, with their own volume and subtitle preferences, personalised for comfort.

While the system handles spatial consistency for you, it falls to your app to handle visual consistency, i.e. making sure content and its placement stay in sync for all users. For example, when one user scrolls a document, your app should synchronise the scroll position for every participant who is "spatial".
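One way to think about visual consistency is as a small message that every participant applies locally. The sketch below is purely illustrative: ScrollSyncMessage and localOffset are names I have made up, and sharing the position as a fraction of the document height is just one possible design that copes with participants having different window sizes.

```swift
import Foundation

/// Hypothetical payload describing where one participant has scrolled to.
struct ScrollSyncMessage: Codable, Equatable {
    let documentID: String
    /// Scroll position as a fraction of the document's height, from 0 to 1.
    let offsetFraction: Double

    init(documentID: String, offsetFraction: Double) {
        self.documentID = documentID
        // Clamp so an overscroll on one device cannot push others out of range.
        self.offsetFraction = min(max(offsetFraction, 0), 1)
    }
}

/// Converts the shared fraction back into a local offset in points.
func localOffset(for message: ScrollSyncMessage, contentHeight: Double) -> Double {
    message.offsetFraction * contentHeight
}
```

In a real SharePlay session, a Codable value like this could be sent to the other participants with GroupSessionMessenger from the GroupActivities framework, and each device would apply it to its own copy of the document.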

Apple has added a system coordinator to manage state in a SharePlay session (such as checking if the participant is spatial and handling that appropriately) and any additional configurations you want in the session, like your template preference.
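Here is a rough sketch of what configuring that coordinator might look like, based on the GroupActivities API Apple has presented for spatial SharePlay. The activity type, its identifier, and the function name are hypothetical, and the exact API surface may shift once the SDK is in our hands.

```swift
import GroupActivities

// Hypothetical activity; the identifier and title are placeholders.
struct WatchTogether: GroupActivity {
    static let activityIdentifier = "com.example.watch-together"

    var metadata: GroupActivityMetadata {
        var metadata = GroupActivityMetadata()
        metadata.title = "Watch Together"
        metadata.type = .generic
        return metadata
    }
}

func observeSessions() async {
    for await session in WatchTogether.sessions() {
        // The system coordinator is only available where spatial SharePlay is supported.
        if let coordinator = await session.systemCoordinator {
            // State: is the local participant taking part with a spatial persona?
            let isSpatial = coordinator.localParticipantState.isSpatial
            print("Local participant is spatial: \(isSpatial)")

            // Configuration: ask the system for the side-by-side template.
            var configuration = SystemCoordinator.Configuration()
            configuration.spatialTemplatePreference = .sideBySide
            coordinator.configuration = configuration
        }
        session.join()
    }
}
```

Setting the configuration before joining means the system can place participants with your preferred template from the moment the session starts.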

Also noteworthy is that each user can open their own personal windows, for instance to take notes on a shared presentation. You will need to specify in code whether a scene is shared or not.

4. Crafting an immersive spatial web experience

The first thing to note here is that Safari on visionOS detects the interactivity of page elements via the style sheet (CSS), so it is important to make sure that’s set up correctly. There is also no pointer or cursor in visionOS, so you need some sort of highlighting to indicate visually that an element is interactive when the user's eyes are focused on it.

Also with visionOS, you can embed your 3D assets on your webpage. Adding "ar" to the rel attribute of an anchor tag (a link) that points to your 3D model’s asset file presents an AR Quick Look (an Apple feature for previewing files) of the 3D model straight from your website, all while reaping the benefits of the advanced rendering RealityKit has to offer.

Some sites already make use of AR on the web such as this site for making Dune AR avatars, but Apple wants to take this a step further by popping up their own native modal AR Quick Look rather than an embedded view finder in Safari.

The issue is that this preview is opened separately to the webpage, in isolation. Apple is proposing a new HTML element called <model> that allows you to embed 3D models directly into your webpage, like Quick Look but actually inside of your webpage. Currently, it is a feature flag that the user has to manually turn on in the settings of the latest Safari, so you can still play around with it even if it does not get adopted as a web standard.

visionOS will also support WebXR, a web standard that's great for creating fully immersive web apps. It's currently in developer preview on Safari. WebXR uses WebGL, a well-established web rendering API that has been around since 2011. This means you'll have access to a wide array of development tools and a vibrant developer community.

5. 3D assets unveiled: what you need to know

A file format worth getting familiar with is Universal Scene Description (USD). It is an industry standard format for storing 3D models, and Apple has deep support for it across all its platforms. As you might guess, it is especially important for visionOS, given that it is a spatial interface.

It is being increasingly adopted across the industry: not only in Reality Composer Pro (Apple’s first-party software for making 3D assets) but also in notable third-party software such as Houdini (SideFX), LookdevX in Maya (Autodesk), and Blender. Importing 3D assets from software your company already uses should therefore be seamless.

It is safe to say, if you are planning on making a visionOS app that heavily uses the importing and exporting of 3D assets, you’ll want to get your head around USD files.
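To give a feel for how little code a USD asset needs once it is in your app bundle, here is a minimal sketch using RealityKit's Model3D view. It assumes a bundled asset named "Engine" (a made-up name), and the view name is hypothetical too.

```swift
import SwiftUI
import RealityKit

struct EngineModelView: View { // hypothetical view name
    var body: some View {
        // Loads a bundled USD asset asynchronously; "Engine" is an assumed asset name.
        Model3D(named: "Engine") { model in
            model
                .resizable()
                .aspectRatio(contentMode: .fit)
        } placeholder: {
            // Shown while the model is still loading.
            ProgressView()
        }
    }
}
```

Model3D behaves much like an image view for 3D content, so it suits simple showcases; anything interactive is more likely to live in a RealityView.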

6. Device testing made simple: Kickstart your project in no time

visionOS lets you use your Mac whilst wearing the headset. This means you can build and run your app straight from Xcode to your Vision Pro and test it there, which should be a nice workflow win when the device eventually comes out.

Apple is releasing the SDK for visionOS later this month, so you can run visionOS apps in the Xcode simulator before deploying to real hardware once you're lucky enough to get a test device.

Key user privacy considerations for visionOS

Something to consider when dreaming up how a user will interact with your app is user privacy on the platform. Firstly, you cannot monitor the user’s eye movements at all; there is no API for this whatsoever. Secondly, reading the gestures of the user’s hands requires the user’s permission. If they grant it, you will be able to read where the user is pointing via skeletal detection of their finger joints.

Whether it makes sense to track these movements depends on your use case. For the most part you will have to rely on the system to handle user interaction, and your app will need to provide visual cues such as hover effects (similar to tvOS) to indicate to the user that an element is interactive.
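For the hand-tracking case, the flow might look roughly like the sketch below, which uses ARKit's session and hand-tracking provider as Apple has described them for visionOS. The function name is mine, and I am assuming the app declares a hands-tracking usage description in its Info.plist and is running inside an open Space.

```swift
import ARKit

// Hypothetical helper: logs the position of each tracked index fingertip.
func trackIndexFingerTips() async {
    let session = ARKitSession()
    let handTracking = HandTrackingProvider()

    // Hand tracking is not available everywhere (the simulator, for example).
    guard HandTrackingProvider.isSupported else { return }

    // This is the step that prompts the user for permission.
    let authorization = await session.requestAuthorization(for: [.handTracking])
    guard authorization[.handTracking] == .allowed else { return }

    do {
        try await session.run([handTracking])
    } catch {
        print("Could not start hand tracking: \(error)")
        return
    }

    for await update in handTracking.anchorUpdates {
        let hand = update.anchor
        guard hand.isTracked, let skeleton = hand.handSkeleton else { continue }

        // Joint transforms are relative to the hand anchor, so combine the two
        // to get the fingertip's transform in world space.
        let tip = skeleton.joint(.indexFingerTip)
        let worldTransform = hand.originFromAnchorTransform * tip.anchorFromJointTransform
        print(hand.chirality, worldTransform.columns.3)
    }
}
```

If the user declines, the function simply returns, so the rest of the app needs to work with system gestures alone.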
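As a small example of such a visual cue, the sketch below adds a hover effect to a custom SwiftUI view. Standard controls such as buttons get this behaviour for free; the view name and styling here are made up for illustration.

```swift
import SwiftUI

struct ProductTile: View { // hypothetical custom control
    let title: String
    var onSelect: () -> Void = {}

    var body: some View {
        Text(title)
            .padding()
            .background(.regularMaterial, in: RoundedRectangle(cornerRadius: 16))
            // Match the highlight to the tile's rounded shape.
            .contentShape(.hoverEffect, RoundedRectangle(cornerRadius: 16))
            // The system draws the highlight; the app is never told where the user is looking.
            .hoverEffect()
            .onTapGesture(perform: onSelect)
    }
}
```

This keeps the interaction private by design: your code only hears about the tap, not about the gaze that preceded it.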

We’ll learn more soon!

This post was written prior to Apple launching the visionOS SDK so this is all based on information from WWDC’s developer talks as well as my own insights developing for Apple’s platforms over the past few years.

When the SDK does launch in a few weeks I suspect there will be a ton of new learnings and clarity on limitations or constraints.

I can’t wait to play around with the SDK when it comes to Xcode and really get to grips with the platform. Until then, I hope this post is a good starting point to help you think about what you can (and should) be building in the third dimension.
