The new Apple Vision Pro headset redefines personal computing and entertainment. With the ability to develop immersive applications using Apple's native SwiftUI framework or the Unity engine, the possibilities are endless.
The inclusion of FaceTime’s SharePlay feature allows seamless video chat among Apple devices within the same application. But what about users on other platforms? This is where the Agora RTC SDK comes into play, providing a solution for enabling cross-platform video chat in the Vision Pro headset. In this article, we delve into the process of implementing cross-platform video chat in a Unity application for the Vision Pro headset.
Create a Unity project with the 3D URP template.
Switch to the visionOS build setting after the project is ready. Since we will test on the simulator before deploying to the actual Vision Pro device, select Simulator SDK for the Target SDK.
Open the Package Manager and add the PolySpatial support packages by name. At the time of writing, these are:
com.unity.polyspatial
com.unity.polyspatial.visionos
com.unity.polyspatial.xr
Download the Agora Video SDK. At the time of writing, we have access only to the preview version. When the official version is released, we will update this article with the proper download location.
In the Unity Editor, import the package by going to Assets > Import Package > Custom Package and selecting the downloaded Unity package.
Since we will access the camera and the microphone, fill in the corresponding usage descriptions. In addition, allow “unsafe” code because the SDK makes use of unsafe pointers.
Make sure Apple visionOS is selected for XR Plug-in Management. Provide a description for World Sensing Usage too. Optionally, provide the other usage descriptions if you intend to use hand tracking.
Last, you may want to enter your company name for the app, as is standard practice.
This demo project can be found at this GitHub location. The following sections discuss the detailed steps in creating the demo.
Unity’s PolySpatial package is required to create the mixed reality experience. Make sure the feature is enabled.
If the Volume Camera isn’t created automatically, you can add it by clicking the + button in the Hierarchy window and selecting XR > Setup > Volume Camera. This camera does not replace the required main camera.
Next, select XR > AR Session to add an AR Session object to the scene. Then, from the same + menu, click Volume and select Global Volume.
Optional: Create a 3D cube named Cube and give it the following transform properties. Because this minimal demo has no other UI, the cube serves as visual confirmation that the Unity application is running successfully on the Vision Pro device.
Optional: To make the cube’s appearance more interesting, keep it rotating with a script like this:
using UnityEngine;

public class SelfRotate : MonoBehaviour
{
    void Update()
    {
        // Rotate the object around its local Y axis at 10 degrees per second
        transform.Rotate(10f * Vector3.up * Time.deltaTime);
    }
}
Last, create a GameObject and name it AgoraManager. This object will host the controller script for the Agora logic that we will implement in Part 3.
The finished scene should look something like this:
The AgoraVPManager class follows the design patterns found in the SDK's API-Examples.
AgoraVPManager uses an instance of RtcEngine to invoke the APIs that run the video chat session. The ViewObjects dictionary maintains references to the view objects. The lifecycle of the Agora engine session is handled in the MonoBehaviour lifecycle methods Start, Update, and OnDestroy, respectively. The top-level class uses an internal class, UserEventHandler, to handle callback events.
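The skeleton below sketches this structure. It is a minimal outline, not the demo's exact source; in particular, ViewObjects is assumed here to be a Dictionary keyed by uid, which matches how the callbacks later in this article use it.

using System.Collections.Generic;
using UnityEngine;
using Agora.Rtc;

public class AgoraVPManager : MonoBehaviour
{
    // Engine instance that drives the video chat session
    internal IRtcEngine RtcEngine;

    // uid -> view object that renders this user's video stream
    internal Dictionary<uint, GameObject> ViewObjects = new Dictionary<uint, GameObject>();

    private void Start() { /* init engine, configure, join channel (see below) */ }
    private void Update() { /* orient the views toward the target (see below) */ }
    private void OnDestroy() { /* leave the channel and dispose the engine (see below) */ }

    // Receives engine callbacks; keeps a reference back to the manager
    internal class UserEventHandler : IRtcEngineEventHandler
    {
        private readonly AgoraVPManager _app;
        internal UserEventHandler(AgoraVPManager app) { _app = app; }
        // Overrides: OnError, OnJoinChannelSuccess, OnUserJoined, OnUserOffline
    }
}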
The AgoraVPManager script uses serialized fields to let developers enter the required credentials for application authentication (e.g., App ID), the user token, and the channel name.
#region EDITOR INPUTS
[Header("_____________Basic Configuration_____________")]
[FormerlySerializedAs("APP_ID")]
[SerializeField]
protected string _appID = "";

[FormerlySerializedAs("TOKEN")]
[SerializeField]
protected string _token = "";

[FormerlySerializedAs("CHANNEL_NAME")]
[SerializeField]
protected string _channelName = "";

[SerializeField]
internal GameObject ViewContainerPrefab;

[SerializeField]
GameObject TargetObject; // to be looked at
#endregion
The following screenshot shows the inspector view of the components for AgoraManager in the Unity Editor.
In the Unity Editor, enter the App ID that you created in your Agora developer account. You should use tokens for authentication in your production code; however, for this project, it is more convenient to use an App ID without a token.
As long as the applications use the same App ID and channel name, two users can communicate with each other across different platforms. For example, a Vision Pro Unity user can talk to a user on a desktop web browser. We will use “visionpro” for our test.
This is the object that will render the video stream. We will simply use a cube from the Unity 3D object template. Create a cube, drag it into the Project panel to make a prefab, and name the prefab MyCube. Then drag MyCube into the Editor field.
We will make each view object face this transform. Use the Volume Camera as the reference.
To avoid the complication of UI input management, the application automatically starts the video chat session in Start(). It follows the three required steps in the following order:
1. InitEngine
2. SetBasicConfiguration
3. JoinChannel
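A minimal Start() reflecting this order might look like the following sketch (the demo's actual method may differ slightly):

void Start()
{
    InitEngine();            // step 1: create and initialize the RTC engine
    SetBasicConfiguration(); // step 2: enable media and set the client role
    RtcEngine.JoinChannel(_token, _channelName); // step 3: join the channel
}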
The InitEngine method creates the Agora RTC engine instance and initializes the user handler and the engine with profiles.
Currently, you must use the CHANNEL_PROFILE_LIVE_BROADCASTING and AUDIO_SCENARIO_GAME_STREAMING profile options in order to make the RTC feature work properly.
protected virtual void InitEngine()
{
    RtcEngine = Agora.Rtc.RtcEngine.CreateAgoraRtcEngine();
    UserEventHandler handler = new UserEventHandler(this);
    RtcEngineContext context = new RtcEngineContext(_appID, 0,
        CHANNEL_PROFILE_TYPE.CHANNEL_PROFILE_LIVE_BROADCASTING,
        AUDIO_SCENARIO_TYPE.AUDIO_SCENARIO_GAME_STREAMING,
        AREA_CODE.AREA_CODE_GLOB);
    RtcEngine.Initialize(context);
    RtcEngine.InitEventHandler(handler);
}
Since we use the Live Streaming RTC mode, we set each client as a broadcaster in this setup. We also turn on a special tweak that gets the audio stream working correctly on visionOS.
protected virtual void SetBasicConfiguration()
{
    RtcEngine.EnableAudio();
    RtcEngine.EnableVideo();

    VideoEncoderConfiguration config = new VideoEncoderConfiguration();
    config.dimensions = new VideoDimensions(640, 360);
    config.frameRate = 15;
    config.bitrate = 0;
    RtcEngine.SetVideoEncoderConfiguration(config);

    RtcEngine.SetChannelProfile(CHANNEL_PROFILE_TYPE.CHANNEL_PROFILE_LIVE_BROADCASTING);
    RtcEngine.SetClientRole(CLIENT_ROLE_TYPE.CLIENT_ROLE_BROADCASTER);

    // For now this private API is needed to make voice chat work
    if (Application.platform == RuntimePlatform.VisionOS)
    {
        RtcEngine.SetParameters("che.audio.restartWhenInterrupted", true);
    }
}
We use the simplest variation for joining the channel, omitting the custom user ID (uid) and the media options.
RtcEngine.JoinChannel(_token, _channelName);
The complete function signature is:
int JoinChannel(string token, string channelId, uint uid,
ChannelMediaOptions options);
You can set the channel up without publishing your video or audio stream by using the channel media options.
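For example, the following sketch joins a channel without publishing the local streams. The field names follow the Agora Unity SDK's ChannelMediaOptions; verify them against the SDK version you are using.

ChannelMediaOptions options = new ChannelMediaOptions();
options.publishCameraTrack.SetValue(false);     // do not publish local video
options.publishMicrophoneTrack.SetValue(false); // do not publish local audio
options.autoSubscribeVideo.SetValue(true);      // still receive remote video
options.autoSubscribeAudio.SetValue(true);      // still receive remote audio
RtcEngine.JoinChannel(_token, _channelName, 0, options);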
It is important to call the RTC engine's Dispose method to properly clean up the resources when the session ends.
private void OnDestroy()
{
    if (RtcEngine == null) return;
    RtcEngine.InitEventHandler(null);
    RtcEngine.LeaveChannel();
    RtcEngine.Dispose();
}
In this design, the video display views face a designated location. The camera (set as TargetObject) is used as the reference object.
private void Update()
{
    foreach (var ob in ViewObjects.Values)
    {
        ob.transform.LookAt(TargetObject.transform);
    }
}
The internal class UserEventHandler contains a set of handler methods that respond to Agora RTC engine events. The full list of events is defined in IRtcEngineEventHandler. Here we focus only on the four most important events: OnError, OnJoinChannelSuccess, OnUserJoined, and OnUserOffline.
It is easy to see what OnError should handle, so we will discuss only the other three.
After the local user joins the channel successfully, a reference to the RtcConnection is provided in the callback. The RtcConnection contains a UID and the channel name. Previously, we joined the channel using the default UID value of 0, so the backend system generated a unique UID to identify the user. If we had joined using a designated UID, this callback would contain that same UID in the response. In this implementation, we use a utility function to create the view that renders the user's video stream. For the local user, you must always use 0 instead of the actual UID to render the stream.
public override void OnJoinChannelSuccess(RtcConnection connection, int elapsed)
{
    Vector3 pos = new Vector3(-2.5f, 0, 3.28f);
    _app.CreateUserView(0, connection.channelId, pos);
}
Similar to OnJoinChannelSuccess, OnUserJoined signals that a remote user has joined the channel. We add the users one by one horizontally, with a gap between the rendered view transforms.
public override void OnUserJoined(RtcConnection connection, uint uid, int elapsed)
{
    var count = _app.transform.childCount;
    Vector3 pos = new Vector3(count * 1.5f, 0, 3.28f);
    _app.CreateUserView(uid, connection.channelId, pos);
}
When a remote user goes offline, we remove its view from the display and clean up the reference record.
public override void OnUserOffline(RtcConnection connection, uint uid, USER_OFFLINE_REASON_TYPE reason)
{
    if (_app.ViewObjects.ContainsKey(uid)) _app.ViewObjects.Remove(uid);
    AgoraViewUtils.DestroyVideoView(uid);
}
The CreateUserView function calls the utility function AgoraViewUtils.MakeVideoView() to obtain the rendering view object and place it in a transform hierarchy. That placement is entirely up to the UI/UX design, so CreateUserView is declared as a virtual function to be overridden. What is more important to discuss here is the use of the SDK-provided VideoSurface class in AgoraViewUtils.MakeVideoView() for gathering the texture data to display. AgoraViewUtils supports several rendering target options: a RawImage, a Plane, or a prefab that has a MeshRenderer. We use a cube from the Unity 3D template for the rendering, so the function eventually calls MakeCustomMesh.
private static VideoSurface MakeCustomMesh(string goName, GameObject prefab)
{
    var go = GameObject.Instantiate(prefab);
    go.name = goName; // name the instantiated view object
    // configure videoSurface
    var videoSurface = go.AddComponent<VideoSurface>();
    go.transform.Rotate(-90.0f, 0.0f, 0.0f);
    return videoSurface;
}
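For reference, a possible CreateUserView implementation is sketched below. The signature of AgoraViewUtils.MakeVideoView() is an assumption made for illustration; consult the demo repository for the actual utility.

protected virtual void CreateUserView(uint uid, string channelId, Vector3 position)
{
    // Hypothetical MakeVideoView signature; see the demo repo for the real one
    VideoSurface surface = AgoraViewUtils.MakeVideoView(uid, channelId, ViewContainerPrefab);
    GameObject view = surface.gameObject;
    view.transform.SetParent(transform);      // OnUserJoined counts these children for layout
    view.transform.localPosition = position;
    ViewObjects[uid] = view;                  // tracked for LookAt updates and offline cleanup
}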
The implementation of a simple RTC app on the Vision Pro is now complete. You can run the application in the Unity Editor; you should see your own webcam capture to the left of the reference cube.
Recall that we set the Target SDK to Simulator in Part 1. Build the Unity project to obtain the Xcode project, then build the Xcode project.
If Xcode shows a long list of duplicate symbol errors, it is because of the extra target included for the simulator build. Remove the ARM64 framework from the Xcode project.
Build and play the application in the Vision Pro simulator. Open a web browser and go to the Agora WebDemo page. Enter your App ID and channel name for the test.
When the Vision Pro app runs successfully in the simulator, you should see something like this screenshot. Because there is no camera capture for the Vision Pro in a simulator setting, you will see an empty rectangular box on the left, which is set for the local user's display.
Switch the Target SDK to the Device SDK and build the Unity project again.
In Xcode, build and deploy the application to the Vision Pro device.
You should see both the local user (your persona) and the remote user in the RTC video chat!