The latest innovations in virtual reality bring a more realistic experience of meeting and talking to people in an immersive space through a VR headset. Real-time engagement technology enables video chat and voice chat in the VR world. Many VR headset makers provide spatial audio for environmental sounds. However, the audio from a 3D source (e.g., your peer in a video chat) does not always get the same treatment.
In this tutorial, we will walk through building a VR chat application that applies spatial sound to each remote speaker's audio. Keep in mind that members of the chat group can join from other platforms, such as web or desktop Unity apps, in addition to VR headsets. For this project, we will demonstrate our implementation on an Oculus Quest headset. The same underlying technology should apply to other compatible headsets.
This project consists of four parts. The first part will walk through how to set up the project, integrating the Oculus packages along with the Agora SDK and sample API scripts.
The second part will walk through creating the scene, including setting up the XR Rig with controllers, creating the UI, and implementing the Agora API hooks.
The third part will show how to use another sample Unity mobile app to test the video streaming between VR and non-VR users.
The last part will be a discussion of the technology used for the audio frame and audio source handling. It is for programmers who want to better understand the implementation. Reading this section is not required for getting this tutorial to run.
The link to the complete project or package can be found in the last section of this blog.
The Get Started page on the Oculus website provides detailed steps on setting up a Unity project for VR. We will go over how the current project was set up using Unity 2019. For versions 2020 and 2021, minor package import steps may vary.
To begin, open a new project targeting the Android platform.
Next, import the Oculus Integration package from the Asset Store. This will take a bit of time:
After the import, Unity may ask if you want to update to a newer version of the plug-in. This may seem a little strange because you have just downloaded the latest version. You may choose either Yes or No and continue. Unity will restart and finish the import and compilation:
Now, modify the Build Settings as follows:
Get the Agora SDK from the Asset Store and import everything:
If this is the first time you’ve used Agora, you should try out the demo app from the Assets folder in the Unity Editor before moving on to the VR project. Check the accompanying README file from the SDK for helpful information about the demo and the SDK. You should be able to run the SDK demo in no time.
You will need an App ID for running Agora applications. Head to your Agora developer console, create a test project, and get the App ID:
For simplicity in this sample project, we will omit the step for token generation. But for better security, you should use an App ID with tokens enabled in your production application.
Agora provides a collection of small sample projects that show API examples for common use cases. From the repo, copy the PlaybackAudioFrame folder and its dependent scripts from tools:
For your convenience, the files are put into this package archive so you don’t have to handpick them from the API-Example. Your project folder hierarchy should now look like the following screenshot:
We will create a new scene by reusing a sample scene from one of the Oculus Integration packages. Find the RedBallGreenBall scene from the Project navigator and clone this file. Rename the file AgoraSpatialTest and move it to the Assets/Scenes folder.
In the scene, a FirstPersonController prefab was used, as shown above. Remove this object, search for the OVRPlayerController prefab in the project, and drag and drop it into the AgoraSpatialTest scene. The FirstPersonController fixes the scene’s objects to the view position, i.e., the balls move when you turn your head. In contrast, the OVRPlayerController gives you a true VR space, where the objects stay in their initial positions while your head turns.
CheckPoint: Now you can build and run the project and experience the spatial audio produced by the audio sources on the red ball and the green ball. As you turn your head with the Oculus headset, you should hear the sound coming from the direction of each ball.
This screenshot summarizes the actions:
Awesome! Don’t forget to save the scene – this is always important! We are now ready to do a test.
The Oculus headset can now connect to other users via the integrated Agora Video SDK. Since the headset doesn’t produce a video stream in this example, we would like to get a video stream from a non-VR source. There are several ways to provide that:
Choice 1: Use the SDK demo app. From the sample project, we can open the demo app in the AgoraEngine folder and run it either in the Editor or as a build on a device. Place the device next to a TV or radio for some random audio input.
Choice 2: Use the Agora Web demo app. This is a good setup if a colleague can help with the test from a remote location.
Choice 3: Use this demo app, which plays a looping sound sample. The code is from the same repo where we downloaded the scripts earlier. I like this choice because it works well for a solo tester, and the constant sound sample makes the test easy to validate.
We can’t reproduce the headset’s spatial audio experience in this tutorial, but we can share what is presented in the Oculus Quest’s view. In the test, the helper app runs on a mobile phone placed in front of the VR tester. As a result, a Capsule is spawned near the red ball. The sound of the sample music comes from the back-right when the test starts, and its direction changes as the tester turns their head to look at the Capsule.
Consideration: The original sound from the green ball, which is vocal1 in the Oculus sample resources, is a bit too loud compared with the test music sample. Try turning its volume down to 0.5 or less in the AudioSource component.
If there are enough resources, more non-VR remote users can be added to the test. Each of the users will be represented by a Capsule instance in the VR scene. The VR tester can walk around, get closer to any other user in the virtual space, and listen to the amplified sound of that remote audio stream!
Getting a video chat experience with spatial audio in a VR environment is that easy. And by reusing the sample projects, we haven’t written a single line of code.
This completes the tutorial on how to do video chat with spatial audio on Oculus. But if you want to learn more about the technology that makes this possible, read on.
To understand how this project works, let’s examine its building blocks: the key APIs, data structures, and algorithms. Since quickstart tutorials on Oculus are already available, we will focus on how we use the Agora SDK in this project.
You can find the following scripts in the user-audioframe-audiochannel folder:
Here, UserAudioFrame2SourceSample is a controller that sets up an Agora RTC engine, registers event callbacks, and manages users as objects. The setup of the engine is pretty straightforward. If you are new to Agora, see this guide for a quickstart tutorial.
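For reference, the engine setup in that controller looks roughly like the following sketch. This assumes the Agora Unity (agora_gaming_rtc) 3.x API; the class name, APP_ID, and CHANNEL_NAME are placeholders, and the actual sample script uses its own field names and registers more callbacks:

using agora_gaming_rtc;
using UnityEngine;

public class SpatialAudioController : MonoBehaviour
{
    const string APP_ID = "your-app-id";           // placeholder
    const string CHANNEL_NAME = "vr-spatial-test"; // placeholder

    IRtcEngine mRtcEngine;

    void Start()
    {
        mRtcEngine = IRtcEngine.GetEngine(APP_ID);

        // Track remote users so a Capsule can be spawned or removed per user.
        mRtcEngine.OnUserJoined = (uid, elapsed) => Debug.Log("user joined: " + uid);
        mRtcEngine.OnUserOffline = (uid, reason) => Debug.Log("user left: " + uid);

        mRtcEngine.EnableVideo();
        mRtcEngine.EnableVideoObserver();

        // Token generation is omitted in this test project; pass a token string in production.
        mRtcEngine.JoinChannelByKey(null, CHANNEL_NAME);
    }

    void OnDestroy()
    {
        mRtcEngine.LeaveChannel();
        IRtcEngine.Destroy();
    }
}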
This is the most important API call, the one that separates the individual audio streams:
_audioRawDataManager.SetOnPlaybackAudioFrameBeforeMixingCallback(OnPlaybackAudioFrameBeforeMixingHandler);
The descriptive method name suggests the following implications:
1. Playback: the callback delivers audio frames that are meant to be played back, i.e., audio received from remote users.
2. Before mixing: each remote user’s audio arrives as a separate stream, before the SDK mixes them into one output.
3. Callback: the raw frames are handed to your code, so the application takes over responsibility for rendering the audio.
For the third implication, it is important to tell the engine to turn off its normal mixed audio output so that your code takes control of what is being played. Here is the second important API call:
mRtcEngine.SetParameter("che.audio.external_render", true);
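After registering the callback and enabling external rendering, the handler that receives each user’s frames might look like the following sketch. The delegate shape (a uid plus an AudioFrame struct) follows the 3.x Unity SDK’s AudioRawDataManager, but the exact signature can vary between SDK versions, and RouteFrameToUser is a hypothetical helper standing in for the sample’s own per-user handling:

// Inside the controller shown earlier; AudioFrame comes from the Agora Unity SDK.
void OnPlaybackAudioFrameBeforeMixingHandler(uint uid, AudioFrame audioFrame)
{
    // Runs on a background thread, once per frame packet and per remote user.
    // Forward it to the main thread, where the per-user handler feeds its AudioSource
    // (see the dispatcher described next).
    dispatch(() => RouteFrameToUser(uid, audioFrame));
}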
In Unity, dynamic component attachment and UI updates must be done on the main thread, and playing an audio clip on an AudioSource object falls into that category. But the OnPlaybackAudioFrameBeforeMixingHandler callback runs on a background thread. So how do we deal with that? The answer is a main-thread dispatcher. By using the BlockingCollection data structure, we get a thread-safe queue that stores the pending actions and lets the Unity main thread run them from the Update() function. To send actions to the queue, call the dispatch method like this:
dispatch(()=> { <code statement 1> ; <code statement 2>; <etc.> });
The dispatch method takes a C# System.Action (which is also an object) as a parameter and adds it to the BlockingCollection. Very simple.
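A minimal version of this dispatcher, folded into the controller from the earlier sketch (the sample’s own implementation may name things differently), could look like this:

// Inside the controller; requires: using System; using System.Collections.Concurrent;
// Thread-safe queue of actions produced by background threads.
readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();

// Can be called from any thread: enqueue work to run on the Unity main thread.
void dispatch(Action action)
{
    _queue.Add(action);
}

// Drain the queue on the Unity main thread, once per frame.
void Update()
{
    Action action;
    while (_queue.TryTake(out action))
    {
        action();
    }
}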
This handler class runs individually on each Capsule object that is spawned when a remote user joins. The handler converts the audio frames coming from the Agora engine into audio clip data and plays them on the AudioSource component. Of course, we also need the Oculus SDK’s ONSPAudioSource script to extend that capability to spatial sound.
The properties of the AudioSource component are filled in from the information in the first audio frame packet. One technical challenge here is achieving smooth playback on the AudioSource from audio frames that arrive at intervals. This is a classic producer-consumer concurrency problem. The answer is a ring buffer data structure, thanks to Joe Osborn, the GitHub contributor who implemented the code. In the sample code, we allocate about 10 seconds of audio data based on the audio frame rate and channel count. The user audio frame callback (discussed in the previous section) acts as the producer, and the AudioSource’s OnAudioRead function acts as the consumer. When there is enough data, the AudioSource consumes the buffered audio from this data structure.
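To make the producer-consumer flow concrete, here is a simplified sketch of how the per-user handler could wire a streaming AudioClip to a ring buffer. The class name UserAudioFrameHandler and the UserRingBuffer type are hypothetical stand-ins for the sample’s own scripts, and ONSPAudioSource comes from the Oculus Integration; the actual handler differs in names and details:

using UnityEngine;

[RequireComponent(typeof(AudioSource))]
[RequireComponent(typeof(ONSPAudioSource))]  // Oculus spatializer on the Capsule
public class UserAudioFrameHandler : MonoBehaviour
{
    AudioSource _audioSource;
    UserRingBuffer _ringBuffer;   // hypothetical ring buffer holding ~10 seconds of samples

    // Called when the first audio frame reveals the stream format.
    public void Init(int samplesPerSec, int channels)
    {
        _ringBuffer = new UserRingBuffer(samplesPerSec * channels * 10);
        _audioSource = GetComponent<AudioSource>();

        // Streaming clip: Unity pulls PCM data on demand through OnAudioRead (the consumer).
        AudioClip clip = AudioClip.Create("remoteUserClip", samplesPerSec, channels,
                                          samplesPerSec, true, OnAudioRead);
        _audioSource.clip = clip;
        _audioSource.loop = true;
        _audioSource.Play();
    }

    // Producer side: called (via the dispatcher) with samples converted from an AudioFrame.
    public void EnqueueSamples(float[] samples)
    {
        _ringBuffer.Write(samples);
    }

    // Consumer side: Unity asks for the next chunk of samples to play.
    void OnAudioRead(float[] data)
    {
        _ringBuffer.Read(data);   // fills with buffered audio, or silence if not enough data
    }
}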
The following diagram illustrates this architecture:
If you see the following error when entering Play mode in the Unity Editor:
NullReferenceException: Object reference not set to an instance of an object
OculusSampleFrameworkUtil.HandlePlayModeState (UnityEditor.PlayModeStateChange state) (at Assets/Oculus/SampleFramework/Editor/OculusSampleFrameworkUtil.cs:45)
UnityEditor.EditorApplication.Internal_PlayModeStateChanged (UnityEditor.PlayModeStateChange state) (at /Users/bokken/buildslave/unity/build/Editor/Mono/EditorApplication.cs:415)
In this case, you can update the OculusSampleFrameworkUtil.cs script with the following code snippet:
private static void HandlePlayModeState(PlayModeStateChange state)
{
    if (state == PlayModeStateChange.EnteredPlayMode)
    {
        System.Version v0 = new System.Version(0, 0, 0);
        UnityEngine.Debug.Log("V0=" + v0.ToString());
#if UNITY_EDITOR
        // In the Editor, OVRPlugin.wrapperVersion can be null, so report a placeholder version instead.
        OVRPlugin.SendEvent("load", v0.ToString(), "sample_framework");
#else
        OVRPlugin.SendEvent("load", OVRPlugin.wrapperVersion.ToString(), "sample_framework");
#endif
    }
}
The tutorial should work best on the Oculus Quest. And since we didn’t write a line of code for this project, there won’t be a separate code sample to maintain in a GitHub repo. However, the scene and the dependent scripts are archived in a custom package, so you can just import the package into an Oculus project to try it out. See this repo link for the package.
If you are developing for other VR headsets, the technology we described in part 4 should be universal, and you can leverage the equivalent APIs on those headsets. Best of luck on your immersive journey!
For more information about the Agora Video SDK, see the Agora Unity SDK API Reference. For more information about the Oculus Integration Unity SDK, see the Oculus App Development page.
For technical support with Agora, I invite you to join the Agora Developer Slack community.