Previously, we showed two relatively simple sample applications demonstrating how to do video chat with AR Foundation on Unity. Here are the links to those blogs:
Questions continue to come up:
“How do I share AR objects from the sender’s AR camera?”
“Also, I don’t want to share my HUD UIs, how do I do that with screen sharing?”
To address these technical challenges, we will build a more sophisticated application on top of what we already know about using AgoraIO's Video SDK. Some of you may have already seen this blog about a Remote Assistant App on iOS. We will build a similar app in Unity3D with AR Foundation.
Note: This guide does not implement token authentication, which is recommended for all RTE apps running in production environments. For more information about token-based authentication within the Agora platform, please refer to this guide: https://bit.ly/3sNiFRs
To start, we will need to integrate the Agora Video SDK for Unity3D into our project by searching for it in the Unity Asset Store or clicking this link to begin the download. This project should be compatible with the latest SDK. In case this tutorial falls out of date, you may fall back to the current SDK archive here.
After you finish downloading and importing the SDK into your project, you should be able to see the README files for the different platforms the SDK supports. You should already be familiar with the setup for the demo from the previous tutorials.
In the Unity Editor (2019), open the Package Manager from the Window tab and install the following packages:
If you use another version of the Unity Editor, please check the README file on GitHub for the verified setups. You may also want to experiment with different versions, since AR Foundation changes significantly between releases.
Please follow this link to download the completed project “RemoteAssistantAR” on GitHub.
Open the project in the Unity Editor and ignore the undefined symbol errors. Go to the Unity Asset Store page to import the Agora Video SDK. When importing it, unselect "demo" from the list, since those files were modified in the GitHub repo. Switch to the iOS or Android platform and follow the README instructions on how to set up the build environment for AR Foundation.
Open up the Main scene, and your project should look like this:
Fill in your App ID in the corresponding field of the GameController object on this screen. You may then build the project for iOS or Android to run on a device. You will need one mobile device to run as the broadcaster and another mobile or desktop device (including the Unity Editor) to run as the audience.
The RemoteAssistantAR project consists of three scenes. Here are their descriptions:
This picture shows the relationship among the scenes:
The AudPlay client and the CastAR client share the programming interface IVideoChatClient. As you can see from Figure 2, the two clients also share a common user interface in the prefab ChatCanvasUI, as depicted below:
The color controller is an additional UI controller contained only in the AudPlay client. The user may tap the ink droplet to get a list of colors to pick from. The following picture shows the structure of this prefab:
The Broadcast View Controller (or "BroadcastVC") is, for the most part, the equivalent of Chapter 2's "ARClient". You may go back to that tutorial to read about how to use an external video source to share video frames with the receiver. In this controller, however, we capture the AR camera image in a different way: we get the raw image data from a RenderTexture.
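A simplified sketch of that idea is shown below. It assumes a readback Texture2D and a placeholder PushToAgora() hand-off; the repo's BroadcastVC contains the actual implementation.

// A simplified sketch of pulling raw pixels out of a RenderTexture every frame.
// The RenderTexture reference and the PushToAgora() hand-off are placeholders.
using UnityEngine;

public class RenderTextureReader : MonoBehaviour
{
    public RenderTexture casterRenderTexture;   // the texture both cameras render into
    private Texture2D readbackTexture;

    void Start()
    {
        readbackTexture = new Texture2D(
            casterRenderTexture.width, casterRenderTexture.height,
            TextureFormat.RGBA32, false);
    }

    void Update()
    {
        // Make the RenderTexture the active target and copy it into a readable Texture2D.
        RenderTexture previous = RenderTexture.active;
        RenderTexture.active = casterRenderTexture;
        readbackTexture.ReadPixels(
            new Rect(0, 0, casterRenderTexture.width, casterRenderTexture.height), 0, 0);
        readbackTexture.Apply();
        RenderTexture.active = previous;

        // These raw bytes can be wrapped in an ExternalVideoFrame and pushed to Agora,
        // just like the external video source approach from the earlier tutorial.
        byte[] rawBytes = readbackTexture.GetRawTextureData();
        // PushToAgora(rawBytes);   // placeholder for the SDK push call
    }
}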
RenderTexture is the key to answering the questions posed at the beginning of this tutorial.
So far, we've covered two ways to share screens. The first way is to share the entire screen, as if you were continuously taking screen captures of your device and sending them; this is shown in the blog "How to Broadcast Your Screen with Unity3D and Agora.io". The second way was discussed in my previous tutorial, where the view from the AR camera was shared without the AR objects. What we want for this application is something in between: we want the AR camera feed plus the AR objects (call it the "AR Screen"), but not the HUD/UI elements. There are different ways to achieve this; one is to use a RenderTexture. In summary, it follows these steps:
Three cameras are used in the CastAR scene. They are laid out in the following wireframe:
AR Camera: Takes the device's physical back camera as input and outputs each frame to a RenderTexture, e.g., CasterRenderTexture. Figure 6.1 illustrates the important parameters for this camera:
Render Camera: Renders the 3D AR objects. One may ask: why can't the AR Camera be used for 3D object rendering? The reason is that the AR Camera is a physical camera and, as we saw in the last tutorial, the 3D objects are not captured by it. Our Render Camera is placed at the same world position as the AR Camera, and its output is also sent to the same RenderTexture, CasterRenderTexture. See Figure 6.2:
View Camera: Looks at the Quad, which uses a material that "reads" from the shared RenderTexture CasterRenderTexture. This camera provides the actual view for the user of this device. Note that since both the AR Camera and the Render Camera render to the RenderTexture, their image is never drawn to the screen itself; that's why the View Camera is needed. Moreover, the View Camera and the Quad are positioned away from the AR Camera and the Render Camera, so that the Quad object does not appear in the Render Camera's view.
From the Hierarchy view in Figure 6.2, we can also see that the View Camera is the parent of the Quad, so the Quad rotates and moves with the camera. The same parent-child relationship applies between the AR Camera and both the View Camera and the Render Camera; their view angles and positions are kept in sync this way.
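To make the wiring concrete, here is a rough sketch of how the cameras could be pointed at the shared RenderTexture in code. In the project this is configured in the Inspector, and the clear-flags and depth settings below are assumptions for illustration only.

using UnityEngine;

public class ARScreenCameraSetup : MonoBehaviour
{
    public Camera arCamera;        // AR Foundation camera (physical camera feed)
    public Camera renderCamera;    // renders only the 3D AR objects
    public Camera viewCamera;      // looks at the Quad showing the RenderTexture
    public RenderTexture casterRenderTexture;

    void Awake()
    {
        // Both cameras write into the same texture: the camera feed first,
        // then the 3D objects composited on top.
        arCamera.targetTexture = casterRenderTexture;
        renderCamera.targetTexture = casterRenderTexture;

        // Assumed settings: the Render Camera should not erase the AR background
        // and should render after the AR Camera.
        renderCamera.clearFlags = CameraClearFlags.Nothing;
        renderCamera.depth = arCamera.depth + 1;

        // The View Camera renders to the device screen as usual.
        viewCamera.targetTexture = null;
    }
}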
The following diagram illustrates the relationships between the cameras and the client views:
Next, we will discuss the details of the remote assistant part.
The idea behind the remote assistant app is that a field operator may be out in the field working on something but needs guidance from a helper elsewhere on what to look at. The CastAR client is used by the field operator in this scenario, and the helper uses the AudPlay client to draw an outline around the important part on the screen. Technically, we will need the following steps:
We will use a small sphere to represent a dot, and the user’s dragging motion will create a series of small colored spheres. We will establish the model for our remote drawing data:
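The repo defines its own model class, so the sketch below only shows one possible shape: a viewport-space position plus a color, serializable with Unity's JsonUtility. The field names are illustrative assumptions.

using System;
using UnityEngine;

[Serializable]
public class DrawPointData
{
    public float x;   // viewport-space x (0..1)
    public float y;   // viewport-space y (0..1)
    public float r;   // dot color components
    public float g;
    public float b;
}

[Serializable]
public class DrawDataModel
{
    public DrawPointData[] points;   // the buffered points sent in one stream message
}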
Unlike the iOS version of this app, we don't need to worry about screen-point offsets for different devices. We will convert the points from screen space to viewport space for transport. This page will help you refresh your memory about the different coordinate systems in Unity. A point from a screen touch is "normalized" in this line:
Vector3 vp = Camera.main.ScreenToViewportPoint(screenPos);
We use a buffer list to accumulate the touch points and limit the number of data stream calls, since each call adds a bit of overhead.
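Here is a hedged sketch of such a buffer. The names and the flush interval are assumptions, but the flow (accumulate, then send in batches) matches the description above.

using System.Collections.Generic;
using UnityEngine;

public class TouchPointBuffer : MonoBehaviour
{
    private readonly List<Vector3> pendingPoints = new List<Vector3>();
    private float flushTimer;
    private const float FlushInterval = 0.1f;   // send at most ~10 stream messages per second

    void Update()
    {
        if (Input.GetMouseButton(0))
        {
            // Normalize the touch position into viewport space before buffering it.
            Vector3 vp = Camera.main.ScreenToViewportPoint(Input.mousePosition);
            pendingPoints.Add(vp);
        }

        flushTimer += Time.deltaTime;
        if (flushTimer >= FlushInterval && pendingPoints.Count > 0)
        {
            flushTimer = 0f;
            // SendPoints(pendingPoints);   // serialize and send over the Agora data stream
            pendingPoints.Clear();
        }
    }
}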
On the AudPlay client side, one of the initialization steps in AudienceVC involves the creation of the data stream, in the following line:
dataStreamId = rtcEngine.CreateDataStream(reliable: true, ordered: true);
The dataStreamId is later used in the send stream message call, as follows:
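A sketch of that call is below, reusing the DrawDataModel shape sketched earlier. It assumes the SDK build used here accepts a string payload in SendStreamMessage; newer SDK versions may take a byte[] instead.

using UnityEngine;
using agora_gaming_rtc;   // Agora Video SDK for Unity

public class DrawDataSender
{
    private readonly IRtcEngine rtcEngine;
    private readonly int dataStreamId;

    public DrawDataSender(IRtcEngine engine)
    {
        rtcEngine = engine;
        dataStreamId = rtcEngine.CreateDataStream(reliable: true, ordered: true);
    }

    public void SendDrawData(DrawDataModel model)
    {
        string json = JsonUtility.ToJson(model);
        int result = rtcEngine.SendStreamMessage(dataStreamId, json);
        if (result != 0)
        {
            Debug.LogWarning("SendStreamMessage returned " + result);
        }
    }
}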
On the CastAR client side, we dedicate a game object, "DrawListener", to listen to the data stream and process the requests.
The DrawListener’s controller script RemoteDrawer registers a callback to handle the data stream event:
rtcEngine.OnStreamMessage += HandleStreamMessage;
The data passed between the two ends is a JSON-formatted string, and we invoke the following method to draw the dots in the CastAR client's 3D world:
7.3 DrawDot
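The repo's RemoteDrawer script has the real implementation; the sketch below only illustrates the flow described in this section. The callback signature varies between SDK versions (a string payload is assumed here), and drawDistance is a made-up parameter for placing the dot in front of the camera.

using UnityEngine;
using agora_gaming_rtc;

public class RemoteDrawerSketch : MonoBehaviour
{
    public GameObject referenceObject;   // the template sphere
    public Transform dotContainer;       // parent for the instantiated dots
    public Camera renderCamera;          // the camera that renders the AR 3D objects
    public float drawDistance = 1.5f;    // assumed distance in front of the camera

    void Start()
    {
        IRtcEngine rtcEngine = IRtcEngine.QueryEngine();
        rtcEngine.OnStreamMessage += HandleStreamMessage;
    }

    // Signature assumed for the SDK version used in this project.
    void HandleStreamMessage(uint userId, int streamId, string data, int length)
    {
        DrawDataModel model = JsonUtility.FromJson<DrawDataModel>(data);
        foreach (DrawPointData p in model.points)
        {
            DrawDot(new Vector3(p.x, p.y, 0f), new Color(p.r, p.g, p.b));
        }
    }

    void DrawDot(Vector3 viewportPos, Color color)
    {
        GameObject dot = Instantiate(referenceObject, dotContainer);
        dot.transform.position = DeNormalizedPosition(viewportPos);
        dot.GetComponent<Renderer>().material.color = color;
        dot.SetActive(true);
    }

    // Reverses the ScreenToViewportPoint normalization done on the sender side.
    Vector3 DeNormalizedPosition(Vector3 viewportPos)
    {
        Vector3 screenPos = renderCamera.ViewportToScreenPoint(viewportPos);
        screenPos.z = drawDistance;
        return renderCamera.ScreenToWorldPoint(screenPos);
    }
}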
Here the referenceObject is the sphere; it shares the same parent as the container of the dots to be drawn. The DeNormalizedPosition() function does the opposite of what we did earlier in Normalize(); its key call is:
camera.ScreenToWorldPoint(pos);
The important thing here is to use the correct camera for the viewport conversion. Since the Render Camera is responsible for rendering the AR 3D objects, it is the one used for the conversion.
Linking all the information above should give you a better understanding of the design and the technical implementation of the RemoteAssistantAR project in Unity.
Some known issues here:
There is definitely plenty of room for improvement here, and the multi-camera solution for AR screen sharing is not the only valid approach. Please let me know if you have any suggestions, or even better, submit a Pull Request with your modifications on GitHub; that would be most welcome!