This blog was written by Shriya Ramakrishnan, an Agora Superstar. The Agora Superstar program empowers developers around the world to share their passion and technical expertise, and create innovative real-time communications apps and projects using Agora’s customizable SDKs.
Real-Time Communication (RTC), by definition, is a simultaneous information exchange between point A and point B with negligible latency. In the RTC industry, WebRTC is without question a pioneering technology.
WebRTC (Web Real-Time Communication) is an open-source protocol pioneered by Google for in-browser RTC. It later went on to be standardized as part of the browser spec by the World Wide Web Consortium (W3C). As the name suggests, it was created as a real-time communication tool for one-to-one video/audio calling and for transmitting any kind of data over the web. When it was first released, many technology experts boldly predicted that WebRTC would become a breakthrough in video communication technology, and it has steadily gained popularity over the years. If an app makes video calls through a web browser, chances are WebRTC powers it. WebRTC makes it possible for anyone to create the video chat app they like, and that freedom makes the technology highly competitive against traditional video chat apps like Skype.
WebRTC also opens up new scenarios for video chat features. For example, integrating WebRTC into a contact-sales page, rather than asking customers to reach out by email or phone, instantly connects sales representatives with customers through a video call on that page, increasing customer engagement and decreasing churn.
But today, WebRTC has grown to much more than just video or audio calling. It is presently used for a plethora of use cases, including but not limited to:
In today’s world, communication has been made possible using the internet. Originally, every device was addressed with IPv4 (Internet Protocol version 4), which has only a 32-bit address space. As the number of devices connected to the internet grew, this address space was exhausted; IPv6, with its much larger 128-bit address space, is the long-term fix, but migrating every network takes time. This gives rise to NATs (Network Address Translation). A NAT provides a single external IPv4 address that is shared by all the nodes inside a local network, while those nodes keep their own private addresses internally. Much of the communication over the public internet can continue via one IPv4 address while the local network manages its own addressing, making it a win-win.
This also provides a layer of security for the nodes in the local network against unknown nodes outside it. Traffic from outside is forwarded to an internal node only if a matching mapping exists in what is called the ‘NAT table’.
STUN (Session Traversal Utilities for NAT) servers help a node behind a NAT discover its external (public) IP: port. After obtaining the IP: port with the help of a STUN server, the address is sent to the peer using signaling, and a connection is set up through ICE (Interactive Connectivity Establishment) negotiation.
RTCPeerConnection is an interface in WebRTC that represents a connection between the local computer and a remote peer. By default, the connection is set up over UDP; if that fails, it falls back to TCP, and as a last resort TURN (Traversal Using Relays around NAT) servers are used to relay network traffic.
ICE is used to find the best transmission path between peers. The peers exchange information about their possible network routes (UDP is preferred, then TCP, then a TURN relay). Each of these candidate routes that a peer gathers and shares is called an ICE candidate.
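As a sketch, here is how a peer connection is typically created with STUN and TURN servers configured. The server URLs, credentials, and the signalingChannel object are placeholders, not values from this article:

```javascript
// A minimal sketch: an RTCPeerConnection with ICE servers configured.
// The server URLs and credentials below are placeholders.
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.example.com:19302' }, // STUN: discover our public IP:port
    { urls: 'turn:turn.example.com:3478', username: 'user', credential: 'pass' } // TURN: last-resort relay
  ]
});

// Each candidate route the browser gathers is surfaced here, and we forward
// it to the peer over our signaling channel (assumed to exist).
pc.onicecandidate = (event) => {
  if (event.candidate) {
    signalingChannel.send(JSON.stringify({ candidate: event.candidate }));
  }
};
```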
Signaling is the process of discovering potential peers; ICE negotiation takes place after signaling, once a potential peer has been found. WebRTC is made secure by a few procedures:
The first step is to get the local stream by requesting permission for a MediaStream using MediaDevices.getUserMedia().

MediaDevices.getUserMedia(): this method prompts the user for access to devices such as the webcam and microphone (screen sharing uses the related getDisplayMedia() method). The user allows or declines access for audio and video. The function returns a Promise that resolves to a MediaStream object. If the user denies permission, or no matching media is available, the promise is rejected with NotAllowedError or NotFoundError, respectively.
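Here is a minimal sketch of how that looks in practice. The ‘localVideo’ element id is an assumption:

```javascript
// A minimal sketch of start(): request camera/microphone access and
// render the resulting MediaStream in a <video> element with id 'localVideo'.
let localStream;

async function start() {
  try {
    localStream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
    document.getElementById('localVideo').srcObject = localStream;
  } catch (err) {
    // NotAllowedError if permission is denied; NotFoundError if no matching device exists.
    console.error('getUserMedia failed:', err.name);
  }
}
```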
The above code, written within a function called start(), displays the local stream when the user clicks the start button to begin the call.
The next thing to do after obtaining the local stream is to connect to a suitable peer (found through signaling and connected via ICE negotiation).
An interface known as RTCPeerConnection is set up between the local computer and the remote peer.
To connect to the peer, we click the call button, which triggers the ‘call()’ function. This function carries out the following process (represented diagrammatically), after which it obtains the remote stream, displays it to the local user, and displays the local stream to the remote user.
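A simplified sketch of what call() might do, reusing the pc and localStream objects from the earlier snippets. Signaling details vary by app, and ‘signalingChannel’ and the ‘remoteVideo’ element id remain placeholders:

```javascript
// A simplified sketch of call(): attach the local tracks, render the remote
// stream when it arrives, and kick off the offer/answer exchange.
async function call() {
  localStream.getTracks().forEach((track) => pc.addTrack(track, localStream));

  // Display the remote stream once the peer's tracks arrive.
  pc.ontrack = (event) => {
    document.getElementById('remoteVideo').srcObject = event.streams[0];
  };

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  // Sent to the peer via signaling; the peer replies with an answer.
  signalingChannel.send(JSON.stringify({ sdp: pc.localDescription }));
}
```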
The overall contribution of RTCPeerConnection looks like this (source: http://webrtc.github.io/webrtc-org/architecture/):
The RTCPeerConnection has several functions such as:
Finally, to terminate the call, we simply close the RTCPeerConnection by calling .close() on the connection object.
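In code, that teardown is as simple as the following sketch, using the pc object from the earlier snippets:

```javascript
// Tear down the call: closing the connection releases its transports and stops ICE.
function hangUp() {
  pc.close();
}
```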
Which brings us to the end of the sample app!
For the full code, click here!
As shown in the above diagram, the bandwidth consumed when using Agora is comparatively much lower as the number of participants scales up.
While in vanilla WebRTC, egress scales up with additional participants (n-1), it remains constant when using Agora. In other words, you publish your video only once, not once for every participant in the video call. This ensures that you have extra bandwidth to accommodate additional users.
The first step to building a video call application is to create and configure a client. ‘Configure’ here means choosing a mode: ‘live’ for broadcasts, where the host sends and receives voice/video while the audience can only receive, or ‘rtc’ for communication, typically used in one-to-one or group calls where all the users in the channel can talk freely. Codec stands for coder-decoder, the software used for compression and decompression of a digital media file; we specify the standard under ‘codec’. Here we are using the ‘h264’ standard, which is highly efficient for media files.
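A minimal sketch of this step with the Agora Web SDK (the variable name is an assumption):

```javascript
// Create and configure the client: 'rtc' mode for calls where everyone can
// talk freely ('live' would be used for broadcasts), with the h.264 codec.
const client = AgoraRTC.createClient({ mode: 'rtc', codec: 'h264' });
```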
We now create a function named ‘myfunction()’, which is wired to the ‘join’ button and runs when that button is clicked on our HTML webpage.
This function contains the major portion of the execution.
‘handlefail’: this variable holds a function whose sole purpose is to log an error to the console whenever one arises.
‘remotecontainer’ and ‘appid’ are variables created to manipulate the elements with ids ‘remote’ and ‘appid’, respectively.
‘addremotestream’: this function adds a new child div tag to the parent remotecontainer div tag, displaying the video of every new user in the channel on your webpage.
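Sketches of these helpers (the exact markup is an assumption):

```javascript
// Log any error to the console.
const handlefail = (err) => console.error('Error:', err);

// Grab the container for remote videos and the App ID input field.
const remotecontainer = document.getElementById('remote');
const appid = document.getElementById('appid');

// Append a child div to remotecontainer for a newly joined user's video.
function addremotestream(streamId) {
  const div = document.createElement('div');
  div.id = streamId;
  remotecontainer.appendChild(div);
}
```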
For one’s video to be published into the channel, the client object is first initialized using the ‘.init’ function. As parameters, we pass the App ID, the unique identifier for every client, and a logger function to track the activity and flow of the program in the console.
After the client has been initialized with the App ID, the ‘.join’ function adds the client to the channel specified as one of its parameters, again alongside a logger function.
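A sketch of these two steps (the channel name and the null token/uid values are assumptions; ‘appid’ is assumed to be an input field):

```javascript
// Initialize the client with the App ID, then join a channel.
function myfunction() {
  client.init(appid.value, () => {
    console.log('client initialized');
    // join(token, channelName, uid, onSuccess, onFailure)
    client.join(null, 'my-channel', null, (uid) => {
      console.log('joined channel with uid', uid);
    }, handlefail);
  }, handlefail);
}
```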
Now, the local stream is created using the ‘.createStream’ function (the ‘localstream’ variable holds the configuration). This function indicates whether audio, video, and screen(-sharing) are needed and passes a unique id by which the stream is identified.
Following its creation, the local stream is initialized with a function that logs the initialization in the console, and its ‘.play’ function plays the video captured from the local user’s webcam inside the tag whose id is passed as an argument to ‘.play’.
The client then publishes, i.e., shares, their local video with the rest of the channel using ‘.publish’, with the local stream and a logger function (‘handlefail’) as parameters.
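A sketch of creating, initializing, playing, and publishing the local stream. The ‘uid’ value (taken from the join callback) and the ‘me’ element id are assumptions:

```javascript
// Create the local stream with the desired media configuration.
const localstream = AgoraRTC.createStream({
  streamID: uid,   // unique identifier for this stream
  audio: true,
  video: true,
  screen: false
});

localstream.init(() => {
  console.log('local stream initialized');
  localstream.play('me');                  // render the webcam feed into the element with id 'me'
  client.publish(localstream, handlefail); // share the stream with the rest of the channel
}, handlefail);
```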
Now that the local stream (our stream) is created and published, how do we see the video of the other users in the channel?
For this, we set up event listeners.
‘stream-added’ is an Agora-defined event. Every time another user creates and publishes a stream into the channel, this event is triggered.
The ‘.on’ function, called on the client object, listens for the event passed as its first argument. Its second argument is a callback function, which here ‘subscribes’ to the new stream in the channel using the ‘.subscribe’ function.
The next event listener listens for the ‘stream-subscribed’ event and does the job of adding the new stream to our webpage and playing the video published by the remote user for the local user.
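A sketch of these two listeners, reusing the addremotestream helper from above:

```javascript
// Subscribe to every stream that is published into the channel...
client.on('stream-added', (evt) => {
  client.subscribe(evt.stream, handlefail);
});

// ...and, once subscribed, add a div for it and play the remote video there.
client.on('stream-subscribed', (evt) => {
  const stream = evt.stream;
  addremotestream(String(stream.getId()));
  stream.play(String(stream.getId()));
});
```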
‘removeVideoStream’: as the name suggests, this function removes a video stream from the webpage and stops its playback.
We use another pair of event listeners to do this.
The first listener listens for the ‘stream-removed’ event, which fires when a remote stream is removed; for example, when a peer calls Client.unpublish to remove their stream from the channel. The listener then invokes removeVideoStream.
The second listener listens for the ‘peer-leave’ event, which occurs when a peer leaves the channel; for example, when the peer calls Client.leave. The listener then invokes removeVideoStream.
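A sketch of this cleanup path, assuming removeVideoStream receives the event object:

```javascript
// Stop playback and remove the stream's div when a peer unpublishes or leaves.
function removeVideoStream(evt) {
  const stream = evt.stream;
  if (!stream) return; // 'peer-leave' may fire without an associated stream
  stream.stop();
  const div = document.getElementById(String(stream.getId()));
  if (div) div.remove();
}

client.on('stream-removed', removeVideoStream); // peer called Client.unpublish
client.on('peer-leave', removeVideoStream);     // peer called Client.leave
```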
Achieving the same functionality takes fewer lines of code with Agora than with plain WebRTC. This brings us to the conclusion that Agora carries out the same task with greater ease, leaving less for the developer to handle in terms of hardware maintenance and software implementation, as mentioned earlier.
For the full code, click here.
By now you should have a solid understanding of WebRTC and how this pioneering technology has opened up the possibilities of data and media transfer over the public internet. As with many open-source projects, there are known limitations, especially for a WebRTC novice. The Agora platform offers a cost-effective alternative with a proprietary network, reliable encoding/decoding technologies, professional enterprise-level technical support, and dedicated teams that you can trust for your project or application.
More importantly, it is FREE to start: you are guaranteed 10,000 free minutes EVERY MONTH.