Conference protocol

This document describes the planned changes to how conferences (audio/video) are managed. The goal is to move from the current implementation, which simply merges SIP calls and provides a grid view, to one where participants are listed, can be muted independently, and the video layout can be changed (for example, to show only one participant).

Definitions

Host: The user who mixes the audio/video streams for the others
Participant: Any user in the conference, including the host

Disclaimer

This document only describes the first steps for now: the identification of participants and their positions in the video mixer, sent to all participants.

Possible layouts

GRID: Every member is shown with the same height/width
ONE_BIG_WITH_SMALL: One member is enlarged and the others are shown as small previews
ONE_BIG: One member takes the full rendered screen

Two new methods are available in CallManager to manage the conference layout:

/**
 * Change the conference layout
 * @param confId
 * @param layout    0 = matrix, 1 = one big, others in small, 2 = one in big
 */
void setConferenceLayout(const std::string& confId, int layout);

/**
 * Change the active participant (used in layout != matrix)
 * @param confId
 * @param participantId    If participantId not found, the local video will be shown
 */
void setActiveParticipant(const std::string& confId, const std::string& participantId);
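As a minimal sketch of how a client might avoid passing raw integers around, the three documented layout values can be wrapped in an enum. The names Layout, toLayout and usesActiveParticipant are hypothetical and not part of CallManager; only the numeric values 0, 1 and 2 come from the API above.

```cpp
#include <optional>

// Hypothetical enum; the numeric values mirror those documented for setConferenceLayout.
enum class Layout : int {
    GRID = 0,               // "matrix" in the API comment
    ONE_BIG_WITH_SMALL = 1, // one participant large, the others small
    ONE_BIG = 2             // one participant takes the whole frame
};

// Convert the raw int accepted by setConferenceLayout into a Layout.
// Returns std::nullopt for any value outside the documented range.
inline std::optional<Layout> toLayout(int value)
{
    switch (value) {
    case 0: return Layout::GRID;
    case 1: return Layout::ONE_BIG_WITH_SMALL;
    case 2: return Layout::ONE_BIG;
    default: return std::nullopt;
    }
}

// setActiveParticipant is documented as being used when layout != matrix.
inline bool usesActiveParticipant(Layout layout)
{
    return layout != Layout::GRID;
}
```

A caller could then pass static_cast<int>(Layout::ONE_BIG) to setConferenceLayout and only call setActiveParticipant when usesActiveParticipant returns true.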

Implementation

The implementation is pretty straightforward. Everything is managed by conference.cpp (to link participants to sources) and video_mixer.cpp (to render the requested layout).

Syncing Conference Information

Note: Currently, the word participant refers to a callId mixed into a conference. This can cause some problems for the API at first and must be fixed in the future.

The goal is to notify all participants of the metadata of the rendered video: which participants are in the conference and where each video is located.

If a participant is itself a conference, its incoming layout info should be merged when sent to other participants. Layout info must not be merged when sent back to a conference.
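The document does not say how the merge is computed. One plausible reading, sketched here purely as an assumption, is that each entry reported by the nested conference is rescaled into the cell that the nested conference occupies in the host's mix. The Cell struct, the mergeSubConference function and the subW/subH parameters (the size of the nested conference's own frame) are all hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical numeric form of one layout entry (positions and sizes in pixels).
struct Cell {
    std::string uri;
    int x, y, w, h;
};

// Map the entries of a nested conference, laid out on a subW x subH frame,
// into the cell ("slot") that the nested conference occupies in the host's mix.
// Assumption: merging is a plain linear rescale; the real rule may differ.
std::vector<Cell> mergeSubConference(const Cell& slot,
                                     const std::vector<Cell>& sub,
                                     int subW,
                                     int subH)
{
    std::vector<Cell> merged;
    if (subW <= 0 || subH <= 0)
        return merged; // nothing sensible to scale against
    for (const auto& c : sub) {
        merged.push_back({c.uri,
                          slot.x + c.x * slot.w / subW,
                          slot.y + c.y * slot.h / subH,
                          c.w * slot.w / subW,
                          c.h * slot.h / subH});
    }
    return merged;
}
```

Under this reading, the merged entries would be sent to ordinary participants, while the nested conference itself would keep receiving the unmerged layout.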

Layout Info

The layout is exposed to clients as a VectorMapStringString and stored internally as a vector with the following format:

Layout = {
    {
        "uri": "participant", "x":"0", "y":"0", "w": "0", "h": "0", "isModerator": "true"
    },
    {
        "uri": "participant1", "x":"0", "y":"0", "w": "0", "h": "0", "isModerator": "false"
    }
    (...)
}

Possible keys are:

uri = account’s uri
device = device’s id
media = media’s id
active = if the participant is active
x = position (x) in the video
y = position (y) in the video
w = size (width) in the video
h = size (height) in the video
videoMuted = if the video is muted
audioLocalMuted = if the audio is locally muted
audioModeratorMuted = if the audio is muted by moderators
isModerator = if it’s a moderator
handRaised = if the hand is raised
voiceActivity = if the stream has voice activity
recording = if the peer is recording the conference
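Since every value travels as a string, a client will usually convert an entry into typed fields before using it. The sketch below assumes that VectorMapStringString is a vector of string-to-string maps; the ParticipantInfo struct and the helper names are hypothetical, and only a subset of the keys listed above is read.

```cpp
#include <map>
#include <string>
#include <vector>

// Assumed aliases: a vector of string-to-string maps.
using MapStringString = std::map<std::string, std::string>;
using VectorMapStringString = std::vector<MapStringString>;

// Hypothetical typed view over one layout entry.
struct ParticipantInfo {
    std::string uri;
    int x = 0, y = 0, w = 0, h = 0;
    bool isModerator = false;
    bool handRaised = false;
};

// Return the value stored under key, or fallback if the key is absent.
static std::string getOr(const MapStringString& m,
                         const std::string& key,
                         const std::string& fallback)
{
    auto it = m.find(key);
    return it == m.end() ? fallback : it->second;
}

// Convert one entry into typed fields. Missing keys fall back to 0 / false.
// std::stoi throws if a present value is not a number.
ParticipantInfo parseParticipant(const MapStringString& entry)
{
    ParticipantInfo p;
    p.uri = getOr(entry, "uri", "");
    p.x = std::stoi(getOr(entry, "x", "0"));
    p.y = std::stoi(getOr(entry, "y", "0"));
    p.w = std::stoi(getOr(entry, "w", "0"));
    p.h = std::stoi(getOr(entry, "h", "0"));
    p.isModerator = getOr(entry, "isModerator", "false") == "true";
    p.handRaised = getOr(entry, "handRaised", "false") == "true";
    return p;
}
```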

New API

A new method (in CallManager) and a new signal are available to get the current conference info and to receive updates, respectively:

VectorMapStringString getConferenceInfos(const std::string& confId);

void onConferenceInfosUpdated(const std::string& confId, const VectorMapStringString& infos);
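As an illustration of what a client might do with the data returned by getConferenceInfos or delivered by onConferenceInfosUpdated, the sketch below extracts the moderators from a layout. It assumes VectorMapStringString is a vector of string-to-string maps; moderatorUris is a hypothetical helper, not part of the API.

```cpp
#include <map>
#include <string>
#include <vector>

// Assumed alias: a vector of string-to-string maps.
using VectorMapStringString = std::vector<std::map<std::string, std::string>>;

// Hypothetical helper a client could call from its onConferenceInfosUpdated
// handler: collect the uri of every entry whose isModerator value is "true".
std::vector<std::string> moderatorUris(const VectorMapStringString& infos)
{
    std::vector<std::string> uris;
    for (const auto& entry : infos) {
        auto moderator = entry.find("isModerator");
        auto uri = entry.find("uri");
        if (moderator != entry.end() && moderator->second == "true" && uri != entry.end())
            uris.push_back(uri->second);
    }
    return uris;
}
```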

Implementation

The Conference object (which only exists if we mix calls, meaning that we are the master) manages the information for the whole conference, based on the LayoutInfos of each Call object. getConferenceInfos retrieves the info directly from this object.

So, every Call object now has a LayoutInfo and, when it is updated, asks the Conference object to update its info.

The master of a conference sends its info via the SIP channel as a message with the following MIME type: application/confInfo+json

So, if a call receives some confInfo, we know that this call is a member of a conference.

To summarize, Call manages received layouts and Conference manages sent layouts.
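The receiving side can be sketched as a check on the MIME type of an incoming SIP message. CallState and handleSipMessage are hypothetical names used only for illustration; the MIME type is the one given above, and the JSON body is stored unparsed to keep the sketch free of any JSON library.

```cpp
#include <string>

// MIME type used by the master to send conference info.
constexpr const char* CONF_INFO_MIME = "application/confInfo+json";

// Hypothetical per-call state kept by the receiving side.
struct CallState {
    bool inRemoteConference = false; // set once confInfo has been received
    std::string lastConfInfo;        // raw JSON body, left unparsed in this sketch
};

// Returns true if the message carried conference info and was consumed.
bool handleSipMessage(CallState& call, const std::string& mimeType, const std::string& body)
{
    if (mimeType != CONF_INFO_MIME)
        return false;
    // Receiving confInfo means the peer is mixing: this call is a member of a conference.
    call.inRemoteConference = true;
    call.lastConfInfo = body;
    return true;
}
```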

Changing the state of the conference

To change the state of the conference, participants need to send orders that the host will handle.

The protocol has the following requirements:

It should handle orders at multiple levels. For a conference, there are three levels that define a participant:

The account, which is the identity of the participant
Devices, because each account can join via multiple devices
Media, because there can be multiple video streams per device (e.g. one camera and one screen share)

To save bandwidth, clients should be able to send multiple orders at once.

General actions

To change the layout, the moderator can send a payload of type “application/confOrder+json” containing a "layout" key, where 0 is a grid, 1 is one participant shown large with the others small, and 2 is one participant shown large.

Account’s actions

For now, no action is supported. In the future, moderator: true/false should be handled to change a moderator.

Device’s actions

hangup: true to hang up a device from the conference (moderators only)
raisehand: true/false to change the raised-hand status. Only doable by the device itself; otherwise the order is dropped.

Media’s actions

muteAudio: mutes the audio of a participant (moderators only)
muteVideo: not supported yet
active: marks the media as active
voiceActivity: indicates a media stream’s voice activity status (only relevant for audio)
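The permission rules stated above for device and media actions can be summarized in a small check that the host might run before applying an order. This is only a sketch: Decision and checkOrder are hypothetical names, and dropping hangup or muteAudio orders sent by non-moderators is an assumption modeled on the raisehand rule. Actions whose permissions the document does not state are reported as Unspecified.

```cpp
#include <string>

// Outcome of the permission check for one order.
enum class Decision { Allowed, Dropped, Unspecified };

// Hypothetical host-side check.
// senderIsModerator:    the order comes from a moderator.
// senderIsTargetDevice: the order comes from the device it targets.
Decision checkOrder(const std::string& action, bool senderIsModerator, bool senderIsTargetDevice)
{
    // hangup and muteAudio are reserved for moderators.
    if (action == "hangup" || action == "muteAudio")
        return senderIsModerator ? Decision::Allowed : Decision::Dropped;
    // raisehand may only be changed by the device itself, else dropped.
    if (action == "raisehand")
        return senderIsTargetDevice ? Decision::Allowed : Decision::Dropped;
    // The document does not state who may send the remaining actions.
    return Decision::Unspecified;
}
```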

Example

So, the application/confOrder+json payload will contain:

{
    "989587609427420" : {
        "moderator": true/false,
        "devices": {
            "40940943604396R64363": {
                "hangup": true,
                "raisehand": true/false,
                "media":{
                    "3532532662432" : {
                        "muteAudio": true/false,
                        "muteVideo": true/false,
                        "active": true/false,
                        "voiceActivity": true/false
                    }
                }
            }
        }
    },
    "layout": 0/1/2
}
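To make the nesting concrete, here is a minimal sketch that builds two single-purpose order bodies as strings. The function names are hypothetical. The identifiers are assumed to contain no characters that need JSON escaping; a real client would more likely use a JSON library.

```cpp
#include <string>

// Build an order that sets muteAudio on one media of one device of one account.
// Assumption: the ids need no JSON escaping.
std::string makeMuteAudioOrder(const std::string& accountUri,
                               const std::string& deviceId,
                               const std::string& mediaId,
                               bool mute)
{
    return "{\"" + accountUri + "\":{\"devices\":{\"" + deviceId
           + "\":{\"media\":{\"" + mediaId + "\":{\"muteAudio\":"
           + (mute ? "true" : "false") + "}}}}}}";
}

// Build an order that only changes the layout (0, 1 or 2).
std::string makeLayoutOrder(int layout)
{
    return "{\"layout\":" + std::to_string(layout) + "}";
}
```

Either string would then be sent over the SIP channel with the application/confOrder+json type.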

Note: the media type should be included in the conference information and can be used by the client to improve display (e.g. do not crop screen sharing).

Controlling moderators

There are currently three possibilities:

Changing the account’s config to add a list of moderators (in config.yml, defaultModerators can contain a list of default moderators)
If localModeratorsEnabled is true, all accounts of the device will be moderators
If allModeratorsEnabled is true, anybody in the conference will be a moderator
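The three options combine naturally into a single predicate. In the sketch below, the field names follow the config keys above, while the ModeratorConfig struct, the isModerator function and the isLocalAccount parameter (whether the participant is an account of the host device) are hypothetical.

```cpp
#include <set>
#include <string>

// Hypothetical in-memory form of the moderator-related account settings.
struct ModeratorConfig {
    std::set<std::string> defaultModerators; // uris listed in config.yml
    bool localModeratorsEnabled = false;
    bool allModeratorsEnabled = false;
};

// Decide whether a participant should be treated as a moderator.
bool isModerator(const ModeratorConfig& cfg, const std::string& participantUri, bool isLocalAccount)
{
    if (cfg.allModeratorsEnabled)
        return true; // anybody in the conference is a moderator
    if (cfg.localModeratorsEnabled && isLocalAccount)
        return true; // all accounts of the device are moderators
    return cfg.defaultModerators.count(participantUri) > 0;
}
```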

Future

Separate streams to allow more control?

Notes/Comments

It’s likely that the protocol will evolve for future needs. I believe it’s best if we have a “version” field. The older version will be recognized if this field is missing.
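A sketch of how a receiver could apply this rule, assuming the top-level fields of a payload have already been parsed into a string map. The function name, the map representation and the value 0 chosen for the older version are all assumptions; the suggestion above only says that a missing field identifies the older version.

```cpp
#include <map>
#include <string>

// Assumed number for payloads that carry no "version" field.
constexpr int LEGACY_VERSION = 0;

// Return the protocol version announced by a payload.
// A missing "version" field identifies the older protocol.
// std::stoi throws if the field is present but not a number.
int protocolVersion(const std::map<std::string, std::string>& topLevelFields)
{
    auto it = topLevelFields.find("version");
    if (it == topLevelFields.end())
        return LEGACY_VERSION;
    return std::stoi(it->second);
}
```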