Calls

Important

This page details the principle for Jami accounts. For SIP accounts, the SIP protocol is used.

Let's make a call in Jami!

Daemon side

When creating a call between two peers, Jami mainly uses known protocols such as ICE, SIP, or TLS. However, to make it distributed, the process of creating a call is a bit different. To summarize, when someone wants to contact one of their contacts, this is what they will do:

Search for the contact's presence on the DHT (for more details, see Contact management).
Once the contact is found, send a call request announcing the known candidates: the IP address of each network interface, relay addresses (TURN), and reflexive addresses (UPnP, public ones).
Wait for the response of the contact (they will respond with their known addresses).
In fact, two ICE sessions are negotiated: one over TCP (preferred) and one over UDP (as a fallback).
Then, the socket is encrypted with TLS (if TCP) or DTLS (if UDP).
The contact is now able to accept or decline the call. When they accept, an ICE transport (UDP only for now) is negotiated to create 4 new sockets for the media (2 for audio, 2 for video).
The call is now alive!

Exchange ICE candidates

Everything really starts in jamiaccount.cpp (JamiAccount::startOutgoingCall). Once both ICE objects are ready and when the contact is found via the DHT, the call request for the contact is crafted. This request contains all the information necessary for the remote ICE session defined by:

dht::IceCandidates(callvid, blob)

where:

callvid is a random number used to identify the call, and
blob contains two concatenated ICE messages (IceTransport::packIceMsg in ice_transport.cpp) containing the password of the session, the ufrag, and ICE candidates like:

0d04b935
7c33834e7cf944bf0e367b42
H6e6ca382 1 UDP 2130706431 2607:fad8:4:6:9eb6:d0ff:dead:c0de 14133 typ host
H42c1g477 1 UDP 2130706431 fe80::9eb6:d0ff:fee7:1412 14133 typ host
Hc0a8027e 1 UDP 2130706431 192.168.0.123 34567 typ host
Sc0a8027e 1 UDP 1694498815 X.X.X.X 32589 typ srflx
0d04b932
7c33834e7cf944bf0e367b47
H6e6ca682 1 TCP 2130706431 2607:fad8:4:6:9eb6:d0ff:dead:c0de 50693 typ host tcptype passive
H6e6ca682 1 TCP 2130706431 2607:fad8:4:6:9eb6:d0ff:dead:c0de 9 typ host tcptype active
H42c1b577 1 TCP 2130706431 fe80::9eb6:d0ff:fee7:1412 50693 typ host tcptype passive
H42c1b577 1 TCP 2130706431 fe80::9eb6:d0ff:fee7:1412 9 typ host tcptype active
Hc0a8007e 1 TCP 2130706431 192.168.0.123 42751 typ host tcptype passive
Hc0a8007e 1 TCP 2130706431 192.168.0.123 9 typ host tcptype active
Sc0a8007e 1 TCP 1694498815 X.X.X.X 42751 typ srflx tcptype passive

The request is sent via the DHT, in a message encrypted for the device, to hash(callto:xxxxxx), where xxxxxx is the device ID. The peer answers at the exact same place with its own dht::IceCandidates, this time encrypted for the sender device. See JamiAccount::replyToIncomingIceMsg for more details.

The ICE session is created on both sides when they have all the candidates (so for the sender, when the reply from the contact is received).
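
To make the sample blob easier to read, here is a minimal Python sketch that parses one candidate line and decodes its priority. It is not the daemon's code: the IceCandidate class and both helper functions are hypothetical names, and the sketch assumes the lines follow the candidate grammar of the ICE specification (foundation, component, transport, priority, address, port, type, then optional key/value pairs such as tcptype). The priority split uses the formula recommended by the ICE specification, priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IceCandidate:
    """One candidate line from the sample blob (hypothetical helper type)."""
    foundation: str
    component: int
    transport: str
    priority: int
    address: str
    port: int
    cand_type: str
    tcp_type: Optional[str] = None


def parse_candidate(line: str) -> IceCandidate:
    """Parse a line such as 'Hc0a8027e 1 UDP 2130706431 192.168.0.123 34567 typ host'."""
    parts = line.split()
    if len(parts) < 8 or parts[6] != "typ":
        raise ValueError(f"unexpected candidate line: {line!r}")
    tcp_type = None
    # Optional key/value pairs follow the type, e.g. "tcptype passive".
    extras = parts[8:]
    for key, value in zip(extras[0::2], extras[1::2]):
        if key == "tcptype":
            tcp_type = value
    return IceCandidate(parts[0], int(parts[1]), parts[2], int(parts[3]),
                        parts[4], int(parts[5]), parts[7], tcp_type)


def split_priority(priority: int) -> tuple:
    """Return (type preference, local preference, component ID)."""
    type_pref = priority >> 24
    local_pref = (priority >> 8) & 0xFFFF
    component_id = 256 - (priority & 0xFF)
    return type_pref, local_pref, component_id


host = parse_candidate("Hc0a8027e 1 UDP 2130706431 192.168.0.123 34567 typ host")
print(host.cand_type, split_priority(host.priority))
```

With the values from the sample, the host candidates decode to a type preference of 126 and the srflx candidates to 100, which is why host addresses are tried first.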

ICE negotiation

Pending calls are managed by JamiAccount::handlePendingCallList(), which first waits for the TCP negotiation to finish (and, if it fails, waits for the UDP one). The ICE negotiation itself is mainly handled by pjproject, but for Jami the interesting part is located in ice_transport.cpp. Moreover, we add some important patches/features on top of pjproject that are not merged upstream for now (for example, ICE over TCP). These patches are present in contrib/src/pjproject.

Encrypt the control socket

Once the socket is created and managed by an IceTransport instance, it is then wrapped in a SipTransport corresponding to a TlsIceTransport. The main code is located in JamiAccount::handlePendingCall() and the wrapping is done in SipTransportBroker::getTlsIceTransport. Finally, our session is managed by TlsSession in daemon/src/security/tls_session.cpp and uses the GnuTLS library.

So, the control socket will use TLS (1.3 if both your and your peer's GnuTLS versions support it) if a TCP socket is negotiated. If a UDP socket is negotiated instead (due to firewall restrictions, problems in the negotiation, etc.), the socket will use DTLS (still managed by the same parts).
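
The order of preference described above (TCP first, UDP as a fallback, with TLS or DTLS chosen accordingly) can be summarized in a short Python sketch. The function name and its boolean inputs are hypothetical and only model the decision; the real logic lives in the daemon code named above.

```python
from typing import Optional, Tuple


def select_control_channel(tcp_ok: bool, udp_ok: bool) -> Optional[Tuple[str, str]]:
    """Return (transport, security layer) for the control socket, or None."""
    if tcp_ok:
        # The TCP ICE session is preferred and is wrapped in TLS.
        return ("TCP", "TLS")
    if udp_ok:
        # The UDP ICE session is the fallback and is wrapped in DTLS.
        return ("UDP", "DTLS")
    # Neither ICE session succeeded: no control socket can be set up.
    return None


print(select_control_channel(tcp_ok=False, udp_ok=True))
```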

The control socket is used to transmit SIP packets, like invites, custom messages (Jami sends the vCard of your profile on this socket at the start of the call, or the rotation of the camera), and text messages.

Media sockets

Media sockets are SRTP sockets where the key is negotiated through the TLS session previously created.

Warning

TODO: This section is incomplete.

Architecture

Warning

TODO: This section is incomplete.

Multi-stream

Since daemon version 13.3.0, multi-stream is fully supported. This feature allows users to share multiple videos during a call at the same time. In the following parts, we will describe all related changes.

PJSIP

The first part is to negotiate enough media streams. In fact, every media stream uses 2 UDP sockets. We consider three scenarios:

If it’s the host of a conference who wants to add media, there is nothing more to negotiate, because we already mix the videos into one stream. So, we add the new media directly to the video mixer without negotiations.
If we are in a 1:1 call, multi-stream is not supported for now, as there is no conference information.
Otherwise, two new sockets are negotiated for the new media.
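
The three scenarios above can be written as a small decision function. This is only a sketch of the rules as stated on this page; the function name, its parameters, and the returned labels are hypothetical.

```python
def plan_media_addition(is_conference_host: bool, is_one_to_one: bool) -> str:
    """Decide what happens when a participant wants to add a media stream."""
    if is_conference_host:
        # The host already mixes all videos into one stream: no negotiation.
        return "add-to-mixer"
    if is_one_to_one:
        # No conference information is available in a 1:1 call for now.
        return "unsupported"
    # Any other participant negotiates two new sockets for the new media.
    return "negotiate-two-sockets"


print(plan_media_addition(is_conference_host=False, is_one_to_one=False))
```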

To make PJSIP able to generate more sockets per ICE session, PJ_ICE_COMP_BITS was modified to \(5\) (which corresponds to \(2^5\), so \(32\) streams).

Deprecate switchInput, support requestMediaChange

In the daemon, the old switchInput API is now DEPRECATED; the same goes for switchSecondaryInput:

<method name="switchInput" tp:name-for-bindings="switchInput">
    <tp:docstring>
        Switch input for the specified call
    </tp:docstring>
    <arg type="s" name="accountId" direction="in"/>
    <arg type="s" name="callId" direction="in"/>
    <arg type="s" name="input" direction="in"/>
    <arg type="b" direction="out" />
</method>

<method name="switchSecondaryInput" tp:name-for-bindings="switchSecondaryInput">
    <tp:added version="11.0.0"/>
    <tp:docstring>
        Switch secondary input for the specified conference
    </tp:docstring>
    <arg type="s" name="accountId" direction="in" />
    <arg type="s" name="conferenceId" direction="in"/>
    <arg type="s" name="input" direction="in"/>
    <arg type="b" direction="out" />
</method>

requestMediaChange replaces both of them, for calls as well as conferences:

<method name="requestMediaChange" tp:name-for-bindings="requestMediaChange">
    <tp:added version="11.0.0"/>
    <tp:docstring>
        <p>Request changes in the media of the specified call.</p>
    </tp:docstring>
    <arg type="s" name="accountId" direction="in" />
    <arg type="s" name="callId" direction="in">
        <tp:docstring>
        The ID of the call.
        </tp:docstring>
    </arg>
    <annotation name="org.qtproject.QtDBus.QtTypeName.In2" value="VectorMapStringString"/>
    <arg type="aa{ss}" name="mediaList" direction="in">
        <tp:docstring>
        A list of media attributes to apply.
        </tp:docstring>
    </arg>
    <arg type="b" name="requestMediaChangeSucceeded" direction="out"/>
</method>
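
The mediaList argument has the D-Bus signature aa{ss}, that is, an array of string-to-string dictionaries with one dictionary per media. The Python sketch below shows what such a value could look like and checks its shape. The key names (MEDIA_TYPE, LABEL, SOURCE, MUTED, ENABLED), their values, and the camera source string are assumptions made for illustration; they are not defined on this page, so check the daemon sources before relying on them.

```python
from typing import Dict, List

MediaList = List[Dict[str, str]]


def make_media(media_type: str, label: str, source: str = "",
               muted: bool = False) -> Dict[str, str]:
    """Build one media entry. All key names here are assumed, not authoritative."""
    return {
        "MEDIA_TYPE": media_type,
        "LABEL": label,
        "SOURCE": source,
        "MUTED": "true" if muted else "false",
        "ENABLED": "true",
    }


def is_valid_media_list(media_list) -> bool:
    """Check the shape required by the D-Bus signature aa{ss}."""
    return isinstance(media_list, list) and all(
        isinstance(media, dict)
        and all(isinstance(k, str) and isinstance(v, str) for k, v in media.items())
        for media in media_list
    )


media_list: MediaList = [
    make_media("MEDIA_TYPE_AUDIO", "audio_0"),
    # "camera://hypothetical-device" is a made-up source, for illustration only.
    make_media("MEDIA_TYPE_VIDEO", "video_0", source="camera://hypothetical-device"),
]
print(is_valid_media_list(media_list))
```

Because every value must be a string, booleans such as the muted state are serialized as text rather than passed as native types.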

Compatibility

If a call is done with a peer where the daemon’s version is < 13.3.0, multi-stream is not enabled, and the old behavior is used (1 video only).
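
As a sketch of this compatibility rule, the following Python snippet compares a peer's daemon version against 13.3.0. How the daemon actually learns and compares the peer's version is not described here, so the function names and the dotted version string format are assumptions.

```python
MULTISTREAM_MIN_VERSION = (13, 3, 0)


def parse_version(text: str) -> tuple:
    """Turn '13.3.0' into (13, 3, 0), padding short versions with zeros."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def supports_multistream(peer_version: str) -> bool:
    """True when the peer's daemon is at least 13.3.0."""
    return parse_version(peer_version) >= MULTISTREAM_MIN_VERSION


print(supports_multistream("13.2.9"), supports_multistream("13.10.0"))
```

Comparing tuples of integers rather than raw strings avoids ordering "13.10.0" before "13.3.0".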

Stream identification

Because there can now be several streams, every media stream is identified by its own identifier, formed from the media type and an index joined by an underscore; for example: "audio_0", "video_2", etc.
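
A minimal Python sketch of this naming scheme, inferred from the two examples above (the helper names are hypothetical):

```python
def make_stream_id(media_type: str, index: int) -> str:
    """Build an identifier such as 'audio_0' or 'video_2'."""
    return f"{media_type}_{index}"


def parse_stream_id(stream_id: str) -> tuple:
    """Split 'video_2' into ('video', 2); reject anything else."""
    media_type, _, index = stream_id.rpartition("_")
    if not media_type or not index.isdigit():
        raise ValueError(f"unexpected stream identifier: {stream_id!r}")
    return media_type, int(index)


print(make_stream_id("audio", 0), parse_stream_id("video_2"))
```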

Rotation

The XML was updated to add the required stream:

<?xml version="1.0" encoding="utf-8" ?>
<media_control>
  <vc_primitive>
    <stream_id>{}</stream_id>
    <to_encoder>
      <device_orientation>0</device_orientation>
    </to_encoder>
  </vc_primitive>
</media_control>

Key frame

The XML was updated to add the required stream:

<?xml version="1.0" encoding="utf-8" ?>
<media_control>
  <vc_primitive>
    <stream_id>{}</stream_id>
    <to_encoder><picture_fast_update/></to_encoder>
  </vc_primitive>
</media_control>

Voice activity

The XML was updated to add the required stream:

<?xml version="1.0" encoding="utf-8" ?>
<media_control>
  <vc_primitive>
    <stream_id>{}</stream_id>
    <to_encoder>
      <voice_activity>true</voice_activity>
    </to_encoder>
  </vc_primitive>
</media_control>
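
The three messages above share the same envelope: the {} placeholder receives the stream identifier, and only the content of to_encoder changes. The Python sketch below fills such a template and reads it back with the standard library's XML parser. It only illustrates the message layout; the function names are hypothetical and this is not how the daemon builds or parses these bodies.

```python
import xml.etree.ElementTree as ET

# Same structure as the messages above, with the encoder command left open.
TEMPLATE = (
    '<?xml version="1.0" encoding="utf-8" ?>'
    "<media_control><vc_primitive>"
    "<stream_id>{}</stream_id>"
    "<to_encoder>{}</to_encoder>"
    "</vc_primitive></media_control>"
)


def build_media_control(stream_id: str, encoder_command: str) -> str:
    """Fill the template; stream_id is assumed to need no XML escaping."""
    return TEMPLATE.format(stream_id, encoder_command)


def read_media_control(body: str) -> tuple:
    """Return (stream id, command tag, command text) from a message body."""
    root = ET.fromstring(body.encode("utf-8"))
    primitive = root.find("vc_primitive")
    stream_id = primitive.findtext("stream_id")
    command = primitive.find("to_encoder")[0]  # first child of to_encoder
    return stream_id, command.tag, command.text


rotation = build_media_control("video_2", "<device_orientation>90</device_orientation>")
print(read_media_control(rotation))
```

A receiver can then use the stream identifier to decide which of the negotiated streams the rotation, key frame request, or voice activity update applies to.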

Conference

Reflected changes are documented here.

Client

Even though the back-end supports up to 32 media at the same time, we currently recommend that clients (custom clients excepted) only offer the ability to share one camera and one video at the same time. The camera is controlled via the camera button, and the other media via the "Share" button.

In client-qt, the interesting part is in AvAdapter (methods like isCapturing, shareAllScreens, stopSharing). In the library’s logic, addMedia and removeMedia in the callModel directly use the requestMediaChange, and can be used as a design reference.