Swarm
Synopsis
The goal of this document is to describe how group chats (a.k.a. swarm chat) will be implemented in Jami.
A swarm is a group able to discuss without any central authority in a resilient way. Indeed, if two people have no connectivity with the rest of the group (e.g. an Internet outage) but can still reach each other (on a LAN or in a subnetwork, for example), they can keep exchanging messages and will sync with the rest of the group as soon as it becomes possible.
So, the swarm is defined by:
Ability to split and merge following the connectivity.
Syncing of the history. Anyone must be able to send a message to the whole group.
No central authority. Can not rely on any server.
Non-repudiation. Devices must be able to verify old messages’ validity and to replay the whole history.
PFS on the transport. Storage is managed by the device.
The main idea is to get a synchronized Merkle tree with the participants.
We identified four modes for swarm chat that we want to implement:
ONE_TO_ONE: basically the case we have today when you talk with a friend
ADMIN_INVITES_ONLY: typically a class where the teacher can invite people, but the students cannot
INVITES_ONLY: a private group of friends
PUBLIC: basically an open forum
Scenarios
Create a Swarm
Bob wants to create a new swarm
Bob creates a local git repository.
Then, he creates an initial signed commit with the following:
His public key in /admins
His device certificate in /devices
His CRL in /crls
The hash of the first commit becomes the ID of the conversation
Bob announces to his other devices that he created a new conversation. This is done via an invite to join the swarm, sent through the DHT to the other devices linked to that account.
Adding someone
Alice adds Bob
Alice adds Bob to the repo:
Adds the invited URI in /invited
Adds the CRL into /crls
Alice sends a request on the DHT
Receiving an invite
Alice gets the invite to join the previously created swarm
She accepts the invite (if she declines, nothing happens: her URI just stays in /invited and she will never receive any message)
A peer-to-peer connection between Alice and Bob is established.
Alice pulls the git repo of Bob. WARNING: this means that messages need a connection; they no longer go through the DHT like today
Alice validates commits from Bob
To validate that Alice is a member, she removes the invite from the /invited directory, then adds her certificate into the /members directory
Once all commits are validated and on her device, Alice discovers the other members of the group. With these peers, she will construct the DRT (explained below) with Bob as a bootstrap.
Sending a message
Alice sends a message
Sending a message is pretty simple. Alice writes a commit-message in the following format:
{
"type": "text/plain",
"body": "coucou"
}
and adds her device and CRL to the repository if missing (others must be able to verify the commit). Merge conflicts are avoided because we mostly rely on commit messages, not files (the exceptions are CRLs and certificates, but each of them has its own location). Then she announces the new commit via the DRT with a service message (explained later) and pings the DHT for mobile devices (they must receive a push notification).
To ping other devices, the sender sends the other members a SIP message with mimetype = “application/im-gitmessage-id” containing a JSON with the “deviceId” that sends the message, the “id” of the related conversation, and the “commit”.
Receiving a message
Bob receives the message from Alice
Bob does a git pull on Alice
Commits MUST be verified via a hook
If all commits are valid, commits are stored and displayed. Then Bob announces the message via the DRT for other devices.
If any commit is invalid, the pull is canceled. Alice must restore her repository to a correct state.
Validating a commit
To avoid users pushing unwanted commits (with conflicts, false messages, etc.), each commit (from the oldest to the newest one) MUST be validated as follows before merging a remote branch:
Note: if the validation fails, the fetch is ignored and the branch is not merged (the fetched data is removed), and the user should be notified.
Note 2: if a fetch is too big, it is not merged.
For each commit, check that the device that tries to send the commit is authorized at this moment and that the certificates are present (in /devices for the device, and in /members or /admins for the issuer).
Then there are three cases:
The commit has 2 parents, so it is a merge and there is nothing more to validate
The commit has 0 parents, it’s the initial commit:
Check that admin cert is added
Check that device cert is added
Check CRLs added
Check that no other file is added
The commit has 1 parent, commit message is a JSON with a type:
If text (or other mime-type that doesn’t change files)
Check signature from certificate in the repo
Check that no unexpected file is added (other than the device certificate) or removed
If vote
Check that voteType is supported (ban, unban)
Check that vote is for the user that signs the commit
Check that vote is from an admin and device present & not banned
Check that no weird file is added nor removed
If member
If adds
Check that the commit is correctly signed
Check that certificate is added in /invited
Check that no weird file is added nor removed
If ONE_TO_ONE, check that we only have one admin, one member
If ADMIN_INVITES_ONLY, check that invite is from an admin
If joins
Check that the commit is correctly signed
Check that device is added
Check that invitation is moved to members
Check that no weird file is added nor removed
If banned
Check that vote is valid
Check that the user is banned by an admin
Check that member or device certificate is moved to banned/
Check that only files related to the vote are removed
Check that no weird file is added nor removed
Else, fail. Notify the user that they may be running an old version or that the peer tried to submit unwanted commits
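The first step of the checklist above (deciding which rule set applies to a commit) can be sketched as follows. This is only an illustration under stated assumptions: CommitSummary, CommitKind, classifyCommit and isFileLessMessage are hypothetical names, not daemon APIs, and the set of message types treated as "not changing files" is an assumption. Profile updates are left out because the list above does not cover them.

```cpp
#include <string>

// Which rule set from the checklist applies to a commit.
enum class CommitKind { Merge, Initial, Message, Vote, Member, Unsupported };

// Hypothetical, simplified view of a commit: its number of parents and the
// "type" field already extracted from the JSON commit message.
struct CommitSummary
{
    int parentCount;
    std::string type;
};

// Assumption: these are the message types that do not change any file.
inline bool isFileLessMessage(const std::string& type)
{
    return type == "text/plain" || type == "application/edited-message"
           || type == "application/call-history+json"
           || type == "application/data-transfer+json";
}

inline CommitKind classifyCommit(const CommitSummary& commit)
{
    if (commit.parentCount == 2)
        return CommitKind::Merge;   // a merge: nothing more to validate
    if (commit.parentCount == 0)
        return CommitKind::Initial; // initial commit: certificates and CRLs only
    if (commit.parentCount != 1)
        return CommitKind::Unsupported;
    if (commit.type == "vote")
        return CommitKind::Vote;
    if (commit.type == "member")
        return CommitKind::Member;
    if (isFileLessMessage(commit.type))
        return CommitKind::Message;
    return CommitKind::Unsupported; // the "else fail" branch
}
```

Each returned kind would then run its own checks (signature, allowed file changes, vote validity) before the branch is merged.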
Ban a device
Alice, Bob, Carla, Denys are in a swarm. Alice bans Denys
This is one of the most difficult scenarios in our context. Without central authority we can not trust:
Timestamps of generated commits
Conflicts with banned devices. Suppose several admin devices are present, Alice can speak with Bob but not with Denys and Carla, and Carla can speak with Denys. If Denys bans Alice and Alice bans Denys, what will the state be when the four members merge the conversations?
A device can be compromised, stolen or its certificate can expire. We should be able to ban a device and prevent it from lying about its expiration or sending messages in the past (by changing its certificate or the timestamp of its commit).
There are not many similar systems (with distributed groups), but here is an example:
Signal, without any central server for group chat (EDIT: they recently changed that point), doesn’t give the ability to ban someone from a group.
This voting system needs a human action to ban someone or must be based on the CRL info from the repository (because we cannot trust external CRLs).
Remove a device from a conversation
This is the only part that MUST have a consensus to avoid a split of the conversation: if two members kick each other from the conversation, what will the third one see?
This is needed to detect revoked devices, or simply to keep unwanted people out of a public room. The process is pretty similar for a member and for a device:
Alice removes Bob
Note: Alice MUST be an admin to vote
First, she votes for banning Bob. To do that, she creates the file /votes/ban/members/uri_bob/uri_alice (members can be replaced by devices for a device, by invited for an invite, or by admins for an admin) and commits
Then she checks if the vote is resolved. This means that >50% of the admins agree to ban Bob (if she is the only admin, her vote alone is more than 50%).
If the vote is resolved, the files in /votes/ban can be removed, all files for Bob in /members, /admins, /invited, /CRLs and /devices can be removed (or only those in /devices if a device is banned), and Bob’s certificate can be placed into /banned/members/bob_uri.crt (or /banned/devices/uri.crt if a device is banned) and committed to the repo
Then, Alice informs the other users (except Bob)
Alice (admin) re-adds Bob (banned member)
First, she votes for unbanning Bob. To do that, she creates the file /votes/unban/members/uri_bob/uri_alice (members can be replaced by devices for a device, by invited for an invite, or by admins for an admin) and commits
Then she checks if the vote is resolved. This means that >50% of the admins agree to unban Bob (if she is the only admin, her vote alone is more than 50%).
If the vote is resolved, the files in /votes/unban can be removed, all files for Bob in /members, /admins, /invited and /CRLs can be re-added (or only those in /devices if a device is unbanned) and committed to the repo
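The ">50% of the admins" rule used for both ban and unban votes can be sketched as below. This is a minimal illustration, assuming admins and voters are identified by their URI strings; isVoteResolved is a hypothetical helper name, and votes cast by non-admins are simply ignored.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Returns true when strictly more than half of the admins have voted.
// `admins` holds the URIs found in /admins, `voters` the URIs of the vote
// files found under /votes/<ban|unban>/<kind>/<target>/ (assumed layout).
inline bool isVoteResolved(const std::set<std::string>& admins,
                           const std::set<std::string>& voters)
{
    std::size_t validVotes = 0;
    for (const auto& voter : voters) {
        if (admins.count(voter) > 0)
            ++validVotes; // only votes from admins are counted
    }
    return !admins.empty() && 2 * validVotes > admins.size();
}
```

With a single admin, one vote resolves the ballot; with two admins, one vote is exactly 50% and is therefore not enough.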
Remove a conversation
Save removed=time::now() in convInfos (like removeContact does in contacts) to record that the conversation is removed, and sync with the user’s other devices
From now on, if a new commit is received for this conversation, it is ignored
From now on, if Jami starts up and the repo is still present, the conversation is not announced to clients
Two cases:
a. If there is no other member in the conversation, we can immediately remove the repository
b. If there are still other members, commit that we leave the conversation, then wait until at least one other device has synced this message. This prevents other members from still seeing the user as a valid member and sending them new message notifications.
When we are sure that someone is synced, save erased=time::now() and sync with the user’s other devices
All devices owned by the user can now erase the repository and related files
How to specify a mode
The mode cannot be changed over time; a different mode means a different conversation. So, this data is stored in the initial commit message. The commit message will be the following:
{
"type": "initial",
"mode": 0
}
For now, “mode” accepts values 0 (ONE_TO_ONE), 1 (ADMIN_INVITES_ONLY), 2 (INVITES_ONLY), 3 (PUBLIC)
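As an illustration of how the "mode" value could be checked when reading the initial commit, here is a minimal sketch. The enum mirrors the four modes listed in this document; parseConversationMode is a hypothetical helper, not a daemon API.

```cpp
// The four modes, numbered as in the "mode" field of the initial commit.
enum class ConversationMode : int { ONE_TO_ONE = 0, ADMIN_INVITES_ONLY, INVITES_ONLY, PUBLIC };

// Converts the integer read from the commit message into a mode.
// Returns false (and leaves `mode` untouched) for any unknown value.
inline bool parseConversationMode(int value, ConversationMode& mode)
{
    if (value < 0 || value > static_cast<int>(ConversationMode::PUBLIC))
        return false;
    mode = static_cast<ConversationMode>(value);
    return true;
}
```

Rejecting unknown values lets an older client refuse a conversation whose mode it does not understand instead of misinterpreting it.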
Process for 1:1 swarms
The goal here is to keep the old API (addContact/removeContact, sendTrustRequest/acceptTrustRequest/discardTrustRequest) to generate a swarm between an account and its contact. This still implies some changes that we cannot ignore:
The process is still the same: an account can add a contact via addContact, then send a TrustRequest via the DHT. But two changes are necessary:
The TrustRequest embeds a “conversationId” to inform the peer what conversation to clone when accepting the request
TrustRequests are retried when the contact comes back online. This is not the case today (as we don’t want to generate a new TrustRequest if the peer discarded the first one). So, if an account receives a trust request, it will be automatically ignored if the request for the related conversation was declined (as convRequests are synced)
Then, when a contact accepts the request, a period of sync is necessary, because the contact now needs to clone the conversation.
removeContact() will remove the contact and related 1:1 conversations (with the same process as “Remove a conversation”). The only note here is that if we ban a contact, we don’t wait for sync, we just remove all related files.
Tricky scenarios
There are some cases where two conversations can be created. Here are at least two of those scenarios:
Alice adds Bob
Bob accepts
Alice removes Bob
Alice adds Bob
or
Alice adds Bob & Bob adds Alice at the same time, while they are not connected to each other
In this case, two conversations are generated. We don’t want to remove messages from users or to pick one of the conversations here, so sometimes two 1:1 swarms between the same members will be shown. This will cause some bugs during the transition period (as we don’t want to break the API, the inferred conversation will be one of the two shown conversations), but for now it’s “ok-ish”; it will be fixed when clients fully handle conversationId in all APIs (calls, file transfer, etc.).
Note while syncing
After accepting a conversation request, the daemon needs some time to retrieve the remote repository. During this time, clients MUST show a syncing view to inform the user. Note, while syncing:
ConfigurationManager::getConversations() will return the conversation’s id even while syncing
ConfigurationManager::conversationInfos() will return {{“syncing”: “true”}} if syncing.
ConfigurationManager::getConversationMembers() will return a map of two URIs (the current account and the peer who sent the request)
Conversation requests specification
Conversation requests are represented by a Map<String, String> with the following keys:
id: the conversation id
from: uri of the sender
received: timestamp
title: (optional) name for the conversation
description: (optional)
avatar: (optional)
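A client receiving such a map may want to check it before showing the request. The sketch below assumes that the keys not marked optional (id, from, received) are mandatory and must be non-empty; that reading, and the helper name isValidConversationRequest, are assumptions for illustration only.

```cpp
#include <initializer_list>
#include <map>
#include <string>

// Checks that a conversation request carries the keys assumed mandatory.
// Optional keys (title, description, avatar) are not inspected.
inline bool isValidConversationRequest(const std::map<std::string, std::string>& request)
{
    for (const char* key : {"id", "from", "received"}) {
        const auto it = request.find(key);
        if (it == request.end() || it->second.empty())
            return false;
    }
    return true;
}
```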
Conversation’s profile synchronization
To be identifiable, a conversation generally needs some metadata, like a title (e.g. Jami), a description (e.g. some links, what the project is, etc.), and an image (the logo of the project). Those metadata are optional but shared across all members, so they need to be synced and included in the requests.
Storage in the repository
The profile of the conversation is stored in a classic vCard file at the root (/profile.vcf) like:
BEGIN:VCARD
VERSION:2.1
FN:TITLE
DESCRIPTION:DESC
END:VCARD
Synchronization
To update the vCard, a user with enough permissions (by default: =ADMIN) needs to edit /profile.vcf and commit the file with the mimetype application/update-profile. The new message is sent via the same mechanism and all peers will receive the MessageReceived signal from the daemon. The branch is dropped if the commit contains other files, is too big, or is made by a non-authorized member (by default: <ADMIN).
Last Displayed
In the synchronized data, each device sends the state of its conversations to the other devices. This state includes the last displayed message. However, because each device can have its own state for each conversation, and probably not the same last commit at a given time, there are several scenarios to take into account. Five are supported:
If the last displayed sent by other devices is the same as the current one, there is nothing to do.
If there is no last displayed for the current device, the remote displayed message is used.
If the remote last displayed is not present in the repo, it means that the commit will be fetched later, so the result is cached.
If the remote last displayed is already fetched, we check that the local last displayed comes earlier in the history before replacing it.
Finally, if a message is announced from the same author, it means that we need to update the last displayed.
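The first four scenarios can be sketched as a small decision function. This is a simplified illustration: it assumes a linear history given as a list of commit ids from oldest to newest (the real history is a git graph), and LastDisplayedAction and resolveLastDisplayed are hypothetical names. The fifth scenario is an event-driven update and is not shown.

```cpp
#include <algorithm>
#include <string>
#include <vector>

enum class LastDisplayedAction { Keep, UseRemote, CacheUntilFetched };

// `history` lists the locally known commit ids, oldest first (assumption).
// `local` is this device's last displayed id (empty if none), `remote` is
// the id announced by another device of the same account.
inline LastDisplayedAction resolveLastDisplayed(const std::vector<std::string>& history,
                                                const std::string& local,
                                                const std::string& remote)
{
    if (remote == local)
        return LastDisplayedAction::Keep;              // scenario 1
    if (local.empty())
        return LastDisplayedAction::UseRemote;         // scenario 2
    const auto remoteIt = std::find(history.begin(), history.end(), remote);
    if (remoteIt == history.end())
        return LastDisplayedAction::CacheUntilFetched; // scenario 3
    const auto localIt = std::find(history.begin(), history.end(), local);
    // Scenario 4: replace only if the local marker is older than the remote one.
    return localIt < remoteIt ? LastDisplayedAction::UseRemote : LastDisplayedAction::Keep;
}
```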
Preferences
Every conversation has attached preferences set by the user. Those preferences are synced across user’s devices. This can be the color of the conversation, if the user wants to ignore notifications, file transfer size limit, etc. For now, the recognized keys are:
“color” - the color of the conversation (#RRGGBB format)
“ignoreNotifications” - to ignore notifications for new messages in this conversation
“symbol” - to define a default emoji.
Those preferences are stored in a packet MapStringString, in accountDir/conversation_data/conversationId/preferences, and are only sent across devices of the same user via SyncMsg.
The API to interact with the preferences is:
// Update preferences
void setConversationPreferences(const std::string& accountId,
                                const std::string& conversationId,
                                const std::map<std::string, std::string>& prefs);
// Retrieve preferences
std::map<std::string, std::string> getConversationPreferences(const std::string& accountId,
                                                              const std::string& conversationId);
// Emitted when preferences are updated (via setConversationPreferences or by syncing with other devices)
struct ConversationPreferencesUpdated
{
    constexpr static const char* name = "ConversationPreferencesUpdated";
    using cb_type = void(const std::string& /*accountId*/,
                         const std::string& /*conversationId*/,
                         std::map<std::string, std::string> /*preferences*/);
};
Merge conflicts management
Because two admins can change the description at the same time, a merge conflict can occur on profile.vcf. In this case, the commit with the higher hash (e.g. ffffff > 000000) will be chosen.
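The "higher hash wins" rule can be sketched as a plain string comparison. This assumes both commit ids are lowercase hexadecimal strings of the same length, so that lexicographic order matches numeric order; pickWinningCommit is a hypothetical helper name.

```cpp
#include <string>

// Returns the commit id that wins a profile.vcf conflict: the higher hash.
// Assumes same-length, lowercase hex ids (lexicographic == numeric order).
inline std::string pickWinningCommit(const std::string& first, const std::string& second)
{
    return first < second ? second : first;
}
```

Because every device applies the same deterministic rule, all members converge on the same profile without any extra negotiation.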
APIs
There are two methods to get and set a conversation’s metadata:
<method name="updateConversationInfos" tp:name-for-bindings="updateConversationInfos">
    <tp:added version="10.0.0"/>
    <tp:docstring>
        Update conversation's infos (supported keys: title, description, avatar)
    </tp:docstring>
    <arg type="s" name="accountId" direction="in"/>
    <arg type="s" name="conversationId" direction="in"/>
    <annotation name="org.qtproject.QtDBus.QtTypeName.In2" value="VectorMapStringString"/>
    <arg type="a{ss}" name="infos" direction="in"/>
</method>
<method name="conversationInfos" tp:name-for-bindings="conversationInfos">
    <tp:added version="10.0.0"/>
    <tp:docstring>
        Get conversation's infos (mode, title, description, avatar)
    </tp:docstring>
    <annotation name="org.qtproject.QtDBus.QtTypeName.Out0" value="VectorMapStringString"/>
    <arg type="a{ss}" name="infos" direction="out"/>
    <arg type="s" name="accountId" direction="in"/>
    <arg type="s" name="conversationId" direction="in"/>
</method>
where infos is a map<str, str> with the following keys:
mode: READ-ONLY
title
description
avatar
Re-import an account (link/export)
The archive MUST contain conversationId to be able to retrieve conversations on new commits after a re-import (because there is no invite at this point). If a commit comes for a conversation not present there are two possibilities:
The conversationId is there, in this case, the daemon is able to re-clone this conversation
The conversationId is missing, so the daemon asks (via a message {{"application/invite", conversationId}}) for a new invite that the user needs to (re)accept
Note: a conversation can only be retrieved if a contact or another device still has it; otherwise it is lost. There is no magic.
Used protocols
Git
Why this choice
Each conversation will be a git repository. This choice is motivated by:
We need to sync and order messages. The Merkle Tree is the perfect structure to do that and can be linearized by merging branches. Moreover, because it’s massively used by Git, it’s easy to sync between devices.
Distributed by nature. Massively used. Lots of backends and pluggable.
Can verify commits via hooks and massively used crypto
Can be stored in a database if necessary
Conflicts are avoided by using commit messages, not files.
What we have to validate
Performance? git.lock can be slow
Hooks in libgit2
Multiple pulls at the same time?
Limits
History can not be deleted. To delete a conversation, the device has to leave the conversation and create another one.
However, non-permanent messages (like messages readable only for some minutes) can be sent via a special message via the DRT (like Typing or Read notifications).
Structure
/
 - invited
 - admins (public keys)
 - members (public keys)
 - devices (certificates of authors to verify commits)
 - banned
   - devices
   - invited
   - admins
   - members
 - votes
   - ban
     - members
       - uri
         - uriAdmin
     - devices
       - uri
         - uriAdmin
   - unban
     - members
       - uri
         - uriAdmin
 - CRLs
File transfer
Swarm massively changes file transfer. Now, the whole history is synced, allowing all devices in the conversation to easily retrieve old files. This change allows us to move from a logic where the sender pushed the file to other devices by trying to connect to them (this was bad because it was not really resistant to connection changes/failures and needed a manual retry) to a logic where the sender allows other devices to download. Moreover, any device that has the file can host it for other devices, so files can be retrieved even if the sender is not there.
Protocol
The sender adds a new commit in the conversation with the following format:
value["tid"] = "RANDOMID";
value["displayName"] = "DISPLAYNAME";
value["totalSize"] = "SIZE OF THE FILE";
value["sha3sum"] = "SHA3SUM OF THE FILE";
value["type"] = "application/data-transfer+json";
and creates a link in ${data_path}/conversation_data/${conversation_id}/${file_id} where file_id=${commitid}_${value["tid"]}.${extension}
Then, the receiver can download the file by contacting the devices hosting it, opening a channel with name="data-transfer://" + conversationId + "/" + currentDeviceId() + "/" + fileId, and stores the info that the file is waiting in ${data_path}/conversation_data/${conversation_id}/waiting
The device receiving the connection will accept the channel after verifying that the file can be sent (the sha3sum is correct and the file exists). The receiver will keep the first opened channel, close the others and write all incoming data into a file (with the same path as the sender: ${data_path}/conversation_data/${conversation_id}/${file_id}).
When the transfer is finished or the channel is closed, the sha3sum is verified to validate that the file is correct (otherwise it is deleted). If valid, the file is removed from the waiting list.
In case of failure, when a device of the conversation comes back online, we will ask for all waiting files in the same way.
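The naming rules of this protocol (the file id and the channel name) can be sketched as two string helpers. makeFileId and makeTransferChannelName are hypothetical names used only for illustration, and the sketch assumes the file has a non-empty extension.

```cpp
#include <string>

// file_id = ${commitid}_${tid}.${extension}
// Assumption: `extension` is non-empty and given without the leading dot.
inline std::string makeFileId(const std::string& commitId,
                              const std::string& tid,
                              const std::string& extension)
{
    return commitId + "_" + tid + "." + extension;
}

// Channel name opened by the receiver towards a device hosting the file.
// `deviceId` is the id of the device opening the channel.
inline std::string makeTransferChannelName(const std::string& conversationId,
                                           const std::string& deviceId,
                                           const std::string& fileId)
{
    return "data-transfer://" + conversationId + "/" + deviceId + "/" + fileId;
}
```

Since the file id embeds the commit id, two transfers of files with the same display name in the same conversation still map to distinct paths and channels.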
Call in swarm
Idea
A swarm conversation can have multiple rendez-vous. A rendez-vous is defined by the following uri:
“accountUri/deviceId/conversationId/confId” where accountUri/deviceId describes the host.
The host can be determined via two ways:
In the swarm metadata, where it is stored like the title/desc/avatar of the room
Or the initial caller.
When starting a call, the host will add a new commit to the swarm, with the URI to join (accountUri/deviceId/conversationId/confId). This will be valid until the end of the call (announced by a commit with the duration to show)
So every participant will receive the information that a call has started and will be able to join it by calling it.
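Splitting a rendez-vous URI into its four components can be sketched as below. This is only an illustration: RendezVous and parseRendezVous are hypothetical names, and the sketch assumes that none of the components contains a "/" character.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct RendezVous
{
    std::string accountUri;     // account of the host
    std::string deviceId;       // device of the host
    std::string conversationId; // swarm the call belongs to
    std::string confId;         // hosted conference
};

// Parses "accountUri/deviceId/conversationId/confId".
// Returns false if there are not exactly four non-empty components.
inline bool parseRendezVous(const std::string& uri, RendezVous& out)
{
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = uri.find('/', start);
        if (pos == std::string::npos) {
            parts.push_back(uri.substr(start));
            break;
        }
        parts.push_back(uri.substr(start, pos - start));
        start = pos + 1;
    }
    if (parts.size() != 4)
        return false;
    for (const auto& part : parts) {
        if (part.empty())
            return false;
    }
    out = RendezVous{parts[0], parts[1], parts[2], parts[3]};
    return true;
}
```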
Attacks?
Avoid git bombs
Notes
The timestamp of a commit cannot be trusted because it is editable. Only the user’s timestamp can be trusted.
TLS
Git operations, control messages, files, and other things will use a p2p TLS v1.3 link with only ciphers that guarantee PFS. So each key is renegotiated for each new connection.
DHT (udp)
Used to send messages for mobiles (to trigger push notifications) and to initiate TCP connections.
Network activity
Process to invite someone
Alice wants to invite Bob:
Alice adds Bob to a conversation
Alice generates an invite: { “application/invite+json” : { “conversationId”: “$id”, “members”: [{…}] }}
Two possibilities for sending the message:
a. If not connected, via the DHT
b. Else, Alice sends it on the SIP channel
Two possibilities for Bob:
a. He receives the invite, and a signal is emitted for the client
b. He is not connected, so he will never receive the request, because Alice must not know whether Bob just ignored or blocked her. The only way is to regenerate a new invite via a new message (cf. next scenario)
Process to send a message to someone
Alice wants to send a message to Bob:
Alice adds a message in the repo, giving an ID
Alice gets a message received (from herself) if successful
Two possibilities: Alice and Bob are connected, or not. In both cases a message is crafted: { “application/im-gitmessage-id” : “{“id”:”$convId”, “commit”:”$commitId”, “deviceId”: “$alice_device_hash”}”}.
a. If not connected, it is sent via the DHT
b. Else, Alice sends it on the SIP channel
Four possibilities for Bob:
a. Bob is not connected to Alice, so if he trusts Alice, he asks for a new connection and goes to b.
b. If connected, he fetches from Alice and announces the new messages
c. Bob doesn’t know that conversation. He asks through the DHT to get an invite first to be able to accept that conversation ({“application/invite”, conversationId})
d. Bob is disconnected (no network, or just closed). He will not receive the new message but will try to sync when the next connection occurs
Implementation
Supported messages
Initial message
{
"type": "initial",
"mode": 0,
"invited": "uri"
}
Represents the first commit of a repository and contains the mode:
enum class ConversationMode : int { ONE_TO_ONE = 0, ADMIN_INVITES_ONLY, INVITES_ONLY, PUBLIC }
and invited if mode = 0.
Text message
{
"type": "text/plain",
"body": "content",
"react-to": "id (optional)"
}
Or for an edit:
{
"type": "application/edited-message",
"body": "content",
"edit": "id of the edited commit"
}
Calls
Show the end of a call (duration in milliseconds):
{
"type": "application/call-history+json",
"to": "uri",
"duration": "3000"
}
Or for hosting a call in a group (when it starts)
{
"type": "application/call-history+json",
"uri": "uri of the host",
"device": "device of the host",
"confId": "hosted confId"
}
A second commit with the same JSON + duration is added at the end of the call when hosted.
Add a file
{
"type": "application/data-transfer+json",
"tid": "unique identifier of the file",
"displayName": "File name",
"totalSize": "3000",
"sha3sum": "a sha3 sum"
}
totalSize is in bits.
Updating profile
{
"type": "application/update-profile"
}
Member event
{
"type": "member",
"uri": "uri of the member",
"action": "add/join/remove/ban"
}
Generated when a member is invited, joins, leaves, or is kicked from a conversation.
Vote event
Generated by administrators to add a vote for kicking or un-kicking someone.
{
"type": "vote",
"uri": "uri of the member",
"action": "ban/unban"
}
!! OLD DRAFT !!
Note: The following notes are not organized yet. They are just some lines of thought.
Crypto improvements.
For a serious group chat feature, we also need serious crypto. With the current design, if a certificate is stolen along with the previous DHT values of a conversation, the conversation can be decrypted. Maybe we need to move to something like Double Ratchet.
Note: a lib might exist to implement group conversations.
Needs ECC support in OpenDHT
Usage
Add Roles?
There are two major use cases for group chats:
Something like a Mattermost in a company, with private channels and some roles (admin/spectator/bot/etc.), or for education (where only a few members are active).
Horizontal conversations like a conversation between friends.
Which one will Jami target?
Implementation idea
A certificate for a group that signs users with a flag for a role. Adding or revoking can also be done.
Join a conversation
Only via a direct invite
Via a link/QR Code/whatever
Via a room name? (a hash on the DHT)
What we need
Confidentiality: members outside of the group chat should not be able to read messages in the group
Forward secrecy: if any key from the group is compromised, previous messages should remain confidential (as much as possible)
Message ordering: There is a need to have messages in the right order
Synchronization: There is also a need to be sure to have all messages as soon as possible.
Persistence: Currently, a message on the DHT lives only 10 minutes, because this is the best timing calculated for this kind of DHT. To persist data, the node must re-put the value on the DHT every 10 minutes. Another option when the node is offline is to let other nodes re-put the data. But if, after 10 minutes, 8 nodes are still here, they will do 64 requests (and it’s exponential). The current way to avoid spamming for that is queried. This will still do 64 requests but limit the max redundancy to 8 nodes.
Other distributed ways
IPFS: Need some investigation
BitMessage: Need some investigation
Maidsafe: Need some investigation
Based on current work we have
Group chat can be based on the same work we already have for multi-devices (but here, with a group certificate). Problems to solve:
History sync. This needs to move the database from the client into the daemon.
If nobody is connected, the synchronization can not be done, and the person will never see the conversation
Another dedicated DHT
Like a DHT with a superuser. (Not convinced)
File transfer
Currently, the file transfer algorithm is based on a TURN connection (see File transfer). In the case of a big group, this will be bad. We first need a p2p implementation for the file transfer. Implement the RFC for p2p transfer.
Other problem: currently there is no implementation for TCP support for ICE in PJSIP. This is mandatory for this point (in pjsip or homemade)
Resources
https://eprint.iacr.org/2017/666.pdf
Robust distributed synchronization of networked linear systems with intermittent information (Sean Phillips and Ricardo G.Sanfelice)