
    How's Movim made? Part I - The Architecture

    Timothée Jaussoin – 6 days ago - 06:28

I have been working on Movim for many years now, but I never really had the opportunity to explain how it works. Let's take some time to write that down :)

This will be split into several articles. I'll start with the big picture and slowly go deeper into the project to explain how things work at a lower level.


I have been using social platforms for a long time, and I am still surprised to see that, even with the billions of dollars invested by the Internet giants, they all still work pretty much the same way, architecturally speaking.

Even with all the exciting new technologies we have had in the past few years, things still look the same to me: you publish content using Ajax calls (or through a WebSocket), it is saved in the database, and your contacts ping the server once in a while to see if there is something new to grab.

This is quite convenient to build, and if you want to do it yourself, most of the frameworks out there offer all the tools to do so easily. You install something like Laravel, Symfony or Zend (sorry people, I'm more of a PHP guy), put a SQL database in the back, add a couple of REST endpoints and a lovely frontend, and boom, you have your social platform.

But why should we have to wait to get the publications? Why should we have to pull things all the time?

For decades now we have had chat technologies that allow us to send content across the globe, in real time, without any trouble. Why can't we do the same thing for social stuff?

And here is the core idea of what Movim is made of.

Let's build a (real) real-time social network.

So, one of the mistakes that I tried to avoid when the project started was to reinvent the wheel. You will see that Movim is mostly made of basic and already proven technologies put together.

To build a real-time social network I needed to transfer content instantaneously across a network. The content needs to be transported over connected (stateful) systems, so it has to be socket based (as opposed to request based).

This already pretty much excludes all the shiny new social technologies standardized by the W3C (bye bye ActivityPub and WebSub), since those are built on HTTP. I'm not blaming those standards; they are perfectly valid and seriously defined solutions, but they are not what I needed for Movim.
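To make the request-based versus connection-based distinction concrete, here is a minimal in-memory sketch in Python. The `PollingFeed` and `PushFeed` classes are hypothetical names invented for this illustration; they are not part of Movim or of any library, and no real network is involved.

```python
class PollingFeed:
    """Request-based model: clients must ask whether anything is new."""

    def __init__(self):
        self.posts = []
        self.requests_served = 0

    def publish(self, post):
        self.posts.append(post)

    def fetch_since(self, index):
        # Every call costs a round trip, even when nothing has changed.
        self.requests_served += 1
        return self.posts[index:]


class PushFeed:
    """Connection-based model: content is delivered as soon as it arrives."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, post):
        for deliver in self.subscribers:
            deliver(post)


def demo():
    polling = PollingFeed()
    seen = []
    for tick in range(5):             # the client polls five times...
        if tick == 3:
            polling.publish("hello")  # ...but content only shows up once
        seen.extend(polling.fetch_since(len(seen)))

    push = PushFeed()
    received = []
    push.subscribe(received.append)
    push.publish("hello")             # delivered immediately, no polling
    return polling.requests_served, seen, received


if __name__ == "__main__":
    print(demo())
```

In the polling case most requests come back empty and the post is only seen at the next poll; in the push case the subscriber gets it at publication time.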

I needed a protocol that was:

  • Real-time based
  • Standard (meaning with RFCs and other documents that I can build on)
  • Preferably already battle tested
  • Widely deployed

This brings us to XMPP (I will not explain in detail what the protocol is made of; you can check out the Wikipedia page and the main website to get more info).

The XMPP Logo

Basically XMPP brings several advantages here:

  • It is real-time (yay!): all the communications are XML packets (called stanzas here) sent through TCP pipes with some TLS around them (yup, I already see you coming, JSON people, but let's not dive into that kind of argument).
  • XMPP offers a really basic and generic framework with many (many!) extensions that you can pick from to build the solution you want on top of it. Those are the ones that I picked to build Movim, for example. And because it is XML based, you can pretty much extend what you have by embedding existing namespaces. You take Atom, you add Pubsub, and boom, you have a full real-time publication system for articles and attachments, already fully defined and specified.
  • You can do way more than simple "social" stuff with it. There is no need to juggle 10 other protocols: XMPP already offers everything needed for chat and chatrooms, video-conferencing, publish-subscribe and many other things. This keeps the code quite concise, because you only speak one protocol in the backend.
  • XMPP is federated, using a network a bit similar to the email one. The accounts are created on the servers (which is why the identifiers are also similar: username@server.tld), and clients then connect to those XMPP servers. On top of that, you can have several clients connected to your account at the same time, and they will all be synchronized in real time :)
  • There is already a big community, with serious servers out there that can handle millions of simultaneous connections without any issues (ejabberd <3).
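As an illustration of the "take Atom, add Pubsub" idea, here is a rough Python sketch that builds a publish stanza embedding an Atom entry. The two namespaces are the standard ones (XEP-0060 for Pubsub, RFC 4287 for Atom); the JIDs, node name, stanza id and helper function are made up for the example, and this is not how Movim itself builds its stanzas.

```python
import xml.etree.ElementTree as ET

PUBSUB_NS = "http://jabber.org/protocol/pubsub"  # XEP-0060
ATOM_NS = "http://www.w3.org/2005/Atom"          # RFC 4287


def build_publish_stanza(sender, service, node, title, body):
    """Build an <iq/> stanza publishing one Atom entry to a Pubsub node."""
    iq = ET.Element(
        "iq", {"type": "set", "from": sender, "to": service, "id": "publish1"}
    )
    pubsub = ET.SubElement(iq, f"{{{PUBSUB_NS}}}pubsub")
    publish = ET.SubElement(pubsub, f"{{{PUBSUB_NS}}}publish", {"node": node})
    item = ET.SubElement(publish, f"{{{PUBSUB_NS}}}item")
    # The Atom namespace is simply embedded inside the Pubsub item.
    entry = ET.SubElement(item, f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}content").text = body
    return iq


if __name__ == "__main__":
    # Hypothetical addresses and node name, for illustration only.
    stanza = build_publish_stanza(
        "username@server.tld", "pubsub.server.tld", "blog",
        "Hello", "My first real-time post",
    )
    print(ET.tostring(stanza, encoding="unicode"))
```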

This is also a big advantage for Movim. I don't have to take care of all the network issues. It is "just" a simple and dumb client that connects to XMPP servers and gets/sends content to them.

If you compare this solution with other federated networks such as Mastodon or Diaspora, there is also a big difference: for Movim, the accounts actually sit on a distinct server. So I don't need to create another API to communicate with Movim. Everyone can exchange with Movim by just implementing XMPP (and there are already many libraries and solutions in the wild to do so).

Ok, we chose the protocol, now we need to build the backend.

A bit of history

It is interesting, at this point, to explain a bit of the history of the project. Movim was created in 2008 as a little experiment to learn programming and, at the same time, to try to develop a social platform from scratch. I took PHP as a base to build the backend, and it was not intended to be more than a simple website with a few social features built in.

During my studies, and with a lot of self-learning, I progressively improved the project with the precious help of a friend who left the project a couple of years later (Etenil, if you read me).

In 2014 I finally decided to move to a fully real-time solution. The existing one was still based on some real-time "emulation" on top of HTTP to connect to the XMPP network (using the BOSH extension). This rewrite also came with a huge refactoring of the internal architecture and a redesign of the user interface. But… I kept PHP because I already had a lot of knowledge of the language and already knew the limits of developing with it. I also had a large part of the codebase (mostly the whole XMPP part) that could be directly ported to the new architecture with only a few adjustments.

At that time a new project was also emerging in the PHP community: ReactPHP. This framework was specifically designed to handle real-time architectures in PHP.

I decided to give it a try.

Launch a daemon, connect some pipes, et voilà!

ReactPHP comes with a lot of side projects that make it possible to create all sorts of real-time architectures.

ReactPHP is handling most of the core features of Movim

At this time in the project I'm using:

After a few months of experiments I came up with an architecture that hasn't really changed since then.

Let's have a closer look at how this works.

The whole Movim structure is handled by one central daemon (called daemon.php, amaze!). This daemon handles all the WebSockets of the users' browsers (mobile and desktop) and launches a sub-process for each connected user. After that, it only acts as a dumb router that forwards messages between the users' WebSockets and their respective processes.

This architecture brings a few advantages:

  • The user sessions are isolated and do not affect each other's performance
  • They can be controlled more easily (killing one session will not bring down all the others)
  • The main daemon is minimalistic (basically acting as a dumb router)

And has one disadvantage:

  • The memory consumption is higher. The code is loaded several times, once in each sub-process, so count on roughly 10 to 20 MB per connected user. This problem was greatly reduced in the past versions by cutting the memory consumption of Movim itself, removing some dependencies and moving to more recent versions of PHP (PHP 7.0), but it is still a challenge for the upcoming versions. It also brings a scalability issue: the backend can easily handle thousands of connections at a time, but you'll run out of memory before reaching that point.
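A quick back-of-the-envelope calculation shows why memory becomes the limit first. The 10 to 20 MB per user figures come from the text above; the user counts and RAM sizes are arbitrary examples, and the helper names are invented for this sketch.

```python
MB_PER_USER_LOW = 10   # optimistic figure quoted above
MB_PER_USER_HIGH = 20  # pessimistic figure quoted above


def memory_range_gb(connected_users):
    """Estimated (low, high) memory use in GB for a number of sessions."""
    low = connected_users * MB_PER_USER_LOW / 1024
    high = connected_users * MB_PER_USER_HIGH / 1024
    return low, high


def max_users(available_gb):
    """Worst-case number of sessions that fit in the given amount of RAM."""
    return int(available_gb * 1024 // MB_PER_USER_HIGH)


if __name__ == "__main__":
    print(memory_range_gb(1024))  # memory needed for 1024 connected users
    print(max_users(4))           # sessions fitting in 4 GB, worst case
```

With these figures, about a thousand simultaneous sessions already need between 10 and 20 GB, while a 4 GB machine tops out at around two hundred sessions in the worst case.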

Each of those sub-processes resolves and connects to the user's XMPP server and then handles all the communication with it. They also connect to a common SQL database that acts as a caching layer for each account and is also used to share data between them (to allow the discovery of public resources, for example).
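The daemon and sub-process layout described above can be sketched in a few lines of Python. This is only a toy model under several assumptions: the real daemon is written in PHP on top of ReactPHP and talks to browsers over WebSockets, whereas here the `Router` class and the echoing worker are hypothetical stand-ins that talk over stdin/stdout.

```python
import subprocess
import sys

# Hypothetical per-user worker: it echoes each line prefixed with its PID,
# standing in for a process that would hold the user's XMPP connection.
WORKER_CODE = (
    "import os, sys\n"
    "for line in sys.stdin:\n"
    "    print(f'{os.getpid()}:{line.strip()}', flush=True)\n"
)


class Router:
    """Dumb router: one sub-process per user, messages forwarded as-is."""

    def __init__(self):
        self.workers = {}

    def _worker_for(self, user):
        # Spawn the user's process lazily, on the first message.
        if user not in self.workers:
            self.workers[user] = subprocess.Popen(
                [sys.executable, "-c", WORKER_CODE],
                stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
            )
        return self.workers[user]

    def forward(self, user, message):
        worker = self._worker_for(user)
        worker.stdin.write(message + "\n")
        worker.stdin.flush()
        return worker.stdout.readline().strip()

    def kill(self, user):
        # Killing one session leaves the other processes untouched.
        worker = self.workers.pop(user)
        worker.kill()
        worker.wait()
        worker.stdin.close()
        worker.stdout.close()

    def shutdown(self):
        for user in list(self.workers):
            self.kill(user)


if __name__ == "__main__":
    router = Router()
    print(router.forward("alice", "ping"))
    print(router.forward("bob", "ping"))
    router.kill("alice")
    print(router.forward("bob", "still here"))
    router.shutdown()
```

Each reply carries a different PID, which shows the isolation: every session lives in its own process and can be stopped without touching the others.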

The Movim simplified architecture

Finally, those processes also handle all the frontend-related things, but this will be detailed in an upcoming article.

Optimizations

Some optimizations were then made to improve the overall performance of this architecture. Here are the 3 main ones, which I think had the biggest impact on the project.

XML stream parser

An XMPP connection is basically a bidirectional stream of XML. The client sends XML requests (called stanzas) and parses the incoming ones. One interesting thing about the incoming stanzas is that they are all part of the same XML "document". Initially Movim detected each of those stanzas and parsed them separately.

The parser was then rewritten to work as a stream (see PHP XML Parser). This allows Movim to start preparing the incoming stanzas and fire the related events as soon as the first XML elements are received on the socket.

This small change really improved the overall XMPP performance of the project, especially during the connection phase: Movim can now handle several thousand stanzas in a couple of seconds.
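The idea of parsing one long-lived XML document incrementally can be sketched with Python's standard `XMLPullParser`, used here as a stand-in for the PHP XML Parser mentioned above. The `StanzaStream` class is a hypothetical helper written for this illustration: it treats every element that closes directly under the root as a complete stanza, even when the data arrives split across arbitrary chunks.

```python
from xml.etree.ElementTree import XMLPullParser


class StanzaStream:
    """Incrementally parse one XML stream and emit its top-level children."""

    def __init__(self):
        self._parser = XMLPullParser(events=("start", "end"))
        self._depth = 0

    def feed(self, chunk):
        """Feed raw data from the socket, return the stanzas completed so far."""
        self._parser.feed(chunk)
        stanzas = []
        for event, element in self._parser.read_events():
            if event == "start":
                self._depth += 1
            else:
                self._depth -= 1
                # Depth 1 means we are back directly under the stream root:
                # the element that just closed is a complete stanza.
                if self._depth == 1:
                    stanzas.append(element)
        return stanzas


if __name__ == "__main__":
    stream = StanzaStream()
    # The chunk boundary falls in the middle of a closing tag on purpose.
    for chunk in ("<stream><message><body>hi</bo", "dy></message><presence/>"):
        for stanza in stream.feed(chunk):
            print("stanza received:", stanza.tag)
```

Nothing has to be buffered and re-scanned to find stanza boundaries: the parser keeps its state between chunks, so handlers can fire as soon as a stanza closes.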

Hello ZeroMQ!

The communication between the main daemon and the sub-processes (called "linkers" in Movim) was initially done simply using stdin/stdout. This brought some buffering and performance issues, so I chose to use something designed especially for that: ZeroMQ.

Basically, each time the main daemon launches a linker for a user it also creates two dedicated IPC streams (one for incoming, one for outgoing messages) and then handles everything that goes through them.

This allowed me to remove some buffers that were used to pass those messages along, and it boosted the overall performance, especially for the UI part.
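The "two dedicated streams per linker" layout can be pictured with the following Python sketch. Note the swap: it uses one-way pipes from the standard `multiprocessing` module instead of ZeroMQ sockets, and both ends live in the same process to keep the example short. The `LinkerChannels` class and the message contents are assumptions made for the illustration.

```python
from multiprocessing import Pipe


class LinkerChannels:
    """Two one-way channels between the daemon and a single linker."""

    def __init__(self):
        # Pipe(duplex=False) returns (receiving end, sending end).
        self.linker_in, self.daemon_out = Pipe(duplex=False)  # daemon -> linker
        self.daemon_in, self.linker_out = Pipe(duplex=False)  # linker -> daemon


def demo():
    channels = LinkerChannels()

    # The daemon forwards something coming from the browser...
    channels.daemon_out.send("browser event")
    received_by_linker = channels.linker_in.recv()

    # ...and the linker answers on its own, separate channel.
    channels.linker_out.send("ui update")
    received_by_daemon = channels.daemon_in.recv()

    return received_by_linker, received_by_daemon


if __name__ == "__main__":
    print(demo())
```

Because each direction has its own channel and messages are delivered whole, neither side has to split or reassemble a shared byte stream the way it would with raw stdin/stdout.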

ZeroMQ is also really lightweight and is already available and packaged in many GNU/Linux distributions.

From Modl to Eloquent

Movim initially relied on a specifically designed database library (called Modl). The upcoming version (0.14) will ship with the well-known Eloquent library (used in the Laravel framework).

This change also allowed me to use some fancy features such as eager loading and lazy loading, and to do proper database migrations to speed up some requests.

More info in the dedicated post.

What did I learn?

Through all those years working on this real-time project, I can now draw a few conclusions regarding the choices and changes that I made:

  • PHP is not a problem most of the time: PHP is fast, really fast. Most of the optimizations that I made were related to the way I was handling the streams and their contents, and to the database requests. PHP 7 and the following versions did slightly improve performance, but that was minor compared to the other changes made in the codebase.
  • Check what is blocking: when you are working in real time, even if things are handled by promises and other asynchronous systems, you will have blocking code. Ensure that this code is not "too slow". In Movim, for example, the database requests are still blocking (this is another optimization that can be done…), so if a request takes 200ms, it can push back the execution of other code by 200ms.
  • Do some proper "real conditions" testing: I thought Movim was fast until I saw some users struggling with it (Nik, if you read me). Some Movim users had way more chatrooms and subscribed feeds than I expected, which created some big slowdowns, especially during the connection phase. With some proper (and sometimes really simple) optimizations, things were brought back to normal.
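The "check what is blocking" point can be reproduced with a few lines of Python's `asyncio`, used here as a stand-in for ReactPHP's event loop. The sketch schedules a timer for 50ms and measures when it really fires, once while a synchronous 200ms call (playing the role of a slow blocking database request) holds the loop, and once while the wait is done asynchronously. The function name and the durations are arbitrary choices for the illustration.

```python
import asyncio
import time


async def measure_timer_delay(blocking):
    """Return how long a timer scheduled for 50 ms actually took to fire."""
    loop = asyncio.get_running_loop()
    fired_at = []
    start = loop.time()
    loop.call_later(0.05, lambda: fired_at.append(loop.time()))

    if blocking:
        time.sleep(0.2)           # synchronous call: the whole loop is stuck
    else:
        await asyncio.sleep(0.2)  # asynchronous wait: the loop keeps running

    await asyncio.sleep(0.01)     # give the pending timer a chance to run
    return fired_at[0] - start


if __name__ == "__main__":
    late = asyncio.run(measure_timer_delay(blocking=True))
    on_time = asyncio.run(measure_timer_delay(blocking=False))
    print(f"timer delay with blocking code:    {late:.3f}s")
    print(f"timer delay with non-blocking code: {on_time:.3f}s")
```

In the blocking run the timer cannot fire before the synchronous call returns, so everything queued behind it is pushed back by the full duration of that call.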

And maybe the most important one: Keep It Simple!

I have the feeling that a lot of projects jump into the DRY (Don't Repeat Yourself) principle a bit too much. Sometimes you don't need to import a full library: just write the function that you need and move forward. Try to have as few dependencies as possible.

Always check what the requirements of your project are, and always question their necessity.

Finally, don't be afraid of big refactorings (it took me 50 hours of work to move from Modl to Eloquent) to simplify and clean up your codebase when necessary.

So is Movim fast?

Movim is fast. In some cases Movim is even faster than some native XMPP clients such as Pidgin and Gajim. It is also faster than several other chat platforms, because of its backend but also because of the way the frontend is designed. We will talk about that in an upcoming article.

On my account (400 contacts, 50 chatrooms) it only takes a couple of seconds to authenticate and get a fully ready and reactive UI, knowing that the data comes from a third party (your XMPP server) and that part of it is re-synchronized when you authenticate.

If you want to look for yourself, you can try it out on our official website ;)

That's all folks!


  • 3 Comments

  • 6 days ago - 07:50 Marzanna

    Thank you, edhelas!

  • 6 days ago - 19:40 bohwaz

    Great post, thanks!

  • 6 days ago - 22:05 paulfree14

    thanks!
    I've shared it within the fediverse. In case a discussion about it starts, you'll find it here:
    https://todon.nl/@paulfree14/100358235547770474

With the recent announcement that the biggest known centralized code forge is changing owner, discussions about creating a similar but decentralized tool have resurfaced here and there.

I took this occasion to recall the work done to implement tickets and merge requests in Salut à Toi (SàT), work which went relatively unnoticed when I wrote about it, about 6 months ago.

Now I would like to give some details on why we are building those tools.

First of all, why not the big forge? After all, a good part of current libre software is already using it! Well, first, it's not libre, and we committed ourselves in our social contract to use libre software as much as possible, infrastructure included. Then, because it's centralized, and there too our social contract is pretty clear, even if it's not as important for infrastructure as it is for SàT itself. Finally, because we are currently using Mercurial, and the most famous forge is built around Git.
We do not hide the fact that we have already asked ourselves whether to use this platform, notably in general assembly (cf. the minutes, in French); we were mainly interested in the great visibility it can offer.

« It's centralized? But "Git" is decentralized! » is a point we often hear, and it's true: Git (and Mercurial, and some others) is decentralized. But a code forge is not the version control system, it's all the tools around it: hosting, tickets, merge/pull requests, comments, wikis, etc. Those tools are not decentralized at the moment, and even if they are often usable through a proprietary API, they are still subject to the rules of centralization, i.e. the rules of the hosting service (and its technical hazards). This also means that if the service doesn't want a project, it can refuse, delete, or block it.

Centralization also makes it technically easy to catalog and search projects… which are on the service. Any external attempt will then have more difficulty being visible and attracting contributors, users and help. This is a situation we know very well with Salut à Toi (we are not present on proprietary and centralized "social networks" for the same reasons), and we find it unacceptable. It goes without saying that concentrating projects on a single platform is the best way to contribute to and exacerbate this state of affairs.
Please note, however, that we are not judging or attacking people and projects who made different choices. These positions are linked to our political commitment.

Why, then, not use existing libre projects, already advanced and working, like Gitlab? Well, first because we are working with Mercurial and not Git, and secondly because here too we would end up with a centralized solution. There is another point: there are nearly no decentralized forges (Fossil maybe?), and we already have nearly everything we need with SàT and XMPP. And let's add that there is some pleasure in building the tools we are lacking.

SàT is on its way to being a complete ecosystem, offering most, if not all, of the tools needed to organise and communicate. But it is also generic and re-usable. That's why the "merge requests" system is not linked to a specific SCM (Git or Mercurial): it can be used with other software, and it is actually not only usable for code development. It's a component which will be used wherever it is useful.

To conclude this post, I would like to remind you that if we want to see a decentralized, ethical and politically committed alternative to build our code, organise ourselves, and communicate, we can make it real by cooperating and contributing, be it with code, design, translations, documentation, testing, etc.
We recently got some help for packaging on Arch (thanks jnanar and previous contributors), and there are continuous efforts for packaging in Debian (thanks Robotux, Naha, Debacle, and other Debian XMPP packagers). If you can participate, please contact us (see our official website); together we can make the difference.
If you are lacking time, you can support us as well on Liberapay: https://liberapay.com/salut_a_toi. Thanks in advance!


With the recent announcement that the biggest known centralized forge is changing owner, questions about creating a similar but decentralized tool have resurfaced here and there.

I took the opportunity to recall the work done to implement tickets and merge requests in Salut à Toi (SàT), work which had gone relatively unnoticed when I wrote about it 6 months ago.

Now I would like to give some details on the reasons behind these tools.

First of all, why not the big forge? After all, a large part of today's libre software already uses it!
On the one hand because it is not libre, and we committed in our social contract to use libre software as much as possible, including for infrastructure. On the other hand because it is centralized, and there again our social contract is clear on the subject, even if this matters less for infrastructure than for SàT itself. Finally because we currently use Mercurial, and the best-known forge is built around Git.
Let's not hide, however, that we have already asked ourselves the question, notably in general assembly (cf. the minutes); we were interested in the visibility in particular.

« It's centralized? But "Git" is decentralized! » is a remark we often hear, and it is true: Git (and Mercurial, and others) is decentralized. But a forge is not the version control system, it is all the tools around it: hosting, tickets, change management (merge/pull requests), comments, wikis, etc. Those tools are not decentralized at the moment, and even if they are often accessible through service-specific APIs, they remain subject to the laws of centralization, that is to say those of the hosting service (and its technical hazards). It also means that if the service does not want a project, it can refuse it, delete it, or block it.

Centralization is also what makes cataloguing and searching easy… for the projects that are on the service. This de facto makes any outside attempt less visible and therefore increases its difficulties. It is a situation we know well with Salut à Toi (we are also absent from proprietary and centralized "social networks" for the same reasons), and one we consider unacceptable. It goes without saying that concentrating on one platform only encourages and prolongs this state of affairs. Let's note all the same that this is not about disparaging those who made different choices: these reflections are tied to our strong political involvement, and constraints change from one case to another.

Why, then, not use existing libre projects that are advanced and functional, like Gitlab? On the one hand because we work with Mercurial and not Git, and on the other hand because there too we would be in a centralized setup. There is another reason: there are few or no decentralized forges (Fossil maybe?), and we already have everything we need with SàT and XMPP. And then there is a certain pleasure in creating the tools we are missing.

SàT aims to be a complete ecosystem, offering most if not all of the tools needed to organise and communicate. But it is also generic and reusable. That is why the "merge requests" system is not tied to a particular tool (Git or Mercurial): it can be used with other software, and it is not even reserved for code development. It is another building block that will be used wherever it is useful.

To conclude, I would like to recall that if you want to see a decentralized, ethical and committed alternative for building our software, organising ourselves and communicating, we can make it possible by cooperating and contributing, whether with code, graphic design, translations, documentation, testing, etc.
We recently got help with packaging on Arch (thanks to jnanar and the previous maintainers), and there are continuous efforts for packaging on Debian (thanks to Robotux, Naha, Debacle and the other Debian XMPP packagers). If you can participate, please see how to contact us on the official website; together we can make the difference.
If you are short on time, you can also support us on Liberapay: https://liberapay.com/salut_a_toi. Thanks in advance!


The last big feature before the preparation of the alpha release, file sharing, is now available for Salut à Toi.

SàT has been able to send or receive files for years, either directly when 2 people are connected at the same time, or via an upload to the server (via "HTTP upload").

It is now possible to share a file hierarchy, or in other words one or several directories. There are 2 main use cases: with a component, or with another client.

sharing a directory with Cagou

Sharing a directory with a client

The first way to use file sharing is directly between 2 devices. This can be used, for instance, to share pictures taken on your phone with your desktop computer, or to quickly give your coworkers access to working documents.
To handle permissions, you just have to give the JIDs (XMPP identifiers) of the allowed people (or click on the contacts in Cagou, the graphical interface).

The transfer uses Jingle technology, which makes it possible to choose the best way to send the file. This means that if you are on the same local network as the other device (which is the case in the previous example of sharing pictures taken on your phone with your desktop computer, when you are at home), the connection stays local, and the server only sees the "signal", that is to say the data needed to establish the connection.

But if your devices are not on the same local network, the connection is still doable, and SàT will try to use a direct connection when possible.

file sharing with a client

Above you can see how simple it is to share a directory with Cagou, the desktop/Android frontend of Salut à Toi.

File sharing with a component

SàT can now handle components (which are more or less generic plugins for XMPP servers), and a first one allows a user to upload, list, or retrieve files.

This is really handy when you want to keep files private for later use (and access them from any device), or to share a photo album, for instance, with your family. This feature is on the way to a service similar to what is nowadays called "cloud storage", except that you can keep control over your data.

file sharing with a component

As you can see, it is very similar to what happens between 2 clients.

In addition, with the new SàT invitation system, you can share files even with people who don't have an account.

Some notes

File transfer is not encrypted yet, but this is planned soon with OX (OpenPGP) or OMEMO.
The base feature is there and working, but improvements are still planned in the more or less long term: user quotas, file synchronization, end-to-end encryption, and advanced search.

Testing

You will find instructions on how to use this feature on the wiki (in English).

Of course you will need the development version to test. Don't hesitate to ask for help in the SàT room: sat@chat.jabberfr.org (or with a browser).

A package is now available for Cagou on AUR for Arch Linux users, many thanks to jnanar for this.

Help needed!

SàT is a very big project, with strong ethical roots. It is unique in more than one way, and needs a lot of work. You can help it succeed either by supporting us on Liberapay or by contributing (have a look at the official website or come to the room for more info).

The next post will be about the alpha release, stay connected ;)

The last big feature before the preparation of the alpha release, file sharing, is now available for Salut à Toi.

SàT has been able to send or receive files for years, either directly when 2 people are connected at the same time, or via an HTTP upload on the server. It is now possible to share a file hierarchy, or in other words one or several directories. There are 2 main use cases: using a component, or a client.

sharing a directory with Cagou

Sharing a directory with a client

The first way to use file sharing is from device to device. It can be used, for instance, to share pictures taken on your phone with your desktop computer, or to quickly give your coworkers access to working documents. To handle permissions, you just have to give the JIDs (XMPP identifiers) of the allowed people.

The transfer uses Jingle technology, which chooses the best way to send the file. That means that if you are on the same local network (e.g. the previous case of sharing your phone pictures with your desktop computer, when you're at home), the connection will stay local, and the server will only see the signal (the data needed to establish the connection).

But if your devices are not on the same local area network, the connection is still doable, and it will be direct whenever possible.

file sharing with a client

Above you can see how easy it is to share a directory with Cagou, the desktop/Android frontend of Salut à Toi.

File sharing component

SàT can now act as a component (which is more or less a generic server plugin), and a first one allows a user to upload, list and retrieve files.

This is really handy when you want to keep some files private for later use (and access them from any device), or to share a photo album, for instance, with your family.

This is on the way to a service similar to "cloud storage", except that you keep control over your data.

file sharing with a component

As you can see, it's pretty similar to the workflow with a client.

With the invitation system now available in SàT, you can even share with people who don't have an account.

Some notes

File transfer is currently unencrypted, but encryption is planned soon, either with OX (OpenPGP) or OMEMO.
The base feature is there and working, but some improvements are planned in the more or less short term: quotas, file synchronization, e2e encryption, advanced search.

Testing

You'll find instructions on how to use this feature on the wiki.

Of course you'll need to use the development version. Don't hesitate to ask for help in the SàT room: sat@chat.jabberfr.org (or via browser).

A package is now available for Cagou on AUR for Arch Linux, thanks to jnanar.

Help needed!

SàT is a huge project, with strong ethical roots. It's unique in many ways, and needs a lot of work. You may help it succeed either by supporting us on Liberapay or by contributing (check the official website or join our room for details).

The next post will be about the alpha release, stay connected ;)

Is #XMPP really as bad as it is made out to be?
An interesting opinion, but the question is how well XMPP will survive the hard times of the spread of walled-garden messengers. It's a pity that Matrix and its clients are still far from ideal.