Making sense of social media through a combination of mathematics and machine learning

Exploring social media through algorithms

You may have come across people on social media who you don’t know at all, but who share a significant number of contacts with you. “This means you're part of the same community, without knowing it”, explains Marc Lelarge, a researcher with the new joint project team ARGO (Apprentissage, graphes et optimisation distribuée - Learning, Graphs, and Distributed Optimisation). “To detect these communities we create algorithms using just one piece of information - who is connected with whom - without knowing anything about individuals’ names or how they interact with each other. We seek to identify the limits of these algorithms in order to make them as efficient as possible.”

Graph representation of community detection, highlighting the groups formed by individuals who interact with one another. The purpose of community detection is to pick out groups of individuals who interact more often with each other than with everyone else. You can represent a social media platform as a graph in which the vertices (nodes) represent individuals and the edges represent their interactions. Community detection has many uses: identifying typical profiles, targeting actions, improving recommendations, restructuring and identifying key or influential individuals. (Image: Creative Commons Attribution 4.0 International).

Another area of focus for ARGO is federated learning, a mathematical method that can be used to train models while retaining control over data, without the need for the processing and storage capacities of GAFAM (Google, Apple, Facebook, Amazon and Microsoft).

Combining graph theory with machine learning

ARGO began life as a spin-off from another team, DYOGENE (Dynamics of Geometric Networks), which was jointly set up with the Computer Science department at the École Normale Supérieure de Paris and is based at the Inria Paris centre. “DYOGENE focuses on the mathematics of geometric networks in communication networks, using probabilistic modelling techniques such as point processes”, explains Marc Lelarge.

DYOGENE’s little sister, ARGO, which is led by Ana Bušić, was born out of a desire to take things further, as Marc Lelarge explains: “We came up with the idea of expanding into machine learning and creating a team specifically for this. We’ve been working together for years, meaning it was easy to come together around a shared interest in machine learning, with a sizeable component relating to graph theory which we had worked on previously.”

Identifying communities and controlling data

ARGO uses “spectral algorithms” to carry out an in-depth study of social media. “Essentially, we employ a type of mathematical analysis known as ‘Fourier analysis’ to extract information from irregularly structured data, such as data from social media”, explains Marc Lelarge. “This type of analysis is highly effective and can be interpreted in graph form, making it easy to detect communities.” To improve detection even further, the researchers employ unsupervised machine learning methods, in which algorithms learn to find structure in unlabelled data and identify relationships between variables.

This video shows two communities being detected. To begin with, the individuals are in random order, with no apparent structure in the interaction matrix. Then, when the individuals are sorted by community, the darker areas show the stronger interactions within each community.
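To make the idea concrete, here is a minimal sketch of spectral community detection on a synthetic network. It is an illustration under stated assumptions, not ARGO's own algorithm: the random two-community graph, the function names and the parameters `p_in` and `p_out` are all hypothetical, and the method shown is the textbook approach of splitting nodes by the sign of the second eigenvector of the graph Laplacian. As in the video, the only input is who is connected with whom.

```python
import numpy as np

def two_community_graph(n_per_group=30, p_in=0.6, p_out=0.05, seed=0):
    """Hypothetical test network: dense links inside groups, sparse links between them."""
    rng = np.random.default_rng(seed)
    n = 2 * n_per_group
    labels = np.repeat([0, 1], n_per_group)
    same_group = labels[:, None] == labels[None, :]
    probs = np.where(same_group, p_in, p_out)
    # Draw each possible link once (upper triangle), then mirror it.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adjacency = (upper | upper.T).astype(float)
    # Shuffle the individuals so the community structure is hidden.
    perm = rng.permutation(n)
    return adjacency[np.ix_(perm, perm)], labels[perm]

def spectral_communities(adjacency):
    """Split nodes into two groups using the sign of the Fiedler vector."""
    degrees = adjacency.sum(axis=1)
    laplacian = np.diag(degrees) - adjacency
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]  # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)

adjacency, truth = two_community_graph()
found = spectral_communities(adjacency)

# Community names are arbitrary, so score against both labelings.
agreement = max(np.mean(found == truth), np.mean(found != truth))
print(f"fraction of individuals correctly grouped: {agreement:.2f}")

# Sorting individuals by detected community reveals the block structure.
order = np.argsort(found, kind="stable")
sorted_adjacency = adjacency[np.ix_(order, order)]
```

Plotting `adjacency` and then `sorted_adjacency` as images should reproduce the effect described above: an unstructured matrix first, then two dense blocks on the diagonal.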

Another problem ARGO will be seeking to tackle is data control. “Instead of centralising learning on one server, and risking losing control over data, learning is distributed among a wide range of users with more limited processing capacity”, explains Kevin Scaman, a researcher with the team. “We create algorithms to structure and synchronise this shared, decentralised learning. This process is known as decentralised optimisation.”

Explanatory diagram of decentralised learning. ARGO’s researchers study decentralised optimisation, which helps to retain control over data, without having to rely on the processing and storage capacities of GAFAM. Decentralised machine learning involves training machine learning models using local data, without the data having to be shared centrally.

The approach involves computing gradients of a function that measures the performance of the machine learning model you want to improve, then sharing them with neighbouring users, or users who are permanently connected to each other. For this, the researchers employ gossip communication algorithms, which train models simply by calculating local averages between users, without any need for a central server to synchronise the transfer of information. This solution improves the performance of machine learning models while protecting the data used in the learning process.
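The following toy sketch shows how local gradient steps and gossip averaging can fit together. It is an assumption-laden illustration, not the team's actual method: eight hypothetical users sit on a ring, each holds one private number `a_i` and a scalar model `x_i`, and each local loss is taken to be `0.5 * (x_i - a_i)**2`, so the best shared model is the mean of the private numbers. The function names, the ring layout and the decaying step size are all choices made for this example.

```python
import numpy as np

def ring_gossip_matrix(n_users):
    """Each user averages equally with itself and its two ring neighbours.

    Assumes n_users >= 3 so that the three positions are distinct.
    """
    W = np.zeros((n_users, n_users))
    for i in range(n_users):
        for j in (i - 1, i, i + 1):
            W[i, j % n_users] = 1.0 / 3.0
    return W

def decentralised_gradient_descent(local_targets, n_rounds=2000):
    """Alternate local gradient steps with gossip averaging; no central server."""
    n_users = len(local_targets)
    W = ring_gossip_matrix(n_users)
    models = np.zeros(n_users)  # one scalar model per user
    for k in range(n_rounds):
        step = 1.0 / (k + 2)  # decaying step size (an assumption of this sketch)
        # Gradient of 0.5 * (x_i - a_i)^2, computed from local data only.
        local_gradients = models - local_targets
        # Average with neighbours, then apply the local gradient step.
        models = W @ models - step * local_gradients
    return models

# Private data: never leaves its owner; only models are exchanged.
local_targets = np.array([1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0])
models = decentralised_gradient_descent(local_targets)

global_optimum = local_targets.mean()
max_gap = np.max(np.abs(models - global_optimum))
print(f"largest distance from the global optimum: {max_gap:.4f}")
```

Each user only ever sees its neighbours' current models, yet every model ends up close to the value a central server would have computed from all the data at once.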

Exciting possibilities

ARGO may only recently have been formed, but it has certainly been busy. Its researchers have published papers at major conferences in the field of machine learning, including NeurIPS (Neural Information Processing Systems) and ICML (International Conference on Machine Learning). “One of our PhD students even won a Best Paper Award at NeurIPS”, recalls Marc Lelarge fondly. “Some of them have been recruited by Inria, like Hadrien Hendrikx in Grenoble, while others have launched innovative start-ups, like Éric Daoud-Attoyan, who founded Guided Energy. We’re proud of the various young researchers we’ve taken on and the high quality work they’ve produced.”

This highly dynamic team has also contributed to a number of other exciting projects, including the Inria Challenge FedMalin, which deals with distributed machine learning. “We recently joined a PEPR* on the REDEEM project”, explains Kevin Scaman. “This four-year project brings together four teams (DRIM from the LIRIS, LIST from the CEA, SYMPAS from the École Polytechnique and MAGNET from the Inria centre at the University of Lille) plus ARGO, who will work together on issues relating to data sovereignty using decentralised machine learning methods. This is massive for our team as it will enable us to recruit three new PhD students.” To be continued…

*PEPR: Programme et équipements prioritaires de recherche - a French government scheme aimed at strengthening France’s position in the world of science and technology.

Optimising renewable energy networks

ARGO is also active on various fronts other than social media. Under the leadership of Ana Bušić, the team also studies energy networks, with the aim of making better use of renewable energy sources such as solar and wind.

Synergy with other Inria teams

Beyond its own research, ARGO works closely with other Inria project teams, including WILLOW, a team whose research focuses on issues relating to representation in the field of visual recognition and robotics.

Photo of the bipedal robot Upkie. Upkie is a fully open source bipedal robot, with wheels for balance and legs to adapt to different terrain. It is designed to be built at home using tools and components ordered online.

“We worked on Upkie, a fully open source bipedal robot developed by Stéphane Caron (from WILLOW), aiming to make it possible for everyone and anyone to make one”, explains Marc Lelarge. “Upkie moves around on two wheels, and has an active control mechanism to keep it stable. Our aim was to design algorithms that would keep it upright no matter the situation, even when carrying a heavy load.”

ARGO is also involved in a collaboration with SIERRA (Inria, ENS-PSL, CNRS), a team specialising in machine learning, on the topic of distributed optimisation.

Find out more

General public:

“I optimise therefore I learn” (video), talk by Marc Lelarge at the École Nationale Supérieure d'Électrotechnique, d'Électronique, d'Informatique, d'Hydraulique et des Télécommunications (ENSEEIHT) (via Les Maths à Toulouse), 10/11/2022.
“Another form of artificial intelligence with graph learning”, Inria, 11/05/2023.

For experts:

Francis Bach (Inria): “Distributed Machine Learning over Networks” (video in English), LIDS (MIT), FODSI, September 2020.
“Kullback–Leibler-Quadratic Optimal Control” (scientific paper in English), by Neil Cammardella, Ana Bušić, Sean P. Meyn, SIAM Journal on Control and Optimization 61 (5), 3234-3258. 2023.
“Using loads with discrete finite states of power to provide ancillary services for a power grid” (patent), Sean P. Meyn, Ana Bušić. US Patent 10,692,158. 2020.
“Actor Critic Agents for Wind Farm Control” (scientific paper in English), by Claire Bizon Monroc, Ana Bušić, Donatien Dubuc, Jiamin Zhu, ACC, 2023.