Whether installed on our mobile phones or sitting in docks in our living rooms, voice assistants now perform a range of tasks, including making calls and searching for things online. In the future, voice command will play a much larger role in both our home lives and our professional lives. Examples include interactive note-taking, which will be useful for doctors, lawyers, students and journalists, and the remote control of connected objects, whether in home automation or in IT.
Voice recognition: highly sensitive data
These tools are able to understand spoken language because they are programmed using so-called machine learning algorithms, which draw on vast quantities of data: hundreds of hours of speech are used for each language! For the vast majority of current applications, developers have chosen to store and centralise voice data on private servers.
The issue here is that this is highly sensitive data, containing potentially confidential information on our purchasing habits, our social networks, our health, etc., which could be used for commercial profiling purposes. What’s more, our voices are also easily identifiable, and could be stolen in the event of a security breach.
With privacy an essential concern when it comes to user trust, the Magnet team at the Inria Lille-Nord Europe research centre has been working on artificial intelligence algorithms designed with this data protection constraint in mind. Marc Tommasi, professor of computer science at the University of Lille and head of Magnet, has been contributing to the research carried out within the EU project COMPRISE, founded and coordinated by Emmanuel Vincent, director of research within the Multispeech team at the Inria Nancy-Grand Est research centre.
The aim of the project is to ensure the privacy of the users of future applications, while also reducing development costs. With a budget of 3 million euros over three years (2018-2021), it has brought together around thirty researchers and engineers from several Inria teams, Saarland University and four European companies specialising in software development and in legal expertise relating to data processing. The applications being developed are of particular interest to online retail and the medical sector.
Synergy between different Inria teams
On the Inria side, COMPRISE was built on the complementary expertise of two teams. “The research carried out by Multispeech relates to different aspects of speech processing, with applications in voice recognition, the learning of foreign languages and audiovisual synthesis”, explains Emmanuel Vincent.
“Within Magnet, we design algorithms for gathering and analysing data which respect the privacy of internet users, by limiting their spread or by rendering them anonymous, for example”, summarises Marc Tommasi.
In the context of the technology targeted by the project, how well a spoken text is automatically understood will depend on the variety of data used by the algorithm during the learning process. “This relates to the language field used (vocabulary and syntax specific to speech, in each individual language), but also to the acoustic characteristics of speech, such as intonation, accentuation, register, tone, etc. It is these same acoustic criteria which mark an individual's vocal identity”, explains Marc Tommasi.
Algorithms designed for data protection
The aim of the work carried out within Magnet for COMPRISE is to develop AI algorithms capable of transcribing the contents of an audio message in such a way that both the anonymity of the speaker and the acoustic diversity of the speech are preserved. Among the possible options, the avenue the researchers in Lille consider most promising involves designing a voice conversion algorithm.
The researchers then developed criteria for assessing the performance of their solution. Marc Tommasi gave us his verdict: “Although we had no theoretical results that would have enabled us to formally establish how robust our algorithm was, our analysis showed that the method in question was capable of withstanding attacks that employ the most advanced technology to try to discover the real identity behind a converted voice”.
The voice conversion algorithm is also linked to a text conversion algorithm, another innovation developed in the context of the project, which can be used to mask information contained in messages that might jeopardise privacy. This technology can now be transferred to project partners looking to develop new services, including for online retail, home assistance and voice command.
When it comes to AI that respects privacy, the results obtained so far by the Inria researchers working on the COMPRISE project show that they are as good as their word!
Designed with privacy constraints in mind
The concept of designing algorithms with privacy constraints in mind is known as privacy by design. For the COMPRISE project, the Inria researchers explored several possible avenues for achieving this aim, focusing on the research carried out by Brij Mohan Lal Srivastava (a PhD student within the Magnet team). “We combined two learning programmes (so-called adversarial neural networks) with two objectives: the first was designed to succeed at transcribing messages, while the second was designed to fail at identifying the speaker”, explains Marc Tommasi.
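To make the idea of two competing objectives more concrete, here is a minimal sketch in Python. It is not the researchers' implementation: the function names, the toy probabilities and the `weight` parameter are all assumptions made for illustration. It simply shows how a single score can reward correct transcription while penalising successful speaker identification.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability a model assigns to the correct class."""
    return -math.log(probs[target])


def encoder_objective(word_probs, true_word, speaker_probs, true_speaker, weight=0.5):
    """Toy loss for a shared speech encoder (lower is better).

    It falls when the transcription branch is right and rises when the
    speaker-identification branch is right, so minimising it favours
    keeping the words while discarding the speaker's identity.
    `weight` is a hypothetical trade-off parameter, not a project value.
    """
    transcription_loss = cross_entropy(word_probs, true_word)
    speaker_loss = cross_entropy(speaker_probs, true_speaker)
    return transcription_loss - weight * speaker_loss


# Same transcription quality, two different outcomes for the speaker branch.
word_probs = [0.9, 0.05, 0.05]          # the correct word is index 0
speaker_exposed = [0.8, 0.1, 0.1]       # speaker 0 is easy to recognise
speaker_hidden = [1 / 3, 1 / 3, 1 / 3]  # the classifier can only guess

loss_exposed = encoder_objective(word_probs, 0, speaker_exposed, 0)
loss_hidden = encoder_objective(word_probs, 0, speaker_hidden, 0)

# The objective prefers the case where the speaker cannot be identified.
print(loss_hidden < loss_exposed)
```

In a real system the two branches would be neural networks trained on recorded speech; the hand-written probabilities here only stand in for their outputs.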
Although attractive, this solution did not deliver the expected guarantees: in some cases, the algorithm was unable to protect the identity of a message’s author. The researchers then turned their attention to another solution: a voice conversion algorithm capable of separating the contents of a spoken message from the identity of the speaker. This program can then be used to build an anonymised database which preserves all of the diversity that machine learning tools need in order to be effective.