It has been several years since GAFAM [1] moved into the voice control sector. In this area, the Google Home and Amazon Echo connected speakers are gaining a large share of a growing market. Whether to make a phone call, brew coffee or shop online, these digital assistants allow users to perform a variety of daily tasks through simple voice interaction. At present, however, this technology poses risks to users’ privacy. “These systems work using machine learning and therefore need to capture users’ speech in order to constantly improve their performance. This data is then stored in the cloud and is often exploited for commercial profiling purposes. In the event of a security breach, this data could even be used by hackers to impersonate users,” explains Emmanuel Vincent, Inria Director of Research within the Multispeech [2] team and coordinator of the COMPRISE project. This consortium, which brings together research teams and industry partners from four European countries [3], was created at the end of 2018 to develop a multilingual voice interaction system that is both more secure and simpler to use.
Based on the concept of deep learning, their approach is to design a human-machine dialogue system that uses more elaborate natural language.
Facilitating the use of voice interaction for developers
With a budget of €3.2 million over three years, this European joint initiative has set itself the objective of designing and assembling the various automatic voice processing features. “The aim is to develop a software suite that brings together a set of features ranging from transforming voice into text to managing dialogue, and including understanding and generating natural language,” explains Emmanuel Vincent. Thanks to the expertise of the German partner Ascora, third-party software vendors will be able to easily integrate this software suite into a wide range of voice-controlled applications. These can then be downloaded by users to their smartphones, tablets and connected speakers.
Another limitation of current voice assistance systems is their poor performance or even unavailability in languages with a smaller number of speakers. By integrating machine translation into its software, COMPRISE will also provide users with the ability to interact with an application available in a language other than their own.
Strengthening cybersecurity
An important part of the project concerns safeguarding the private information of both users and companies. In this area, COMPRISE aims to develop a prototype secure platform for collecting and managing voice data. Designed with the help of the Latvian company Tilde, also a member of the consortium, this digital platform will only collect data of a generic nature to improve the functionality of the software suite. Voice data of a private or personal nature will not, therefore, leave the user’s device. Before sending information to the digital platform, the user’s voice will also be modified to prevent identification.
The same applies to the risk of industrial espionage, as the project coordinator explains: “For example, for a large retailer who wants to add a voice control system to the Drive service on its website, there is now a risk of industrial espionage at the hands of the company providing this service.” To limit this risk as much as possible, COMPRISE plans to develop a Drive-type demonstrator (see inset) with the help of the French start-up Netfective Technology, which is also a partner in the project.
When e-commerce starts talking
Based on the results of the COMPRISE project, Netfective Technology plans to develop a demonstration platform for a Drive-type service available in at least two European languages, such as French and Portuguese. The objective of the initiative is to provide new voice features to customers while guaranteeing a high level of confidentiality and personal data protection. Although it is currently impossible to create online services translated into all European languages, this demonstrator should nevertheless provide users with the ability to “talk” to a foreign Drive service in their own language and receive a quick response to their requests. Throughout the platform’s test phase, new voice-based features will be introduced to a panel of beta-test customers with various profiles (ages, purchasing habits, languages, etc.). Their feedback will be collected at regular intervals by Netfective Technology to improve the service over time.
[1] An acronym referring to the five United States digital giants: Google, Apple, Facebook, Amazon and Microsoft.
[2] Joint project team from the University of Lorraine, Inria and CNRS.
[3] This project involves both the Multispeech and Magnet teams at Inria, researchers from Saarland University in Germany, as well as four industrial partners from Germany, France, Latvia and Spain.