Generative AI

Karën Fort on ethics in natural language processing

Changed on 17/12/2024
A fundamental part of artificial intelligence, natural language processing tools are the subject of much enthusiasm, but there are also concerns surrounding them. By studying the way in which they reproduce and amplify stereotyped biases, Karën Fort, professor in Computer Science at the University of Lorraine and a member of the project team Semagramme (a joint undertaking with Inria, the University of Lorraine and the CNRS based at the Loria laboratory), is seeking to understand the role played by ethics in the design, development and use of these tools.
[Illustration: transfer and innovation. © Inria / Création et photo A. Audras]

Ever since it was launched, ChatGPT has attracted a great deal of interest, in addition to sparking concerns about the threat it could pose to certain professions. But for Karën Fort, the real danger of artificial intelligence stems less from its capacities, which have to an extent been exaggerated, than from the lack of consideration given to ethics during development and the resulting societal and environmental impact. It was back in the early 2010s, while working on her PhD, that Karën Fort first encountered crowdsourced microworking. The treatment of so-called “clickworkers” prompted her to write a paper on the issue, marking the beginning of her interest in ethics. As the years went by, she focused more and more on this question, addressing stereotyped biases in large language models (LLMs), the now well-known systems used in natural language processing.

LLMs – powerful but biased

Developed from deep neural networks trained using large online data corpora - “equivalent to 100 million copies of Around the World in Eighty Days” - LLMs are used in a whole host of natural language processing applications, from machine translation tools to chatbots and sentiment analysis software. But although they demonstrate an impressive capacity for producing text - employing an autoregressive model that involves predicting the next word in a sentence - these systems also reproduce a significant amount of bias.
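To make the autoregressive principle concrete, here is a minimal Python sketch of next-token prediction, using the openly available "gpt2" checkpoint purely as a stand-in (it is not one of the models discussed in this article): at each step the model scores every possible next token given the text so far, the most probable one is appended, and the loop repeats.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = tokenizer("The nurse said that", return_tensors="pt").input_ids
for _ in range(10):                               # generate ten tokens, one at a time
    logits = model(text).logits                   # a score for every vocabulary token
    next_id = logits[0, -1].argmax()              # greedy choice: the most probable next token
    text = torch.cat([text, next_id.view(1, 1)], dim=1)
print(tokenizer.decode(text[0]))

Because each word is chosen this way, the continuation reflects whichever associations dominate the training corpus, which is where the stereotyped biases discussed below originate.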

As part of her Master’s in Languages and Computer Science, co-supervised by Karën Fort and Aurélie Névéol,* Fanny Ducel - now a PhD student at the LISN, with the same supervisors - examined the mechanisms of LLMs. She tasked models with producing cover letters for different professional fields. Although her prompts contained no indication of gender, the letters generated were all gendered according to the type of advert: female for a qualification in hairdressing, male for a qualification in computer science. Conversely, even with clearly gendered prompts, some tools produced degendered letters when the job in question was not traditionally associated with the gender of the applicant. Analysis of the 52,000 letters produced, drawing on statistics on the gender distribution of the professions in question, revealed that these tools were not only reproducing existing biases - in this case sexist biases - but amplifying them.
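As a rough illustration of what such an amplification check involves - the figures below are placeholders, not data from the study - one can compare the share of female-gendered letters a model generates for a profession with the real-world share of women in that profession:

# Hypothetical reference statistics: share of women in each profession
real_share_female = {"hairdressing": 0.80, "computer science": 0.20}

# Hypothetical counts of gendered letters generated by a model
generated = {
    "hairdressing":     {"female": 950, "male": 50},
    "computer science": {"female": 60,  "male": 940},
}

for job, counts in generated.items():
    model_share = counts["female"] / (counts["female"] + counts["male"])
    real = real_share_female[job]
    # the bias is amplified if the model's gender skew is more extreme than reality's
    amplified = abs(model_share - 0.5) > abs(real - 0.5)
    print(f"{job}: model {model_share:.0%} female vs. reality {real:.0%} -> amplified: {amplified}")

With these placeholder numbers, both professions come out as amplified: the model’s output is more skewed than the real workforce, which is the pattern the study reports.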

A system that perpetuates discrimination

Observed in an experimental framework, this is an intriguing and potentially troubling operating principle. It becomes all the more problematic when the tools applying it are used for everyday applications, whether in the public or private sphere, as was seen with a chatbot introduced by the Austrian Ministry of Labour in early 2024 to provide guidance to jobseekers. The amplified reproduction of implicit biases (e.g. “women aren’t very good with computers”) and representation biases (using male terms for job titles rather than gender-neutral ones) results in allocation biases, whereby people are denied their rights on the basis of their ethnic origin, sexual orientation, disability status, physical appearance or socioeconomic circumstances.

“The illusion of the technological fix”

Such issues, which reveal a tendency towards caricature inherent to machine learning, also highlight the responsibility of developers, the majority of whom have the same socio-economic profile. Another symptom of this dominant position is that although LLMs work well for a limited number of languages - primarily English, and to a lesser extent around 50 or so others out of a total of 7,000 - “for languages such as Breton or Sami, a language spoken in Northern Europe (Norway, Sweden and Finland), the results are very poor.” There is also an often overlooked environmental dimension. “These models have a catastrophic impact in terms of their consumption of energy and natural resources, not to mention the amount of space they take up. We will soon have to decide whether to use energy and water for hospitals or for data centres. We have to start addressing this now, as one thing that is being underestimated is the speed of the approaching catastrophe”, warns the researcher.

Once again, the key will be to find the right response, but Karën Fort isn't one for “technosolutionism”. “It’s an illusion to think that technology can solve all of the problems created by technology. Neural networks are so powerful that it is difficult to grasp the ramifications. They’re black boxes: you can tap on them to hear what sound they make, but no one really knows what’s going on inside. The flaws in ChatGPT are becoming less and less visible, as OpenAI has been employing people in Kenya to identify them and clean up our intellectual waste, but they’re still there.”

Developing a code of ethics

But despite making such stark warnings in response to the urgency of the situation, Karën Fort isn't throwing in the towel. “In order to solve the problem, we first need to measure the extent of it and come to terms with it. This isn't something that's been getting much attention, especially not in France.” A paper published in 2021 by four female American researchers on the ethical issues raised by the development of LLMs helped to get the ball rolling, although, as Karën Fort points out, “two of the women who wrote the paper, who were part of Google’s Ethics team, were sacked following its publication.” Since then, the number of researchers taking an interest in this issue has risen slowly but steadily. “There is no shortage of research topics. We must develop models that are more frugal and more environmentally friendly. There are teams working on this, coming up with solutions that need to be tested. Others are exploring methods for debiasing. There is also a need for a more powerful evaluation panel that will address communities that have so far been overlooked. These are research subjects in their own right, to which Inria can make a real contribution.”

But Karën Fort places her hope primarily in the younger generations, who are more sensitive to such issues than their elders. “In late 2022 I was co-coordinator of ‘Think before loading’, a training course in AI ethics centred around creative writing, which involved getting PhD students to write dystopian stories based on their research. The attention they’re paying to these issues can bring about change, but that change has to come now.”

* Director of research within the Language Sciences and Technology department at the LISN

A specialist in linguistic resources for NLP

After completing her PhD in 2012 – Annotated resources, a challenge for content analysis: moving towards a methodology for manual corpus annotation – Karën Fort worked as an associate professor in Computer Science (specialising in NLP) at the Sorbonne. She is now a professor at the University of Lorraine, head of the Master 2 programme in NLP and local coordinator of the Erasmus Mundus LCT programme (Language & Communication Technologies) for l’Institut des Sciences du Digital (IDMC - the French Institute of Digital Science). She is also responsible for coordinating the French National Research Agency (ANR) project InExtenso (Intrinsic and Extrinsic Evaluation of Bias in Large Language Models).

Having been drawn to ethics at an early stage, Karën Fort was involved in the creation of the Charter for Ethics and Big Data in 2013, followed by the Ethics and NLP blog in 2015. A member of the Sorbonne's research ethics committee from 2019 to 2022 and ethics advisor to the EU project AI-Proficient from 2020 to 2023, she has been co-chair of the ethics committee of the Association for Computational Linguistics (ACL) since 2021, alongside Min-Yen Kan and Luciana Benotti.