When history meets statistics
The hope is that an index of the concepts Aristotle explores in Politics, one of the most important works of political philosophy, will help to place ideas such as democracy and dictatorship in their original context. The problem is one of resources: so few researchers are able to devote their time to dissecting the Ancient Greek text and arriving at this classification that the task would take around a hundred years to complete. Antoine Lejay from the project team Pasta (a joint undertaking involving Inria Nancy - Grand Est, the CNRS and the Institut Élie Cartan de Lorraine) and his colleagues thought they could take on this challenge, using artificial intelligence to make up for the lack of manpower.
It all started in early 2022: Lionel Lenotre, a former PhD student of Antoine Lejay and now a lecturer in Probability and Statistics at Haute-Alsace University (UHA), was working alongside the historian Maria Teresa Schettino, and got back in touch with his former PhD supervisor to ask him a question: could Pasta’s expertise in modelling and statistics be used to identify the underlying logical structures in Aristotle’s work?
An interdisciplinary project up to the challenge of the exploratory action
“This was a real challenge, both intellectually and from a technical point of view”, explains Antoine Lejay. “It would involve using natural language processing (NLP) tools, but reinventing them to make it possible to search for logical structures throughout the text.” Maria Teresa Schettino tells us more: “This is a properly interdisciplinary project, and the constant flow of dialogue between researchers in AI and those in human and social sciences is another obstacle we will need to overcome. These are two different languages, which will need to learn how to work together and understand each other. It’s hard to overstate the benefits this could have for scientific methodology, including those we can’t predict.”
A project exploring uncharted territory, with no guarantees in terms of results and risky from both a human and a technical perspective - not exactly ideal for seeking funding. Except from Inria. As part of its objectives and performance contract, the institute made a commitment to promote scientific risk-taking and interdisciplinarity. Key to this are exploratory actions, the aim of which is to support researchers with innovative or breakthrough projects. “We compiled our dossier in March 2022, and by June we had been given funding for a postdoctoral researcher for two years. The process was straightforward and efficient. This initial support enabled us to begin our research, which will make it easier to get funding through other channels: we will have some initial results to show and our position will be easier to establish”, says Antoine Lejay.
The first stage: identifying relationships between words and concepts
The exploratory action, given the name Apollon, was launched, bringing together a dozen or so researchers from the project team Pasta, the CNRS and the Archimède laboratory at Haute-Alsace University. They will soon be joined by experts on Aristotle from the University of Pavia in Italy, as well as Catherine Roth, a lecturer in Information Science and Communication at Haute-Alsace University, whose areas of focus include detecting the implicit in language.
Initial research, which began back in January, focused on identifying semantic relationships between certain words and concepts. Is the word ‘banquet’ linked to the concept of ‘royalty’, for example? The team then set about the more complicated task of defining ideas linked directly by verbs such as ‘to be’ or ‘to seem’. “There are a number of obstacles with the verb ‘to be’: it can indicate equivalence, but it can also indicate inclusion, in which case the processing we would apply will be different”, explains Antoine Lejay. “What’s more, the NLP tools for Ancient Greek are not reliable enough, and it can be hard to identify the subject or the complement. To make matters worse, some sentences in Ancient Greek do without verbs. This is when interdisciplinarity becomes important: we can ask linguists and philologists to help us to examine the results produced by our relationship recognition algorithm.”
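The difference between the two readings of ‘to be’ can be made concrete with a small sketch. The Python below is purely illustrative and is not the Apollon algorithm: the `Relation` type, the `RelationKind` labels, the `expand` helper and the placeholder concept names are all assumptions introduced here. It shows one reason the processing differs: an equivalence can be read in both directions, whereas an inclusion only holds one way.

```python
from dataclasses import dataclass
from enum import Enum


class RelationKind(Enum):
    EQUIVALENCE = "equivalence"  # "A is B" and "B is A" both hold
    INCLUSION = "inclusion"      # "A is a kind of B": one direction only


@dataclass(frozen=True)
class Relation:
    subject: str
    complement: str
    kind: RelationKind


def expand(relations):
    """Return the set of directed (source, target) links implied by the relations."""
    links = set()
    for rel in relations:
        links.add((rel.subject, rel.complement))
        # Only an equivalence licenses the reverse link.
        if rel.kind is RelationKind.EQUIVALENCE:
            links.add((rel.complement, rel.subject))
    return links


# Invented placeholder triples (hypothetical, not taken from Politics).
relations = [
    Relation("concept_a", "concept_b", RelationKind.EQUIVALENCE),
    Relation("concept_c", "concept_d", RelationKind.INCLUSION),
]
links = expand(relations)
```

In this sketch the hard part described above, deciding which kind a given sentence expresses and finding its subject and complement, is taken as given; that is precisely where the linguists and philologists come in.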
The second stage: training an algorithm
It is this combination of artificial and human intelligence that the Apollon team believes will bring success: once the initial work on relationships is complete, within the next year or so, the aim is to have the algorithm analyse the full text before calling on specialists to revise the initial results. These specialists will annotate the results and give them back to the algorithm in order for it to improve. Eventually, within four or five years’ time, the algorithm will be capable of generating a glossary of the concepts found in Politics on its own. “A sentence which talks about democracy in the singular and one which refers to democracies in the plural will have two different meanings”, explains Antoine Lejay. “If you want detailed, expert analysis, you need to factor in all of these nuances. That’s what’s so difficult, but also so stimulating about this project, and one of the things that makes it risky.”
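The cycle of proposal, expert revision and feedback can be sketched in a few lines. This is a toy illustration under stated assumptions, not the team's method: the "algorithm" here is a simple lookup table, and `review_round`, `make_predictor`, the sentence identifiers and the labels are all hypothetical names introduced for the example.

```python
def review_round(predict, candidates, expert_labels):
    """One round: the algorithm proposes, specialists correct, corrections are collected."""
    corrections = {}
    for item in candidates:
        proposed = predict(item)
        # Items the experts have not reviewed keep the proposed label.
        validated = expert_labels.get(item, proposed)
        if validated != proposed:
            corrections[item] = validated
    return corrections


def make_predictor(memory, default="unknown"):
    """Toy stand-in for the algorithm: look up labels learned from earlier corrections."""
    return lambda item: memory.get(item, default)


memory = {}
candidates = ["sentence_1", "sentence_2", "sentence_3"]
expert_labels = {"sentence_1": "inclusion", "sentence_2": "equivalence"}

# Round 1: nothing has been learned yet, so the experts correct two items.
first = review_round(make_predictor(memory), candidates, expert_labels)
memory.update(first)  # the annotations are given back to the algorithm

# Round 2: the corrected items are now proposed correctly.
second = review_round(make_predictor(memory), candidates, expert_labels)
```

A real system would generalise from the annotations to unseen sentences rather than memorise them; the sketch only shows the shape of the loop.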
There is another stumbling block facing the team: the incompatibility of existing IT tools with their needs. There is currently no interface that would give a historian or a linguist clear access to the algorithm’s results, directly outlining relationships established in the text. This is something that Apollon needs to create, as such an interface will be essential if experts are to be able to train the algorithm.
Repercussions for human and digital sciences
“The process of identifying stages, requirements and challenges promotes interdisciplinary discussions between members of the exploratory action”, says Antoine Lejay. As you might expect, the impact of this project will also be interdisciplinary. On the one hand, historians will finally be able to read Politics with minimal interference from bias. “We will no longer have to consider what Aristotle wanted to write, simply commenting on his work; instead, we will know what he actually wrote. This will allow historians to validate hypotheses on his writing, establishing clear boundaries between concepts such as monarchy, democracy, tyranny, and so on, as defined by Aristotle”, explains the researcher.
On the other hand, specialists in digital science will be able to add another string to their bow, deploying their modelling skills in an entirely new field, in a richly rewarding context quite unlike their other research. More than 2,300 years after it was written, Politics continues to drive debate - in areas which Aristotle could never have imagined.