Theory of games

Artificial intelligence, an unrivalled poker player

Date:
Changed on 19/12/2024
Will artificial intelligence ever be able to match humans in terms of reasoning and complex task performance? The FAIRPLAY team at the Inria Saclay centre has taken a major step in this direction. How has it succeeded? By optimising algorithms capable of developing strategies by... gaming. A major innovation at the convergence of machine learning, optimisation and game theory, which has already won a prestigious award.

An AI capable of gaming to draw closer to humans?

How could algorithms perform complex tasks as well as humans? To answer this question, scientists first studied image recognition in the late 1950s. “But that was just the beginning,” stresses Vianney Perchet, co-leader of the FAIRPLAY team and professor at the Centre de recherche en économie et statistique (CREST – Centre for Research in Economics and Statistics) at the École nationale de la statistique et de l'administration économique in Paris (ENSAE Paris – National School of Statistics and Economic Administration). “To go further, the question needed to be rephrased: what human action, with an impact on the environment and the future, could be automated and carried out by artificial intelligence? Researchers then turned their attention to long-term decision-making problems.”

In this context, gaming was chosen, not for its entertainment value, but to understand the consequences of the decisions made by two interacting “agents” – the players. Draughts was studied in the 1970s, followed by chess.

The FAIRPLAY team, between economics and game theory 

Game theory is precisely the specialism of FAIRPLAY, a young team established in 2022 by Criteo, ENSAE and Inria. Bringing together two researchers from Inria, five from ENSAE and five from Criteo, it has set itself the task of studying the interactions between this theory, machine learning and economics. “Bringing our three entities together as part of a common team has created a win-win situation,” states Vianney Perchet. “In particular, our cooperation with a private company enables us to tackle real-life issues. Rather than choosing a subject just because we like it, I think it's more relevant to start with applications and then mathematically abstract them to solve a problem.” Research on machine learning generally focuses on cases with a single agent (an algorithm). The FAIRPLAY team adopts a different approach by focusing on economic systems in which different agents, such as several companies, interact. Within the framework of these multi-agent models, the researchers pay particular attention to privacy, ethics and fairness.

About Criteo

Criteo is a global technology company that provides the world's leading Commerce Media Platform. Criteo's 2,800 team members partner with over 22,000 marketers and thousands of media owners worldwide to activate the world's largest set of commerce data to drive better business outcomes. By delivering reliable and relevant advertising, Criteo delivers richer experiences to every consumer while supporting a fair and open Internet that enables discovery, innovation and choice. For more information, visit www.criteo.com

Poker-playing algorithms

Back to gaming. Have you heard of Deep Blue, the computer that beat world chess champion Garry Kasparov in 1997? Although, strictly speaking, this did not yet quite involve AI, it marked the moment at which the scientific community started looking for even more complex challenges to improve algorithms. In the 2010s, it focused on the game of Go, which surpasses chess in terms of its rich combinatorial opportunities and strategic depth.

However, whether the game is draughts, chess or Go, each player sees the board and therefore has the same data. “But in real life, not all humans have the same information on which to base their decisions,” notes Côme Fiegel, a PhD researcher on the FAIRPLAY team. One game comes close to this reality: poker. “In poker, each player possesses secret and asymmetric information,” adds the young researcher. In 2021, scientists currently working in partnership with the FAIRPLAY team therefore created poker-playing algorithms. They then sought to make sure that these algorithms could find the best solution in a finite amount of time, and they can: the algorithms are capable of learning near-optimal strategies relatively quickly. Côme Fiegel refined them as part of his thesis, with support from the FAIRPLAY team.

Verbatim

The key message of [...] our team is to pay the utmost attention to concealed strategies, which have never been addressed in previous research. With this result, we have taken research on increasingly complicated games to a new level.

Author

Vianney Perchet

Position

Co-leader of the FAIRPLAY team and Professor at CREST, ENSAE Paris

One-armed bandits to optimise players' choices

“The decision tree allows us to explore the predictions of the optimal solution,” explains the PhD researcher. “This one wasn't perfect and, to improve it, I applied the ‘multi-armed bandit’ idea. Imagine walking into a casino full of slot machines, formerly known as ‘one-armed bandits’. You have to choose which machine to play on. Each machine offers a reward and the aim is to collect the most winnings.

“To find the best machine, you can try out each machine one after the other. But that will cost you a great deal of money. A more subtle strategy could alternate between exploitation, which consists in using the machine that has provided substantial rewards so far, and exploration, which consists in testing another machine in the hope of increasing your winnings.” To resolve this dilemma between exploitation and exploration, the FAIRPLAY team used the FTRL (“Follow The Regularized Leader”) method, which relies on an optimisation algorithm. How does it work? A machine is chosen at random, with the draw weighted in favour of the actions that have been most successful so far. In the long run, this gives a very high probability of choosing the best strategy.
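The idea of drawing a machine at random while favouring past successes can be sketched in a few lines of Python. This is only a minimal illustration on a plain slot-machine problem, not the FAIRPLAY team's poker algorithm. It assumes an entropy regulariser (which turns FTRL into the classic Exp3 rule), importance-weighted loss estimates, and Bernoulli rewards; the function name `ftrl_bandit`, the reward probabilities and the learning rate are all hypothetical choices made for the example.

```python
import math
import random


def ftrl_bandit(reward_probs, rounds, learning_rate, seed=0):
    """Entropy-regularised FTRL (Exp3-style) on a Bernoulli bandit.

    reward_probs is an assumed, hidden win probability for each machine.
    Returns the number of pulls per machine and the last sampling distribution.
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    est_loss = [0.0] * n_arms          # cumulative estimated losses
    pulls = [0] * n_arms
    probs = [1.0 / n_arms] * n_arms    # start from the uniform distribution

    for _ in range(rounds):
        # FTRL step: with an entropy regulariser, the regularised leader is
        # a softmax of the negated cumulative loss estimates.
        low = min(est_loss)            # shift for numerical stability
        weights = [math.exp(-learning_rate * (l - low)) for l in est_loss]
        total = sum(weights)
        probs = [w / total for w in weights]

        # Random draw that favours the machines that have done well so far.
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0

        # Only the played machine is observed, so its loss is divided by the
        # probability of having played it (importance weighting).
        est_loss[arm] += (1.0 - reward) / probs[arm]
        pulls[arm] += 1

    return pulls, probs


if __name__ == "__main__":
    machines = [0.2, 0.5, 0.8]         # hypothetical win probabilities
    rounds = 5000
    lr = math.sqrt(math.log(len(machines)) / (len(machines) * rounds))
    pulls, final_probs = ftrl_bandit(machines, rounds, lr)
    print("pulls per machine:", pulls)
    print("final distribution:", [round(p, 3) for p in final_probs])
```

Early on, the distribution is close to uniform (exploration); as loss estimates accumulate, the draw concentrates on the machine that has paid best (exploitation).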

Beware of opponents who keep their cards close to their chest

In gaming, the best algorithm is the one that will be optimal in the worst possible situation. For example, one of the players may conceal part of their game to fool the algorithm. “So you need to consider this option when exploring the decision tree,” explains Côme Fiegel. “Otherwise, the playing strategy will not be optimal and the algorithm will be slower.” This reality, which had not previously been taken into account in poker-playing algorithms, served as the basis for his thesis. 
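Judging a strategy by its worst case can be made concrete with a toy example. The sketch below uses rock-paper-scissors as a stand-in, since poker is far too large to write out here; the payoff table and the function name `worst_case_payoff` are assumptions made for illustration and do not come from the team's work. It computes the expected payoff of a randomised strategy against the opponent's most damaging reply.

```python
def worst_case_payoff(payoff, strategy):
    """Expected payoff of a mixed strategy against the worst opponent reply.

    payoff[i][j] is our gain when we play move i and the opponent plays j.
    strategy[i] is the probability with which we play move i.
    """
    n_rows = len(payoff)
    n_cols = len(payoff[0])
    expected_per_reply = [
        sum(strategy[i] * payoff[i][j] for i in range(n_rows))
        for j in range(n_cols)
    ]
    # An adversarial opponent picks the reply that hurts us most.
    return min(expected_per_reply)


# Rows: our move (rock, paper, scissors); columns: the opponent's move.
ROCK_PAPER_SCISSORS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

if __name__ == "__main__":
    always_rock = [1.0, 0.0, 0.0]
    uniform = [1 / 3, 1 / 3, 1 / 3]
    print("always rock:", worst_case_payoff(ROCK_PAPER_SCISSORS, always_rock))
    print("uniform mix:", worst_case_payoff(ROCK_PAPER_SCISSORS, uniform))
```

A predictable strategy such as always playing rock loses every round against an opponent who exploits it, whereas the uniform mix cannot be exploited at all. This is the sense in which the best strategy is the one that holds up in the worst possible situation.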

Taking all the parameters of the problem into account, the PhD researcher designed the algorithm to be as fast as possible in the worst-case scenario. This groundbreaking research project won a Best Paper Award at the ICML machine learning conference in 2023. “The key message of Côme's paper and of our team is to pay the utmost attention to concealed strategies, which have never been addressed in previous research,” sums up Vianney Perchet. “With this result, we have taken research on increasingly complicated games to a new level.” What is the next step? “Playing poker with more than two players, to reflect the reality of society even more closely,” continues the joint head of the FAIRPLAY team. “A much more complex mission, and one that's far from accomplished, because in real life there are far more than just two players!”

Discover the ICML award-winning paper

ICML (International Conference on Machine Learning) is the premier gathering of professionals dedicated to advancing the branch of artificial intelligence known as machine learning. ICML is world-renowned for presenting and publishing cutting-edge research on all aspects of machine learning used in closely related fields such as artificial intelligence, statistics and data science, as well as in important application areas such as computer vision, computational biology, speech recognition and robotics.

Co-authors Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet and Michal Valko received an award for their paper “Local and adaptive mirror descents in extensive-form games” at ICML in July 2023 in Hawaii, United States.
