Just 12 European researchers honoured in 2022
Outside computing circles, the Association for Computing Machinery (ACM) is little known in France. Researchers in the field, however, know that it was founded in 1947, has around 100,000 members in 190 countries and presents the annual Turing Award, widely regarded as the unofficial Nobel Prize of computer science.
Shadi Ibrahim’s appointment as an ACM ‘Distinguished Member’ is therefore a significant achievement. To be eligible, candidates need at least 15 years of professional experience and must have produced outstanding, high-impact work in computer science at an international level. They must also secure written endorsements from at least four people, including two ACM members.
‘It is a huge honour to receive this distinction’, Shadi Ibrahim commented. ‘I learnt of it, to my immense joy, through an email I discovered at 11pm one evening. You have to realise that only 67 researchers from around the world became distinguished members in 2022, including just 12 in Europe.’
A thesis in China
Shadi Ibrahim first earned the recognition of his peers during his PhD, carried out in Wuhan (China), for his work on managing large volumes of data in the Cloud. At the time, new methods were emerging that split very large files into small blocks distributed across multiple servers, so that computations could run locally on each server and avoid lengthy data transfers.
Shadi Ibrahim succeeded in improving these methods (MapReduce and its open-source implementation, Hadoop), which had already been widely adopted by the Cloud industry. He published numerous scientific articles that went on to inspire further work, some of them cited more than a hundred times. A spectacular start to his career.
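To make the idea concrete, here is a minimal, single-process Python sketch of the MapReduce pattern using the classic word-count example. It is not Hadoop's API and not Shadi Ibrahim's improved techniques: the "servers" are simply list elements, and the function names (split_into_blocks, map_block, shuffle, reduce_bucket, word_count) are illustrative. It shows the core principle described above: each block is processed where it sits, and only compact intermediate results are moved.

```python
from __future__ import annotations

import zlib
from collections import Counter, defaultdict


def split_into_blocks(lines: list[str], n_blocks: int) -> list[list[str]]:
    """Cut the input into contiguous blocks, one per (simulated) server."""
    size = max(1, -(-len(lines) // n_blocks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_block(block: list[str]) -> Counter:
    """Map step, run where the block is stored: count words locally.

    Pre-aggregating here (a 'combiner') means only small per-word counts,
    not the raw text, would have to cross the network.
    """
    return Counter(word for line in block for word in line.lower().split())


def shuffle(partials: list[Counter], n_reducers: int) -> list[dict[str, list[int]]]:
    """Route every word to one reducer, chosen by a stable hash of the word."""
    buckets = [defaultdict(list) for _ in range(n_reducers)]
    for partial in partials:
        for word, count in partial.items():
            buckets[zlib.crc32(word.encode()) % n_reducers][word].append(count)
    return buckets


def reduce_bucket(bucket: dict[str, list[int]]) -> dict[str, int]:
    """Reduce step: add up the partial counts for each word."""
    return {word: sum(counts) for word, counts in bucket.items()}


def word_count(lines: list[str], n_servers: int = 3, n_reducers: int = 2) -> dict[str, int]:
    blocks = split_into_blocks(lines, n_servers)
    # In a real cluster each map_block call would run in parallel on the
    # server that already holds the block; here they simply run in sequence.
    partials = [map_block(b) for b in blocks]
    result: dict[str, int] = {}
    for bucket in shuffle(partials, n_reducers):
        result.update(reduce_bucket(bucket))
    return result
```

In Hadoop, the equivalent steps run across a cluster, and the scheduler tries to place each map task on a server that already stores its block (data locality), which is what keeps data transfers small.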
Alternative ways to store ever-larger volumes of data
In 2011, PhD in hand, the researcher moved to Europe. He joined INRIA Rennes as a postdoctoral researcher and was hired by the centre two years later. Scientifically, he has stayed in the same field, working to make big data management more efficient and reliable in settings where data is distributed across multiple servers and remote computers.
A new development then changed the picture: on top of processing fixed-size datasets, systems now had to manage huge continuous data streams, such as those generated by banking fraud detection systems or by tools that monitor computer networks for cyber-attacks.
The French National Research Agency (NRA) KerStream project to enhance the management of data flows
Shadi Ibrahim set about resolving this issue with the NRA KerStream project, which he led from 2017 to 2022. His aim was to overcome the limitations of the first solutions designed to process data flows in the Cloud, such as Spark, Storm or Flink.
In particular, the KerStream team developed new methods for detecting and handling ‘lagging’ tasks (often called stragglers), i.e. tasks that run more slowly than the others and therefore hold up an entire series of computations.
‘Three problems arise,’ the researcher explains. ‘Firstly, how do we detect lagging tasks without making mistakes? Some tasks take longer than others simply because they are more complex. Secondly, once we find a genuine lag, which server should we move the task to so that it runs faster? Thirdly, how can we optimise this allocation of tasks to minimise energy consumption?’
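The three questions can be illustrated with a small Python sketch. It is a generic, simplified heuristic, not the methods KerStream actually developed: it assumes each task's total work and each node's speed and power draw are known in advance, and the names (Task, Node, work_units, speed, watts, find_stragglers, pick_node) are hypothetical. Detection compares how much work each task completes per second rather than how long it has been running, so a task that is merely bigger is not flagged; placement keeps only nodes whose backup copy would finish before the original, then picks the one that spends the least energy.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Task:
    task_id: str
    work_units: float   # estimated total work (hypothetical complexity measure)
    done_units: float   # work completed so far
    elapsed_s: float    # seconds since the task started


@dataclass
class Node:
    name: str
    speed: float        # work units per second (assumed known)
    watts: float        # power drawn while running one task (assumed known)


def progress_rate(task: Task) -> float:
    """Work completed per second; 0 if the task has not started yet."""
    return task.done_units / task.elapsed_s if task.elapsed_s > 0 else 0.0


def remaining_time(task: Task) -> float:
    """Seconds left if the task keeps its current pace."""
    rate = progress_rate(task)
    return float("inf") if rate == 0 else (task.work_units - task.done_units) / rate


def find_stragglers(tasks: list[Task], factor: float = 2.0) -> list[Task]:
    """Question 1: flag tasks progressing `factor` times slower than the median.

    Comparing work done per second, rather than elapsed time, means a task
    that is simply larger or more complex is not mistaken for a lagging one.
    """
    started = [t for t in tasks if t.elapsed_s > 0]
    if not started:
        return []
    typical = median(progress_rate(t) for t in started)
    return [t for t in started if progress_rate(t) * factor < typical]


def pick_node(task: Task, nodes: list[Node]) -> Optional[Node]:
    """Questions 2 and 3: choose where to launch a backup copy of a straggler.

    Keep only nodes whose fresh copy (restarting from zero) would finish
    before the original, then pick the one that spends the least energy.
    """
    deadline = remaining_time(task)
    best = None
    for node in nodes:
        copy_time = task.work_units / node.speed
        if copy_time >= deadline:
            continue
        energy = copy_time * node.watts  # joules, assuming constant power draw
        if best is None or energy < best[0]:
            best = (energy, node)
    return None if best is None else best[1]
```

For example, with tasks t1 and t2 both advancing at 5 units per second (t2 is simply twice as large) and t3 at 1 unit per second, only t3 is flagged. A real scheduler would also weigh the energy still spent by the original copy and the cost of moving the task's input data.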
Within five years, the KerStream team produced several innovative methods and prototypes, which it published and made available in open access. For Shadi Ibrahim, it was a remarkable venture. ‘We rose to some great technical challenges and wrote several papers. For the young researchers on the team, the project was a springboard for their careers. For my part, I enjoyed having so much freedom to break new ground.’
Big Data on the Cloud, a collaboration with OVH
Still in the field of Big Data, in 2021 and 2022 Shadi Ibrahim launched collaborative projects with two Cloud players: OVH, which needs no introduction, and the start-up Hive[1]. The aim was to find ways to absorb the growth in data volumes without endlessly increasing the number of servers, especially since this data is generally duplicated as a precaution!
What avenue did the researcher explore with a PhD student? They chose to focus on erasure codes, a technique that protects files using redundant data smaller than the original file. For a file of 100 units, for example, the redundancy might take up only 50 units, which is more economical than keeping one or more full copies (at least +100% extra storage).
‘There is a lot of interest from Cloud operators, but there are still issues to resolve,’ Shadi Ibrahim explains. ‘Encoding and decoding files requires a lot of computing power. Likewise, rebuilding a lost file from its redundant fragments involves reading and transferring a great deal of data. We need to take advantage of erasure codes while limiting these drawbacks.’
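The storage arithmetic of the 100-unit example can be reproduced with the simplest possible erasure code: a single XOR parity block, the scheme used in RAID 5. The Python sketch below is illustrative only; production Cloud storage typically relies on Reed-Solomon codes, which tolerate several simultaneous losses, and nothing here describes the codes studied with OVH or Hive. The function names are hypothetical. With k = 2, a 100-unit file becomes two 50-unit data blocks plus one 50-unit parity block, i.e. +50% of extra storage versus +100% for a full duplicate, and either scheme survives the loss of one piece. The sketch also exposes both drawbacks quoted above: every write requires computing parity, and repairing one lost block means reading all k surviving blocks.

```python
from __future__ import annotations

from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal blocks and append one XOR parity block."""
    size = max(1, -(-len(data) // k))      # ceiling division
    padded = data.ljust(size * k, b"\0")    # zero-pad so every block has the same length
    blocks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return blocks + [reduce(xor_bytes, blocks)]


def repair(stored: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one lost block (None) by XOR-ing all the survivors.

    This is the repair cost: restoring one block means reading all k others.
    """
    lost = [i for i, b in enumerate(stored) if b is None]
    if len(lost) > 1:
        raise ValueError("a single parity block can repair only one loss")
    survivors = [b for b in stored if b is not None]
    if not lost:
        return survivors
    rebuilt = list(stored)
    rebuilt[lost[0]] = reduce(xor_bytes, survivors)
    return rebuilt


def decode(blocks: list[bytes], original_len: int) -> bytes:
    """Concatenate the data blocks (parity excluded) and strip the padding."""
    return b"".join(blocks[:-1])[:original_len]


def storage_overhead(k: int) -> float:
    """Extra space relative to the original: one parity block per k data blocks."""
    return 1 / k
```

Raising k lowers the overhead (1/k) but makes each repair read more blocks, which is exactly the trade-off between storage savings and recovery traffic.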
Big Data and HPC, heading towards a convergence
Never short of ideas, the ACM’s newest Distinguished Member has also been working on the coming convergence of Big Data applications and high-performance computing (HPC). ‘The volumes to be processed are growing so fast that we will soon need supercomputers to handle them. This raises new issues around storage, data access times and interference between the hundreds of thousands of requests that can be issued at the same time, among others.’
Shadi Ibrahim is seeking solutions by programming intermediate storage layers, known as burst buffers, placed between compute resources and data storage. His research could influence the future roll-out of exascale computers, capable of a billion billion operations per second, and this work has strengthened scientific collaborations between INRIA and two prestigious American laboratories: Argonne National Laboratory (ANL) and Lawrence Berkeley National Laboratory (LBNL).
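To illustrate the idea, here is a deliberately simplified Python model of a burst buffer, assuming discrete time steps and a fixed drain bandwidth. The class and parameter names (BurstBuffer, capacity_gb, drain_gb_per_step) are hypothetical and do not correspond to any real burst-buffer software or to Shadi Ibrahim’s own designs. A compute job can dump a burst of data (a checkpoint, say) into the fast tier at once and return to computing, while the buffer drains that data to the slower storage system in the background; the job only has to wait when the buffer is full.

```python
from collections import deque


class BurstBuffer:
    """Toy model of a burst buffer: a small, fast tier in front of slow storage.

    Writes land in the buffer immediately if there is room; drain() moves at
    most drain_gb_per_step of buffered data to the slow tier per time step.
    """

    def __init__(self, capacity_gb: float, drain_gb_per_step: float):
        self.capacity_gb = capacity_gb
        self.drain_gb_per_step = drain_gb_per_step
        self.used_gb = 0.0
        self.pending = deque()   # chunks waiting to be flushed, oldest first
        self.flushed_gb = 0.0    # data already persisted to the slow tier

    def write(self, size_gb: float) -> bool:
        """Absorb a write burst if it fits; return False if the caller must stall."""
        if self.used_gb + size_gb > self.capacity_gb:
            return False
        self.pending.append(size_gb)
        self.used_gb += size_gb
        return True

    def drain(self) -> None:
        """Flush up to drain_gb_per_step of buffered data, oldest chunk first."""
        budget = self.drain_gb_per_step
        while self.pending and budget > 0:
            chunk = self.pending.popleft()
            moved = min(chunk, budget)
            if moved < chunk:
                # Partially flushed chunk goes back to the front of the queue.
                self.pending.appendleft(chunk - moved)
            budget -= moved
            self.used_gb -= moved
            self.flushed_gb += moved
```

With a 50 GB buffer draining 10 GB per step, a 40 GB burst is accepted immediately, a further 20 GB burst is refused until one drain step frees space, and the buffer then needs several steps to empty. Sizing the buffer, scheduling the drain and sharing it between many competing jobs are the kinds of questions the article raises.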
[1] Not to be confused with the international cybercrime network of the same name.
Background in five dates
- 2011: PhD in computer science from Huazhong University of Science and Technology (China).
- 2013: Recruited as a researcher with the KerData team at the INRIA Rennes Centre.
- 2017: Appointed Director of the NRA KerStream project for the processing of Big Data flows in the Cloud.
- 2020: Winner of the IEEE TCSC Award for Excellence in Scalable Computing (Middle Career Researcher).
- 2022: Made a Distinguished Member of the Association for Computing Machinery (ACM).
Find out more
- Hive s'associe avec l'Inria pour lancer une offre de cloud souverain (‘Hive partners with Inria to launch a sovereign cloud offering’, in French), Le Monde Informatique, 29 November 2022.