An industrial-scale compiler
In the wake of the proof assistant Coq, in 2013, the software program CompCert has just become the second software program with its roots in theoretical research at Inria to win an ACM Software Award, at a ceremony in the USA. Seven of the researchers who contributed most to its development, alongside Inria, were recognised by the ACM (Association for Computing Machinery): Xavier Leroy, a professor at the Collège de France since 2018 and former Inria director of research, Sandrine Blazy (Université de Rennes 1), Zaynah Dargaye (Nomadic Labs), Jacques-Henri Jourdan (CNRS), Jean-Baptiste Tristan (Boston College) and the engineers Michael Schmidt and Bernhard Schommer (from the German company AbsInt).
CompCert, the first version of which was created in 2005, is an industrial-scale compiler for the C programming language. But what is a compiler? “It’s a bit like a translator”, says Xavier Leroy, a researcher from the project team Cambium, a joint undertaking involving the Inria Paris centre and the Collège de France. “As developers, we write code in different so-called ‘high-level’ programming languages (C, C++, Java...), which humans are able to write and understand”, explains the specialist, who is also recognised as being the main designer of the programming language OCaml. “But in order for the code to run on a machine, it has to be translated into a machine language (coded instructions, through the use of numbers, etc.). That’s where compilers come in, translating language which humans are able to understand into language which a machine can run.”
A mechanically-verified proof system
The reason they chose to focus on C was because this language, or its variants (such as Caml or OCaml) has historically been used for a wide range of critical systems. “Traditionally, all low-level codes, close to machines, have been written in C – although interesting alternatives, such as RUST, are beginning to emerge”, explains Xavier Leroy. “You might think of operating systems, for instance, such as Linux, Windows or Mac OS, but what is of particular interest to us at CompCert is all of those critical programs used on aircraft, in cars, at nuclear power plants, on driverless trains and in medical equipment, to name but a few.” These are fields in which any bugs would be catastrophic, and where errors in compiled code are not permissible.
The way to tell if the translation (the compiled code) is of good quality is to test, test and test again. But testing can be difficult or even impossible to carry out exhaustively on programs written in tens of millions of lines of code and in multiple different configurations. This is why the developers of CompCert chose to combine their program with a one-of-a-kind mechanically verified proof assistant. With the CompCert compiler, code is formally verified using computer-assisted mathematical proofs, preventing miscompilations from occurring: this makes it possible to prove that the compiled code is behaving on the machine exactly as specified in the original C program.
Using formal methods
In order to move away from standard testing, the researchers opted for other verification methods from the world of research: formal methods. “This has long been a dream in computing”, explains Xavier Leroy. “It involves using mathematical logic to reason in terms of how a program works, in order to prove its validity with regard to individual specifications. This was first expressed by Alan Turing in the 1940s and ‘50s. It returned to prominence in the late 1960s, but remained largely theoretical up until the 2000s.”
What has changed today? “We successfully created static verification tools (known as static analysers), which can be used to automate the deduction verification of mathematical proofs in software source code”, adds the researcher. “The problem also lay in finding a way of using these same methods to verify the reliability of the compiler mathematically.”
In order to achieve this, the researchers from Cambium used as a basis Coq, opensource software developed by Inria. “This solution is a formal proof assistant and enables us to draw on the power of the machine to verify proofs on the compiler itself”, stresses Xavier Leroy. All this while meeting the expectations of a range of industrial partners, such as Airbus, who “have expressed an interest in the compiler’s verification. After ridding a program of bugs, the challenge for them is to get rid of the bugs from the compiler that will produce the executable code.”
Potential new variations
The discoveries made by Cambium through CompCert have continued within Inria, including for verifying complexity in terms of programs’ memory consumption or execution time. Other fields have yet to be explored, such as smart contracts (contracts employing the use of blockchain technology.) Longer term, another possibility could be to make it compatible with artificial intelligence programs developed through the use of machine learning, which differ in that they are not coded conventionally and are extremely large.
For users, a free version of CompCert is now available to download for evaluation, teaching, research or any other type of non-commercial use. There is also a commercial version of the software available, distributed by AbsInt, a specialist in formal verification tools that has established a reputation in the automobile, aeronautics and nuclear sectors at a European level.