Designed by the French National Institute for Agricultural Research (INRA), the BVDV software simulates how bovine viral diarrhoea can spread through herds. It can therefore help identify the best ways to fight an epidemic. Its model is based on data collected over nearly 10 years on 12,750 Breton farms, i.e. 2.7 million dairy cows. It is fairly representative of many scientific simulation tools around the world: a useful legacy application, but one whose code was never intended to exploit a highly elastic distributed architecture like the cloud. It is not surprising, then, that a regular simulation takes about two days on an in-house server with 13 cores and 100 GB of memory.
This software has recently been transformed so that it can now run in the cloud. The conversion was made possible by DiFFuSE, a new framework developed at Inria by the Myriads team. “We developed this tool as part of MIHMES, a five-year collaborative project aimed at creating new methods for managing infectious animal diseases,” said scientist Nikos Parlavantzas. “The framework can be used for two things: either building cloud-based epidemic simulation applications or converting old monolithic applications so that they too can use this distributed architecture. That’s what we did with BVDV.”
The elasticity offered by the cloud makes it possible to allocate more resources as soon as computing power runs short. “The application adds virtual machines dynamically and, as soon as it no longer needs them, it releases the superfluous resources on the fly. This is a crucial point, because commercial clouds charge according to the resources actually used. As for academic clouds, they can immediately reassign the released resources to other researchers.”
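As an illustration of this elastic behaviour, here is a minimal Python sketch of a scaling rule. It is not DiFFuSE code: the function names, the notion of "pending tasks" and the default limits are all assumptions chosen for the example. The rule simply sizes the pool of virtual machines to the amount of waiting work, then reports how many machines to add or release.

```python
import math


def target_workers(pending_tasks: int, tasks_per_worker: int,
                   min_workers: int = 1, max_workers: int = 20) -> int:
    """Return how many virtual machines the pool should hold.

    All parameters are hypothetical: the pool is sized so that each VM
    handles at most `tasks_per_worker` pending tasks, within fixed bounds.
    """
    needed = math.ceil(pending_tasks / tasks_per_worker)
    return max(min_workers, min(max_workers, needed))


def scaling_action(current_workers: int, pending_tasks: int,
                   tasks_per_worker: int = 4) -> int:
    """Positive result: VMs to add. Negative: VMs to release. Zero: no change."""
    return target_workers(pending_tasks, tasks_per_worker) - current_workers


if __name__ == "__main__":
    # (current VMs, pending tasks) at three moments of a simulated run
    for current, pending in [(2, 40), (10, 8), (3, 12)]:
        print(current, pending, scaling_action(current, pending))
```

Releasing machines as soon as the result turns negative is what keeps the bill down on a pay-per-use cloud; a real controller would probably also smooth its decisions to avoid adding and removing machines too frequently.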
Fault tolerance
One of the notable features of DiFFuSE is the way in which it allows simulation software to tolerate faults. “A cloud application uses a lot of machines simultaneously. One of them will eventually break down at one time or another. In these cases, in general, the entire application stops. Users must halt the simulation and start all over again. These stops cause major time losses. Our framework has a mechanism that can detect and react to these faults.” So, when one virtual machine fails, its load can be reassigned to others.
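The article does not say how DiFFuSE detects faults, so the following Python sketch only illustrates the general idea under a stated assumption: each worker sends periodic heartbeats, and a worker that stays silent for longer than a timeout is treated as failed. The `Worker` class, the `reassign_failed` function and the 30-second timeout are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    last_heartbeat: float          # time of the last sign of life, in seconds
    tasks: list = field(default_factory=list)


def reassign_failed(workers, now, timeout=30.0):
    """Move the tasks of silent workers onto the least-loaded live workers.

    A worker is considered failed when no heartbeat has been seen for more
    than `timeout` seconds. Returns the names of the failed workers.
    """
    alive = [w for w in workers if now - w.last_heartbeat <= timeout]
    failed = [w for w in workers if now - w.last_heartbeat > timeout]
    if failed and not alive:
        raise RuntimeError("no live worker left to take over the load")
    for dead in failed:
        while dead.tasks:
            task = dead.tasks.pop()
            # Pick the live worker that currently holds the fewest tasks.
            target = min(alive, key=lambda w: len(w.tasks))
            target.tasks.append(task)
    return [w.name for w in failed]


if __name__ == "__main__":
    pool = [
        Worker("vm-a", last_heartbeat=100.0, tasks=[1, 2]),
        Worker("vm-b", last_heartbeat=100.0, tasks=[3]),
        Worker("vm-c", last_heartbeat=50.0, tasks=[4, 5, 6]),  # silent for 55 s
    ]
    print(reassign_failed(pool, now=105.0))
    print({w.name: w.tasks for w in pool})
```

The simulation keeps running on the surviving machines instead of being restarted from scratch, which is the time saving described above.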
Another feature is modularity. “This framework makes it possible to develop an application composed of distinct parts, different services that can be managed independently. This has several advantages. For example, if you find a bottleneck on a service, you can automatically assign additional resources to that service and not to the entire application. In addition, these services can be reused by other applications.” Fault tolerance is also reinforced. “If a service goes down, you no longer have to stop the entire application. It becomes possible to replace the defective service with another one, created on the fly.”
Applications generated by DiFFuSE can also run on several clouds simultaneously. “You can deploy one part on Amazon EC2, for example, and another with one of its competitors.” By controlling both the number and type of cloud resources used, “the software allows users to find the best compromise between cost and performance.” Incidentally, fault tolerance improves a notch further too. “If one cloud becomes unavailable for a moment, the computing load is automatically sent to another.”
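To make the cost-versus-performance compromise concrete, here is a small Python sketch that picks the cheapest combination of cloud offer and number of virtual machines able to finish a job before a deadline. Everything in it is an assumption made for illustration: the offers, prices and speeds are invented, the work is assumed to scale linearly with the number of machines, and billing is assumed to be per started hour. It does not describe how DiFFuSE itself makes this choice.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Offer:
    cloud: str             # hypothetical provider name
    vm_type: str           # hypothetical instance type
    price_per_hour: float  # invented price, per VM
    speed: float           # throughput relative to a baseline VM (1.0)


@dataclass(frozen=True)
class Plan:
    offer: Offer
    n_vms: int
    duration_hours: float
    cost: float


def cheapest_plan(offers, work_hours, deadline_hours, max_vms=20) -> Optional[Plan]:
    """Return the cheapest plan meeting the deadline, or None if none exists.

    `work_hours` is the job length on one baseline VM. Ties on cost are
    broken in favour of the shorter run.
    """
    best = None
    for offer in offers:
        for n in range(1, max_vms + 1):
            duration = work_hours / (offer.speed * n)  # assumes linear speed-up
            if duration > deadline_hours:
                continue
            # Assumed billing model: every started hour is charged in full.
            cost = math.ceil(duration) * n * offer.price_per_hour
            if best is None or (cost, duration) < (best.cost, best.duration_hours):
                best = Plan(offer, n, duration, cost)
    return best


if __name__ == "__main__":
    offers = [
        Offer("cloud-a", "small", price_per_hour=0.10, speed=1.0),
        Offer("cloud-b", "large", price_per_hour=0.25, speed=2.0),
    ]
    print(cheapest_plan(offers, work_hours=48, deadline_hours=6))
```

With these invented figures, the faster machines do not pay for themselves, so the slower and cheaper offer wins; different prices or a tighter deadline would tip the balance the other way.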
In the wake of the MIHMES project, INRA and Inria decided to industrialise a series of decision-support tools in the field of animal health. This collaboration will take the form of a consortium called STEMAH, of which DiFFuSE will be one of the key elements.