A library for creating synthetic knowledge graphs
Date:
Changed on 13/06/2024
Knowledge graphs are increasingly used by experts in machine learning, artificial intelligence, the semantic web and ontologies (the modeling of vocabulary and knowledge on a given subject) to model, visualize and analyze the links that unite the elements of a domain and their descriptions within an information system.
"But specialists don't always have the data they need to work on methods for processing these knowledge graphs, based on features they have already calculated or would like to use, for example because the data is private or doesn't exist," explains Pierre Monnin, a researcher in artificial intelligence with the Wimmics project team at the Inria Centre at Université Côte d’Azur, a joint project between Inria and the I3S laboratory (CNRS, UniCA).
"Our idea with the PyGraft open-source library is therefore to provide them with a means of creating abstract and synthetic datasets that correspond perfectly to the expected characteristics. This can, for example, help them create public datasets that look exactly like private data."
Why is this important? "Using PyGraft, the first version of which was developed by Nicolas Hubert, a doctoral student at the Université de Lorraine, it is possible to carry out new studies, for example in neuro-symbolic AI," explains the researcher, who completed his thesis at the Loria laboratory in Nancy (CNRS, Inria, Université de Lorraine). Neuro-symbolic AI, sometimes presented as the third wave of AI, combines learning (e.g. via neural networks) with symbolic methods (e.g. a reproduction of human reasoning, carried out using symbols and deductive rules such as "My fridge is empty + I'm hungry = I have to go shopping"). "With PyGraft, even if you don't have a dataset at your disposal, you have a synthetic, customizable data generator to help you experiment with logical constructs of this kind."
The library has been available for free download on GitHub since September 2023. Designed to run on a computer or server, it is written in Python, a programming language widely used for machine learning and artificial intelligence.
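To make the idea of a synthetic, customizable generator more concrete, here is a minimal sketch in plain Python. It is not PyGraft's API and does not reproduce its method: the function name `generate_toy_kg` and its parameters are hypothetical, chosen only for illustration. It simply draws random (head, relation, tail) triples of a requested size, with a seed so that the same settings always yield the same dataset.

```python
import random


def generate_toy_kg(n_entities, n_relations, n_triples, seed=0):
    """Return a set of distinct (head, relation, tail) triples.

    Hypothetical helper for illustration only; not part of PyGraft.
    """
    # Upper bound on distinct triples when head and tail must differ.
    max_triples = n_entities * (n_entities - 1) * n_relations
    if n_triples > max_triples:
        raise ValueError("requested more distinct triples than can exist")

    rng = random.Random(seed)  # local generator, so results are reproducible
    entities = [f"e{i}" for i in range(n_entities)]
    relations = [f"r{i}" for i in range(n_relations)]

    triples = set()
    while len(triples) < n_triples:
        head, tail = rng.sample(entities, 2)  # two distinct entities
        triples.add((head, rng.choice(relations), tail))
    return triples


kg = generate_toy_kg(n_entities=50, n_relations=5, n_triples=200, seed=42)
print(len(kg))  # 200
```

A real generator such as PyGraft goes much further, in particular by producing a schema alongside the graph, but the principle of controlling the dataset's characteristics through a few parameters is the same.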
PyGraft is highly intuitive and generates data that integrates easily with other workflows. As a result, interest rose as soon as it went online, particularly among specialists in artificial intelligence (AI) and big data, in France and abroad. "We've been contacted by many users and we think some are already using it to generate abstract datasets with which to test the machine learning or artificial intelligence methods they are working on, or to check how those methods behave with larger datasets," explains Pierre Monnin. "Making this library open-source will help us federate a community of contributors and identify emerging needs within the communities of researchers and data scientists who use knowledge graphs."
More good news: the first academic publication on PyGraft has been selected to be presented at one of the most important conferences in the field of the semantic web, the ESWC 2024 conference, to be held in Greece from May 26 to 30, 2024.