Job description
Context and advantages of the position
This research engineer position takes place within the context of the ANR-SNF MetaboLinkAI project, which aspires to revolutionize the analysis and interpretation of metabolomics data through a multidisciplinary approach that combines a comprehensive knowledge hub (MetaKH) with cutting-edge artificial intelligence (AI) and machine learning (ML) techniques.
The project’s main goals are to enhance the querying and ease of use of metabolomics data, improve research efficiency, and stimulate creativity in the field.
These objectives are set to surpass current standards by creating an encyclopedic and expandable knowledge base, integrating advanced AI to handle the uncertainties of experimental data, and enabling a broader range of hypothesis testing and evaluation.
Within this context, this position will focus on the construction and querying of MetaKH, a decentralized, machine-readable knowledge hub. MetaKH will federate and link pre-existing public knowledge and resources relevant to the project's use cases (e.g. descriptions of chemical entities, biochemical pathways, metabolite information, relevant literature), possibly newly created resources or existing resources semantically lifted because they are not available in Semantic Web standards, and mass spectrometry datasets.
Supervisors: Franck Michel, Catherine Faron, Fabien Gandon (University Côte d'Azur, Inria, CNRS)
Assigned mission
The research engineer will be involved in two major contributions of the 2nd work package: Knowledge representation and management.
First, the research engineer will participate in the creation of a portal and pipeline to support the lifecycle of MetaKH.
Second, the research engineer will take part in the design of a federated query engine capable of querying the distributed knowledge hub, and allowing the service to answer complex, high-level biological questions exploiting decentralized data sources.
In the course of this position, the engineer will collaborate with PhD and postdoc researchers working on the development of AI methods aiming to deal with uncertainty in the data, mine and complement the knowledge hub, and develop an AI research assistant using natural language as an interface to data and knowledge.
Main activities
Creation of a portal and pipeline to support the lifecycle of MetaKH
The portal must allow users to incrementally integrate, monitor and update reference resources in the knowledge federation (e.g. ChEBI, PubChem, Rhea, SwissLipids, MetaNetX, Pathway Commons, FORUM).
This shall involve multiple tasks:
The development of a domain-specific model to link semantic resources throughout the federation while supporting lack of precision and uncertainty.
The development and management of a collection of mappings and links between heterogeneous resources. Methods for writing those mappings and links shall range from handcrafting to generative AI models. A git-based life-cycle similar to that of code shall be applied to the produced resources (versioning, issues, publication, continuous integration, etc.).
The continuous monitoring of the integrated resources (typically to integrate new releases).
The deployment and maintenance of self-hosted mirroring of critical resources.
All of this shall be achieved in compliance with the FAIR principles.
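To make the first task more concrete, a link model that tolerates uncertainty could look like the minimal Python sketch below. It is only an illustration and not the project's model: the `Link` class, its fields, the confidence scale, and the example.org IRIs are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One link between two resources of the federation (hypothetical model)."""
    source: str        # IRI of an entity in one resource
    target: str        # IRI of an entity in another resource
    relation: str      # e.g. "skos:exactMatch" or "skos:closeMatch"
    confidence: float  # assumed score in [0, 1]; 1.0 means fully trusted
    method: str        # how the link was written: "manual" or "generated"

    def __post_init__(self):
        # Reject scores outside the assumed [0, 1] scale.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


def trusted_links(links, min_confidence):
    """Keep only the links whose confidence reaches the threshold."""
    return [link for link in links if link.confidence >= min_confidence]


# Placeholder data: the IRIs below are invented for the example.
LINKS = [
    Link("http://example.org/kg-a/water", "http://example.org/kg-b/H2O",
         "skos:exactMatch", 1.0, "manual"),
    Link("http://example.org/kg-a/glucose", "http://example.org/kg-b/D-glucose",
         "skos:closeMatch", 0.6, "generated"),
]

for link in trusted_links(LINKS, min_confidence=0.8):
    print(link.source, link.relation, link.target)
```

Keeping the writing method and a confidence score on every link would let downstream consumers decide how much imprecision they accept, for instance by trusting handcrafted links more than generated ones.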
Design of a federated query engine
Designed as a single data access point hiding the federation's complexity from the users, the query engine will leverage the mappings and links across resources (from the first contribution) to dynamically rewrite and expand SPARQL queries so as to query and integrate the multiple knowledge graphs (KG) at runtime.
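As an illustration of the query expansion idea, the sketch below rewrites a SPARQL query so that an entity is matched under all of its linked IRIs, using a standard SPARQL 1.1 `VALUES` clause. It is a simplified assumption, not the engine's design: the function names and example.org IRIs are hypothetical, the rewriting works on plain strings rather than on a parsed query, it expects the query to contain a `{ ... }` group, and it follows only direct links (no transitive closure).

```python
def equivalent_iris(iri, links):
    """Return the IRI plus every IRI directly linked to it, sorted."""
    found = {iri}
    for left, right in links:
        if left == iri:
            found.add(right)
        elif right == iri:
            found.add(left)
    return sorted(found)


def expand_query(query, iri, links, variable="entity"):
    """Replace <iri> by a variable bound to all equivalent IRIs.

    Assumes the query contains at least one "{"; the VALUES clause is
    inserted right after the first one.
    """
    values = " ".join(f"<{i}>" for i in equivalent_iris(iri, links))
    rewritten = query.replace(f"<{iri}>", f"?{variable}")
    head, brace, body = rewritten.partition("{")
    return f"{head}{brace} VALUES ?{variable} {{ {values} }}{body}"


# Placeholder link between two invented IRIs.
LINKS = [("http://example.org/kg-a/water", "http://example.org/kg-b/H2O")]
QUERY = "SELECT ?p ?o WHERE { <http://example.org/kg-a/water> ?p ?o }"

print(expand_query(QUERY, "http://example.org/kg-a/water", LINKS))
```

A production engine would more likely rewrite the parsed query algebra and route each pattern to the relevant endpoints, but the principle of substituting one identifier with its known equivalents is the same.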
This shall involve the construction of an index of the federated KGs, possibly reusing and extending the IndeGx framework [Maillot et al, 2023], and the computation of information relevant for writing federated queries such as KG summaries [Aimonier-Davat et al 2024].
Since the goal is to provide an architecture that is scalable, resource-efficient, and sustainable in the long term, an important aspect of this approach will be the level of mapping expressivity to consider, as a trade-off between runtime efficiency and completeness of the results.
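The role of KG summaries in writing federated queries can be sketched with a deliberately simple form of source selection: each endpoint is summarized by the set of predicates it uses, and a triple pattern is sent only to endpoints that know its predicate. This is an assumption made for illustration, not the approach of IndeGx or FedUP; the endpoint URLs, the `ex:` predicates, and the function name are hypothetical.

```python
def select_sources(predicates, summaries):
    """Map each predicate to the endpoints whose summary contains it."""
    return {
        predicate: sorted(
            endpoint
            for endpoint, known in summaries.items()
            if predicate in known
        )
        for predicate in predicates
    }


# Placeholder summaries: endpoint URL -> predicates observed in that KG.
SUMMARIES = {
    "https://example.org/sparql/chemistry": {"ex:formula", "ex:mass"},
    "https://example.org/sparql/pathways": {"ex:participatesIn", "ex:mass"},
}

plan = select_sources(["ex:mass", "ex:formula", "ex:unknown"], SUMMARIES)
for predicate, endpoints in plan.items():
    print(predicate, "->", endpoints)
```

A predicate with an empty endpoint list signals that no federated source can answer that pattern, which is the kind of information that lets an engine avoid unnecessary remote calls at runtime.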
[Maillot et al, 2023] IndeGx: A Model and a Framework for Indexing RDF Knowledge Graphs with SPARQL-based Test Suits.
Pierre Maillot, Olivier Corby, Catherine Faron, Fabien Gandon, Franck Michel.
Journal of Web Semantics, 2023.
DOI: ⟨10.1016/j.websem.2023.100775⟩.
⟨hal-03946680⟩
[Aimonier-Davat et al 2024] FedUP: Querying Large-Scale Federations of SPARQL Endpoints.
Julien Aimonier-Davat, Minh-Hoang Dang, Pascal Molli, Brice Nédelec, Hala Skaf-Molli.
The ACM Web Conference 2024 (WWW ’24), May 2024, Singapore, Singapore.
DOI: ⟨10.1145/3589334.3645704⟩.
⟨hal-04538238⟩
Skills
The candidate must hold a PhD in Informatics / Computer Science and must match most of the following criteria:
Strong experience with Semantic Web standards and technologies
Experience in distributed data management, querying, crawling, indexing, federating, etc.
High motivation for scientific research in an open science context
Good Web development technical skills with knowledge of JavaScript and modern JS frameworks (Node.js, Reactive.js…), REST/RESTful Web services, JSON
Background knowledge and/or experience in life sciences, biology, metabolomics
Data science and management expertise
Language: excellent English oral and writing skills
Other appreciated skills:
Writing skills and motivation for publication
Aptitude to work with others and engage in collaborations
Autonomy and initiative: ability to make technical decisions within the project and to justify those choices
Remote working capabilities (emails, collaborative tools, trackers, etc.)
Benefits
Subsidized meals
Partial reimbursement of public transport costs
Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
Professional equipment available (videoconferencing, loan of computer equipment, etc.)
Social, cultural and sports events and activities
Access to vocational training
Social security coverage
Remuneration
From 2692 € gross monthly (according to degree and experience).
Required Skill Profession
Life Scientists