Research Software Engineer - Database and Knowledge Representation

EMBL-EBI - European Bioinformatics Institute

Hinxton, United Kingdom

Generative AI has revolutionised the way we interact with knowledge. To benefit from the advances in LLM technology inside Open Targets, we are extending the capabilities of our platform towards LLM integration using open-source frameworks. The project aims to improve on extraction, representation, and usage of scientific knowledge, and present this knowledge to platform users in a user-friendly way. The central aims of the role we describe below will be 1) extending knowledge representation capabilities of the Open Targets platform towards custom knowledge graphs, and 2) interfacing with knowledge extraction and knowledge usage teams to ensure effective knowledge representations for any given task.

Your role

We are seeking a highly skilled and motivated Research Software Engineer with expertise in Python and databases to join the AI knowledge management project for 3 years. We are open to applicants at various career stages, with particular interest in individuals who are eager to utilise cutting-edge technologies to address complex challenges in software development and informatics in the context of drug discovery. This position would be embedded within the Open Targets project team in the Saez-Rodriguez Group at the European Bioinformatics Institute and benefit from joint supervision with Sebastian Lobentanzer in the Saez-Rodriguez Group at Heidelberg University Hospital (UKHD).

You will work collaboratively across the project group with other experts in ML/AI, NLP, data integration and product delivery across ChEMBL, ePMC, Open Targets and Heidelberg University Hospital on a common goal: integrating cutting-edge technology for knowledge extraction, representation and interpretation to help drug discovery scientists. As a crucial member of the project team, you will design, build, and operate cloud-first software that interfaces with large-scale biomedical data and drug discovery. You will contribute to developing informatics tools designed to support identifying and prioritising drug targets. Leveraging cutting-edge technologies and the expertise of our product owners and industry stakeholders, you will work in a dynamic, multidisciplinary, international environment to tackle a wide range of algorithmic and technical challenges.

As a Research Software Engineer you will be instrumental in extending our Open Targets Platform framework to include a modular knowledge graph platform. Your expertise will enhance the robustness and efficiency of our data processing and knowledge representation systems, contributing directly to our open science initiatives.

As part of a dynamic, collaborative, and international team, you will be responsible for:

Developing and implementing a knowledge graph framework on top of the existing data lake to improve our data sharing and analysis pipelines in support of drug discovery user stories;
Working closely with data provision and analysis engineers up- and downstream of the framework;
Working in an open-source environment, contributing to codebases and collaborating on agile development;
Writing clean, efficient, and readable Python code to support our internal pipelines and integrate Large Language Models;
Actively disseminating the outcomes of the project to the scientific community and stakeholders through well-crafted presentations, publications, community forums, and blog posts.

You have

Advanced degree (MSc, PhD) in computer science, bioinformatics, software development, or a related field;
Strong skills in Python and familiarity with relevant frameworks and tools;
Experience with databases and their Python integrations;
Proficiency in open-source development and version control (e.g., Git);
Passion for collaborative, agile development in a fast-paced environment;
Experience in independent problem-solving and examples of resolving complex issues;
Fluency in written and spoken English;
Ability to effectively communicate ideas or issues and work with team members from multidisciplinary backgrounds.

You might also have

Understanding of the ecosystem of biomedical and/or clinical data resources;
Knowledge of human genetics, genomics, and/or drug discovery, or an interest in learning about these topics;
Experience working with knowledge graphs and graph databases (e.g., Neo4j);
Experience leveraging embeddings derived from graph-based representations and/or machine learning;
Experience building high-quality software and making frequent deployments as part of a regular software release process;
Experience working with infrastructure-as-code, continuous integration, containers, cloud infrastructure, and deployment;
Interest in promoting your work and the ways we have solved complex challenges.
