Research Software Engineer - Large Language Model Focus

EMBL-EBI - European Bioinformatics Institute

Hinxton, United Kingdom

Generative AI has revolutionised the way we interact with knowledge. To benefit from the advances in LLM technology inside Open Targets, we are extending the capabilities of our platform towards LLM integration using open-source frameworks. The project aims to improve the extraction, representation, and use of scientific knowledge, and to present this knowledge to platform users in a user-friendly way. The central aims of the role we describe below will be 1) connecting the collected knowledge to an interface that can be queried using natural language, and 2) ensuring that the behaviours of the LLMs are consistent and adequate.

Your role

We are seeking a highly skilled and motivated Research Software Engineer with expertise in Python and Large Language Model (LLM) integration to join the AI knowledge management project for a period of 3 years. This role is central to our mission of advancing open science and robust drug development. You will be a key player in designing and operating software that integrates cutting-edge knowledge graphs, LLMs and ML techniques into our biomedical data processing and drug discovery platforms.

We are open to applicants at various career stages, but we are particularly interested in individuals who are eager to utilise cutting-edge technologies to address complex challenges in biological knowledge extraction. This position would be embedded within the Open Targets project team in the Saez-Rodriguez Group at the European Bioinformatics Institute and benefit from joint supervision with Sebastian Lobentanzer in the Saez-Rodriguez Group at Heidelberg University Hospital (UKHD).

You will work collaboratively across the project group with other experts in ML/AI, NLP, data integration and product delivery from ChEMBL, ePMC, Open Targets and Heidelberg University Hospital. The common goal is to integrate cutting-edge technology for knowledge extraction, representation and integration, helping drug discovery scientists generate therapeutic hypotheses.

As a crucial member of the project team, you will design, build, and operate cloud-first open-source software that interfaces with large-scale biomedical data and drug discovery. You will contribute to the development of current and future informatics tools designed to support the identification and prioritisation of drug targets. Leveraging cutting-edge technologies and the expertise of our product owners and industry stakeholders, you will work in a dynamic, multidisciplinary, international environment to tackle a wide range of algorithmic and technical challenges.

As part of a dynamic, collaborative, and international team, you will be responsible for:

Developing and maintaining backend (Python) software, integrating databases and LLMs;
Investigating and implementing open source LLMs/conversational AI agents for use in web applications (chatbot) and therapeutic hypothesis generation;
Accessing bespoke knowledge graphs across OT data for specific use cases and assisting in the integration into the Open Targets Platform ecosystem;
Collaborating with OT's industry and academic partners to collect requirements and to assess, prioritise, validate and refine the developed methods;
Taking ownership of the design and development of new features and pipelines, and independent problem-solving to resolve complex issues;
Actively disseminating the outcomes of the project to the scientific community and stakeholders through well-crafted presentations and publications.

You have

Advanced degree (MSc, PhD) in computer science, bioinformatics, software development, or a related field;
Strong proficiency in Python and experience with Large Language Model integration;
Proven experience in applying modern ML/LLM frameworks and concepts;
Good understanding of ML principles including embeddings, cross-validation and fine-tuning;
Proficiency in common data preprocessing and normalisation tasks;
Exposure to source code version control software such as Git and GitHub;
Experience in independent problem-solving and examples of resolving complex issues;
Fluency in written and spoken English;
Ability to effectively communicate ideas or issues and work with team members from multidisciplinary backgrounds;
Interest in promoting your work and the ways we have solved complex challenges.

You might also have

Experience in MLOps including experiment tracking and model deployment;
Experience with current LLM frameworks, such as LangChain, and with open-source LLM deployment (e.g., llama-cpp, ggml, Xorbits Inference);
Knowledge of human genetics, genomics, and/or drug discovery, or an interest in learning about these topics;
Experience building high-quality software and making frequent deployments as part of a regular software release process;
Experience working with knowledge graphs and graph databases;
Previous experience working in the research or life science industries.
