Senior Data Engineer
Ciklum
Remote, Romania
About the role
As a Senior Data Engineer, you will join a cross-functional development team engineering the experiences of tomorrow.
Responsibilities
- Build, deploy, and maintain mission-critical analytics solutions that process terabytes of data at big-data scale;
- Contribute designs, code, and configuration; manage data ingestion, real-time streaming, batch processing, and ETL across multiple data stores;
- Perform development, QA, and DevOps roles as needed to take end-to-end responsibility for solutions;
- Tune the performance of complex SQL queries and data flows;
- Identify gaps and improve the platform’s quality, robustness, maintainability, and speed;
- Cross-train other team members on technologies being developed, while continuously learning new technologies from them;
- Contribute to the Unit’s activities and community building, participate in conferences, and champion engineering excellence and best practices.
Requirements
We know that sometimes you can’t tick every box. We would still love to hear from you if you think you’re a good fit!
- 5+ years of experience as a Data Engineer;
- 3+ years of experience coding in SQL and Java/Python/Scala, with solid CS fundamentals, including data structures and algorithm design;
- 2+ years contributing to production deployments of large backend data processing and analysis systems;
- 1+ years of hands-on implementation experience with a combination of the following technologies: Hadoop, MapReduce, Pig, Hive, Impala, Spark, Kafka, Storm, and SQL and NoSQL data stores such as HBase and Cassandra;
- 2+ years of experience in cloud data platforms (AWS, Azure, GCP);
- GenAI Understanding:
  - Experience working with vector databases (Qdrant, Pinecone, FAISS, Weaviate, etc.);
  - Knowledge of embedding models and retrieval-augmented generation (RAG) architectures;
  - Understanding of LLM pipelines, including data preprocessing for GenAI models.
- MLOps Experience:
  - Experience deploying data pipelines for AI/ML workloads, ensuring scalability and efficiency;
  - Familiarity with model monitoring, feature stores (Feast, Vertex AI Feature Store), and data versioning;
  - Experience with CI/CD for ML pipelines (Kubeflow, MLflow, Airflow, SageMaker Pipelines);
  - Understanding of real-time streaming for ML model inference (Kafka, Flink, Spark Streaming).
- Knowledge of professional software engineering best practices for the full software development life cycle;
- Knowledge of data warehouse design, implementation, and optimization;
- Knowledge of data quality testing, automation, and results visualization;
- Knowledge of BI report and dashboard design and implementation (Power BI, Tableau);
- Knowledge of the development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations;
- Experience participating in an Agile software development team, e.g. Scrum;
- Experience designing, documenting, and defending designs for key components in large distributed computing systems;
- A consistent track record of delivering exceptionally high-quality software on large, complex, cross-functional projects;
- Demonstrated ability to learn new technologies quickly and independently;
- Ability to handle multiple competing priorities in a fast-paced environment;
- Undergraduate degree in Computer Science or Engineering from a top CS program required. Master’s preferred;
- Experience supporting data scientists and complex statistical use cases is highly desirable.
Desirable
- Experience in data science and machine learning;
- Experience in backend development and deployment;
- Experience in CI/CD configuration;
- Experience with Databricks and Snowflake;
- Experience with Kubernetes.
Apply Now
Don't forget to mention EuroTechJobs when applying.