Data Engineer - Hadoop (MOH)

Date: 6 Apr 2024

Location: SG

Company: Synapxe

At TRUST, we believe in the power of data to transform health and healthcare research. We provide the platform to enable access to anonymised real-world datasets to support health data analytics and innovation.

We are always looking for talented individuals who share this passion. Whether you are a data scientist, software engineer, or biomedical professional, we have opportunities for you to make an impact and grow your career.

As a member of our team, you will have the opportunity to work with new and emerging technology and make a positive impact on society. You will be part of a collaborative and dynamic work environment that values creativity, innovation, and diversity.

We are committed to supporting our employees’ growth and development through ongoing learning opportunities, mentorship, and career advancement programmes. We believe in fostering a culture of continuous improvement and are dedicated to investing in our people.

Data Engineer

Job Role Description

The Data Engineer supports the implementation of data structures and architecture, the master data and metadata management approach, and the data quality programme to facilitate access to data and information. He/she supports the design, implementation, and maintenance of data flow channels and data processing systems that support the collection, storage, batch and real-time processing, and analysis of information from structured and unstructured sources in a scalable, repeatable, and secure manner. He/she implements data management standards and practices.

Key Responsibilities

(a) Work with stakeholders to understand needs for data structure, availability, scalability, and accessibility

(b) Support the translation of business data needs into technical system requirements

(c) Identify opportunities for improvement and optimisation

(d) Build data flow channels and processing systems to extract, transform, load, and integrate data from various sources 

(e) Develop complex code, scripts, and data pipelines to process structured and unstructured data

(f) Work with data analysts, data scientists and other analytics stakeholders to implement data models to support analytics use cases 

(g) Assist with integration of data systems into existing infrastructure 

(h) Develop tools to improve data flows between internal/external systems and the data lake/warehouse 

(i) Administer, design, develop, validate, deploy, and maintain ETL tools such as Informatica, IBM DataStage, and Talend

Requirements / Qualifications

(a) Bachelor's or master's degree in Computer Science, Information Technology, Computer Engineering, or equivalent.

(b) At least seven (7) years of experience in providing data warehouse or advanced analytics solutions, especially in designing large-scale solutions using Big Data and cloud technologies.

(c) Demonstrated in-depth knowledge of relevant Extract-Transform-Load (ETL) hardware/software products, frameworks, and methodologies.

(d) Experience with at least two of the following areas:

    · Databases (e.g., Oracle, MS SQL, MySQL, Teradata)

    · Big data (e.g., Hadoop ecosystem)

    · Cloud (e.g., AWS cloud-native tools)

    · ETL development using ETL tools (e.g., Informatica, IBM DataStage, Talend)

    · Data repository design (e.g., operational data stores, dimensional data stores, data marts)

    · Data interrogation techniques (e.g., SQL, NoSQL)

    · Structured and unstructured data analytics

    · Batch and real-time data ingestion and processing

    · Data quality tools and processes

    · Data transformation and terminology equivalence mapping

(e) Experience in data modelling for analytics (e.g., star schemas, snowflake schemas).

(f) Experience with data acquisition tools (e.g., ETL, real-time data capture, and change data capture).

(g) Experience interacting with analytics stakeholders (economists, statisticians, clinicians, policy makers) at a business or domain level.

(h) A deep understanding of analytical models and methodologies, especially in the context of health analytics for clinical use and clinical safety (e.g., data mining, predictive analytics), would be preferred.

(i) Comfortable working independently to carry out data analysis and to assess data quality and sufficiency. Able to understand and analyse large volumes of data drawn from heterogeneous sources/repositories. Experience working with Big Data/cloud technologies and solutions would be preferred.

(j) Good interpersonal skills; detail-oriented and flexible, with the ability to work across different areas within the team.

(k) A good understanding of the Singapore healthcare system would be preferred.

(l) Familiarity or experience with health informatics would be preferred.

(m) An understanding of healthcare data governance, data acquisition and data management would be an advantage.