Uni Internship May to Oct 2025 - Development of Small Language Models (SLMs) for healthcare use-case
Date: 7 Jan 2025
Location: SG
Company: Synapxe
Synapxe is the national HealthTech agency inspiring tomorrow’s health. The nexus of HealthTech, we connect people and systems to power a healthier Singapore. Together with partners, we create intelligent technological solutions to improve the health of millions of people every day, everywhere.
Are you someone who enjoys problem solving, has a creative and curious mind, and strives to create a better and healthier tomorrow? If you answered yes to all of these, check out our website to find out more about Internship@Synapxe.
Join Synapxe as an intern and see how you can contribute to powering a healthier Singapore. We aim to deliver the best experience for all interns, fostering exponential growth and paving the way for your future in the tech industry.
While remarkable progress has been made in Natural Language Processing (NLP), the full potential of language models in healthcare remains untapped. Large Language Models (LLMs) such as GPT-4 or Llama 2 have demonstrated their efficacy in various NLP tasks across industries, but their adoption in the healthcare domain has been limited by the scarcity of specialized medical data for training, their very large size, and their high cost and resource demands. SLMs offer a cost-effective alternative for adapting language models to specialized tasks. The project will involve developing, fine-tuning, and evaluating SLMs for a healthcare use-case.
The objective of the project is to implement end-to-end pipelines that harness the power of open-source SLMs for healthcare use-cases as a proof of concept (POC). Possible use-cases include classification, summarization, and question answering. The pipelines may include some of the following steps: data preparation, prompt engineering, model fine-tuning, and model evaluation.
The candidate will work on some of the following tasks:
- Work with publicly available data for the project
- Apply text processing and data exploration techniques to get acquainted with the dataset and create a data processing pipeline to prepare the data in a suitable format for model ingestion
- Implement prompt engineering, Retrieval-Augmented Generation (RAG), and/or fine-tuning techniques to adapt SLMs to healthcare-specific NLP tasks
- By the end of the project, the candidate will be familiar with basic and advanced NLP techniques, as well as state-of-the-art SLM implementation and development for specific use-cases
Note: The scope of the project may change depending on company priorities. In addition, the student may be asked to contribute to and support additional ongoing projects and duties as needed.
About you:
- Pursuing a Bachelor's degree in Business Analytics, Computer Science, Computer Engineering, Data Science, or a related discipline
- Graduating in Dec 2025 or May/Dec 2026
- Strong coding skills in Python for data processing and model development are required
- Experience with frameworks for NLP and language model integration, such as LangChain, LlamaIndex, and Hugging Face, is preferred
- Experience with one or more deep learning frameworks, e.g., TensorFlow or PyTorch, is preferred
- Familiarity with Git, GitHub repositories, and object-oriented programming is desirable
- Familiarity with cloud service platforms such as AWS, Azure, and GCP is a plus
- Ability to document internship project materials comprehensively and rigorously, including literature articles, code, results, findings, and slides, for a proper handover to the supervisor and teammates
- Ability to communicate effectively and present results and findings
- Ability to multitask and work effectively as part of a multidisciplinary team
- Passionate about making a difference and keen to re-imagine the future of HealthTech