Role overview
- Extract signal from unstructured clinical text. Apply NLP and language model techniques to clinical notes, CCD documents, and other free-text clinical data to generate structured, actionable features for downstream analytics and predictive models.
- Build and fine-tune Small Language Models (SLMs). Design, train, and evaluate domain-specific SLMs tailored to clinical use cases, balancing performance, cost, latency, and compliance requirements.
- Utilize LLMs where applicable. Leverage large language models where they add clear value (e.g., training data creation, entity extraction, zero-shot classification) while knowing when traditional ML, rules-based approaches, or simpler statistical methods are the right tool for the job.
- Develop predictive analytics solutions. Build and validate predictive models using both classical ML (gradient boosting, logistic regression, survival analysis) and modern deep learning approaches to support clinical decision-making and population health initiatives.
- Conduct rigorous Exploratory Data Analysis (EDA). Deeply explore clinical datasets, both structured and unstructured, to uncover patterns, assess data quality, identify feature candidates, and inform modeling strategy before jumping to solutions.
- Communicate findings clearly. Present methodology, results, and recommendations to technical and non-technical stakeholders through well-crafted visualizations, notebooks, and presentations. Translate complex AI/ML concepts into language that clinical and business partners can act on.
- Collaborate across teams. Work with machine learning engineers, data engineers, clinical informaticists, and business partners to ensure clinical data pipelines support AI/ML workflows and that model outputs are integrated into products and decision-making processes.
- Stay current and stay curious. Continuously evaluate emerging techniques in NLP, foundation models, and clinical AI. Bring new ideas to the team, prototype rapidly, and advocate for approaches grounded in evidence rather than hype.
- Uphold data governance standards. Ensure all work complies with HIPAA, data privacy regulations, and internal data stewardship policies, particularly when handling PHI and unstructured clinical text.
Basic qualifications
- 2+ years of experience in data science, machine learning, or applied NLP — preferably in healthcare or a similarly regulated domain.
- Hands-on experience with NLP — text preprocessing, tokenization, named entity recognition (NER), text classification, topic modeling, or similar techniques applied to real-world unstructured data.
- Practical experience with LLMs and/or SLMs — prompt engineering, fine-tuning, RAG architectures, evaluation frameworks, or deploying language models in production or research settings.
- Strong foundation in traditional machine learning — supervised and unsupervised methods, feature engineering, model selection, cross-validation, and performance evaluation.
- Best coding practices: you use version control (Git/GitHub), commit your work regularly, write clean and reproducible code, and understand that well-organized repositories are as important as well-built models.
- Deep EDA skills — ability to systematically explore datasets, identify data quality issues, surface insights, and make informed decisions about modeling approach before writing a single line of model code.
- Proficiency in Python (pandas, scikit-learn, PyTorch or TensorFlow, Hugging Face Transformers) and SQL for working with large-scale healthcare datasets.
- Experience with cloud-based data and ML platforms, preferably Google Cloud Platform (GCP) — BigQuery, Vertex AI, or equivalent.
- Excellent presentation and communication skills — you can stand in front of a room and clearly explain what you built, why you built it that way, and what it means for the business.
- Judgment and common sense — you understand that not every problem needs an LLM, you meet your deadlines, you ask for help when you're stuck, and you don't over-engineer solutions.
- A genuine curiosity and desire to learn — you read papers, you try new tools, you ask "why," and you're energized by problems you haven't solved before. You know when a rabbit hole is worth diving into and when to pull back, stay focused, and deliver.
Preferred qualifications
- Experience working with clinical text data — clinical notes, discharge summaries, pathology reports, or similar unstructured healthcare documents.
- Knowledge of clinical coding systems and terminologies (ICD-10, SNOMED CT, LOINC, RxNorm, CPT, NDC, UMLS) and their relevance to NLP pipelines.
- Familiarity with clinical data standards (HL7, FHIR, CCD/C-CDA) and common data models (e.g., OMOP).
- Experience building or contributing to clinical NLP pipelines — entity extraction, relation extraction, negation detection, or section segmentation from clinical narratives.
- Experience with model evaluation in clinical contexts — understanding of sensitivity/specificity tradeoffs, clinical validation, and responsible AI practices in healthcare.
- Familiarity with MLOps practices — model versioning, experiment tracking, CI/CD for ML, model monitoring.
- Experience working directly with clinical stakeholders (physicians, nurses, clinical operations teams, etc.) and tailoring presentations, findings, and recommendations to the appropriate audience level, from executive summaries for leadership to detailed methodology reviews for technical audiences.
- Privacy, security, and compliance experience: HIPAA/HITRUST, de-identification/tokenization, PHI handling.
- Bachelor's degree in health informatics, biostatistics, computer science, data science, mathematics, biomedical informatics, or a related field, or an equivalent combination of formal education and experience.
- Master's degree or higher in Health Informatics, Biomedical Informatics, Clinical Informatics, Public Health, Epidemiology, Data Science, or a related field is a plus, but not a substitute for demonstrated ability to ship real-world solutions.
- Clinical background (RN, PharmD, MD, or similar) with a transition into data science or AI is a genuine differentiator for this role.
Tags & focus areas
Full-time, Remote, AI, Data Science