Sage
AI

Senior Machine Learning Engineer

Sage · Newcastle upon Tyne, ENG, GB

Actively hiring · Posted 5 months ago

Job Description

We are looking for a Senior ML Engineer to take technical ownership of our machine learning production environment. You will lead the transition of experimental models into production-grade services that are reliable, scalable, and cost-effective. Your mission is to build the "highway" that allows our data science team to deploy models rapidly while ensuring those models are observable and fiscally responsible. You will own the entire ML lifecycle—from automated training pipelines to real-time inference clusters—and serve as a key software engineering contributor to our AI product stack.

This is a hybrid role – three days per week in our Newcastle office.

Key Responsibilities

In this role, your key responsibilities will be:

  • Lifecycle & Pipeline Architecture: Design and own the automated "Continuous Training" (CT) and deployment pipelines. Architect reusable, modular infrastructure for model training and serving, ensuring the entire lifecycle is versioned and reproducible.

  • Software Engineering Best Practices: Lead the team in adopting professional engineering standards. This includes owning the strategy for unit/integration testing, peer code reviews, and applying SOLID principles to ML codebases to ensure they remain modular and maintainable.

  • ML Observability: Establish and own the telemetry framework for the AI stack. Implement proactive monitoring for system health and model-specific metrics, such as data drift, concept drift, and prediction accuracy.

  • FinOps & Cost Management: Own the strategy for AI cloud spend. Build monitoring and alerting frameworks to track compute costs (training and inference) and implement optimization strategies like auto-scaling and spot-instance usage.

  • AI Systems Engineering: Act as a lead software engineer to integrate models into the product ecosystem. Develop high-performance, secure APIs and microservices that wrap our ML capabilities for production consumption.

  • Data & Model Governance: Own the versioning strategy for the "Holy Trinity" of ML: code, data, and model artifacts. Ensure clear documentation and audit trails for all production deployments.

What we're looking for:

Essential skills (entry requirements):

  • Demonstrating strong software engineering fundamentals, including production‑quality Python, testing, CI/CD practices, and version control

  • Designing and operating reliable, versioned REST APIs using an API‑first approach

  • Building, deploying, and operating backend services in cloud environments, with AWS as the primary platform (experience on other major clouds considered transferable)

  • Using containerisation and modern deployment approaches, including Docker, automated pipelines, and basic observability

  • Working effectively with real‑world data and production systems in collaboration with product, data, and platform teams

  • Bringing either hands‑on experience delivering machine‑learning systems in production or a very strong software‑engineering background with clear motivation to grow into ML and MLOps

Desirable skills (strong differentiators):

  • Using AWS SageMaker for training, deploying, and operating machine‑learning workloads, or demonstrating equivalent experience on similar cloud ML platforms

  • Exposing machine‑learning models via APIs (e.g. FastAPI‑based inference services) and operating them reliably at scale

  • Applying MLOps practices, including model and version management, monitoring, and handling model or data drift

  • Implementing advanced service patterns such as asynchronous processing, event‑driven architectures, or multi‑version services

  • Serving LLM or GenAI‑based capabilities in production, including model serving, RAG pipelines, and inference controls

  • Designing reusable, platform‑level services and shared ML patterns rather than one‑off implementations

  • Managing cloud operational trade‑offs, including cost efficiency, latency, scalability, and reliability

