DeepRec.ai

Senior MLOps Engineer

DeepRec.ai · San Mateo County, CA

Actively hiring · Posted 4 months ago

Senior MLOps / ML Infrastructure Engineer

About the Company

Our client is a Series B, venture-backed deep-tech company building a Physics AI platform that helps engineering teams bring products to market faster, reduce development risk, and explore better designs with greater confidence. The platform combines large-scale simulation data with modern machine learning to generate high-fidelity predictions of physical behavior in near real time. Customers include leading organizations across aerospace, automotive, and advanced manufacturing, working on some of the most demanding real-world engineering problems.

The Role

This role focuses on building and operating the infrastructure that powers physics-based AI systems at scale. The position enables ML engineers and scientists to train, track, deploy, and monitor models reliably without managing low-level infrastructure. The work sits at the intersection of ML systems, cloud infrastructure, and large-scale simulation data, with a strong emphasis on performance, reliability, and developer productivity. It is a hands-on engineering role in a fast-moving, in-office environment, working closely with ML researchers, platform engineers, and product teams.

What You’ll Do

  • Design, build, and maintain robust MLOps infrastructure supporting the full ML lifecycle, from experimentation and training through to production deployment and monitoring
  • Implement automated training pipelines, experiment tracking, and model lifecycle management using tools such as Kubeflow, MLflow, and Argo Workflows
  • Develop scalable data pipelines capable of handling large volumes of unstructured data, particularly 3D geometric data and physics simulation outputs
  • Deploy machine learning models into production inference systems with strong standards for performance, reliability, and observability
  • Manage model registries and integrate them with CI/CD workflows to support consistent and reliable model releases
  • Implement monitoring systems that continuously track model health and performance in production
  • Collaborate closely with ML researchers, platform engineers, and product teams to evolve the infrastructure platform for physics-based AI applications
  • Write production-grade code and optimize cloud infrastructure, primarily on Google Cloud Platform with Docker and Kubernetes, while making thoughtful trade-offs around scalability, cost, and operational simplicity

What We’re Looking For

  • Bachelor’s degree or higher in Computer Science, Data Science, Applied Mathematics, or a closely related field
  • 5+ years of industry experience building MLOps platforms or ML systems in production environments
  • Strong proficiency in Python, with working knowledge of Bash and SQL
  • Hands-on experience with cloud infrastructure such as GCP, AWS, or Azure
  • Experience with containerization and orchestration tools including Docker and Kubernetes
  • Familiarity with modern MLOps frameworks such as Kubeflow, MLflow, and Argo Workflows
  • Experience building and maintaining scalable data pipelines, ideally working with unstructured or high-dimensional data
  • Ability to independently deploy models and implement monitored inference systems in production
  • Comfortable troubleshooting complex distributed systems and building reliable infrastructure that other teams depend on

Nice to Have

  • Interest in physics simulation, scientific computing, or HPC environments
  • Experience building production MLOps platforms in deep-tech or simulation-heavy environments
  • Familiarity with additional programming languages such as Go or C++

Working Style and Culture

This role suits someone who enjoys startup environments, learns quickly, and communicates clearly across disciplines. The team works on-site five days a week and values close collaboration, fast feedback loops, and hands-on problem solving. There is a strong belief that great infrastructure should be largely invisible, enabling engineers and scientists to move faster without friction.

Tags & focus areas

Full-time, AI, Machine Learning, MLOps