Responsibilities
- Lead the architecture, deployment, and optimization of scalable ML model serving systems for real-time and batch use cases.
- Collaborate with data scientists, engineers, and stakeholders to operationalize ML models.
- Develop CI/CD pipelines for ML models that enable rapid, safe, and consistent model releases.
- Design, implement, and own comprehensive production monitoring for ML models/systems.
- Manage cloud infrastructure, primarily in AWS or other major public clouds, to support ML workloads.
- Drive best practices in model versioning, observability, reproducibility, and deployment reliability.
- Serve in an on-call rotation as a first responder for software owned by your team.
Qualifications
- 5+ years of experience in software engineering, data engineering, or a related field, with at least 3 years focused on MLOps or ML infrastructure.
- Deep hands-on experience with AWS or similar public clouds, including compute, networking, container orchestration, and observability stacks.
- Hands-on experience with CI/CD pipelines, Docker, Kubernetes, and infrastructure-as-code tools (e.g., Terraform, CloudFormation).
- Proficiency in programming languages like Python, and familiarity with machine learning frameworks (e.g., TensorFlow, PyTorch).
- Solid understanding of ML lifecycle management, including experiment tracking, versioning, and monitoring.
- Experience with LLM application development, including prompt engineering and evaluation.
- Strong communication skills for partnering with cross-functional technical and non-technical teams.
Preferred qualifications
- Experience with Ray for inference or pipeline orchestration.
- Hands-on experience with deploying large language models (LLMs) to production.
- Experience with frameworks such as vLLM is a plus.
- Experience with distributed systems and big data technologies (e.g., Spark, Hadoop).
- Experience with event-driven or streaming architectures (e.g., Kafka, Kinesis).
- Knowledge of cloud security, IAM, and compliance best practices for ML workloads.
Values
- Customer Obsessed - Building deep empathy for our customers, putting them at the core of our work, and developing strong, long-term relationships with them.
- Aim High - Always challenging ourselves and others to raise the bar.
- No Ego - Maintaining a "no job too small" attitude, and an open, inclusive and humble style.
- One Team - Taking a highly collaborative approach to achieving success.
- Lift As We Climb - Investing in developing others and helping others around us succeed.
- Lean & Nimble - Working with agility and efficiency to experiment in an often ambiguous environment.
Benefits
- A mission- and values-driven culture and a safe, inclusive environment where you can build, grow, and thrive.
- A comprehensive total rewards package that supports your wellness and provides security for SimpliSafers and their families.
- Free SimpliSafe system and professional monitoring for your home.
- Employee Resource Groups (ERGs) that bring people together, give opportunities to network, mentor and develop, and advocate for change.