Responsibilities
- Lead the design, implementation, and operation of ML-focused CI/CD pipelines supporting data ingestion, feature engineering, model training, evaluation, and deployment across dev, test, staging, and production environments.
- Apply and adapt MLOps best practices within existing DevSecOps workflows, including:
  - Data quality checks and schema validation
  - Model validation and promotion gates
  - Model performance and drift monitoring
- Architect and oversee training and inference platforms, including experiment tracking, model registries, and automated retraining pipelines.
- Oversee secure integration of Infrastructure-as-Code, containerization, and orchestration (Docker, Kubernetes) for ML and data workloads, including GPU and high-performance compute resources.
- Mentor and guide engineers in MLOps and DevSecOps practices, promoting automation, observability, and security-first design.
- Collaborate with cross-functional teams (data science, software engineering, research, IT, cybersecurity, systems engineering) to ensure ML system reliability, performance, and compliance.
- Lead technical risk assessments and contribute to incident response for ML and data systems (e.g., model degradation, data quality issues, pipeline failures).
- Serve in a hybrid role as both:
  - A senior hands-on engineer contributing to pipelines, infrastructure, and monitoring
  - A technical leader guiding small to mid-sized MLOps initiatives
- Make informed technical decisions across ML, data, security, and operations domains, resolving complex multi-disciplinary challenges.
- Evaluate ethical and operational considerations in AI/ML deployment (e.g., bias, data constraints, mission risk) and recommend appropriate mitigations.
- Stay current on emerging MLOps, AI platform, and data engineering technologies, recommending adoption where beneficial.
Basic qualifications
- U.S. Citizenship
- Active Top Secret clearance or higher
- Bachelor’s degree in Computer Science, Engineering, Data Science, Applied Mathematics, or related field
- 5–9+ years of experience in one or more of the following:
  - MLOps or ML platform engineering
  - DevOps / DevSecOps / SRE supporting data or ML workloads
  - Data engineering with production ML integration
  - Applied machine learning in production environments
- Strong experience with CI/CD tools (Jenkins, GitLab CI, GitHub Actions, CircleCI) and modern Git workflows
- Hands-on experience with Infrastructure-as-Code (Terraform, Ansible, CloudFormation) and Kubernetes
- Proficiency with ML and data technologies, including:
  - Python and ML/data libraries (NumPy, pandas, scikit-learn, PyTorch, TensorFlow)
  - Workflow/orchestration tools (Airflow, Kubeflow, Prefect, Dagster)
  - Experiment tracking and model registries (MLflow, Weights & Biases, SageMaker)
- Experience integrating security and governance into ML environments (image/dependency scanning, SBOMs, secrets management, IAM)
- Familiarity with NIST, FedRAMP, and DoD RMF compliance frameworks as applied to ML and data systems
- Strong scripting or programming skills (Python, Bash, Go, or similar)
- Demonstrated experience leading technical efforts and mentoring engineers
- Ability to communicate clearly with both technical and non-technical stakeholders
Preferred qualifications
- Security, cloud, or ML certifications (e.g., CISSP, AWS Security Specialty, AWS ML Specialty, CKS, GIAC)
- Experience implementing Zero Trust architectures
- Experience with observability and monitoring tools (Prometheus, Grafana, ELK/EFK, OpenTelemetry) for ML services
- Hands-on experience with:
  - Feature stores and data validation frameworks (e.g., Great Expectations)
  - Data governance and lineage tooling
  - Policy-as-code for ML environments (OPA, Kyverno, admission controllers)
- Prior experience supporting defense, aerospace, or government-secured AI/ML programs
- Experience operating enterprise-scale or mission-critical ML systems, including high-availability inference and rigorous performance monitoring
Benefits
- Competitive compensation and benefits
- Professional development and tuition assistance
- A collaborative, mission-driven culture
- Direct impact on national security through secure AI/ML solutions