Deep Reasoning Labs
AI

Founding ML Platform Engineer

Deep Reasoning Labs · Palo Alto, CA

Actively hiring · Posted 4 months ago

Company Description

Deep Reasoning Labs is building a deep reasoning layer for LLMs focused on long-horizon coding. Our iteration loop depends on a fast, reproducible training + evaluation platform: SFT, verifier/PRM training, and RLVR-style post-training, with strong lineage and cost-efficient GPU execution.

Role Description

You will own the internal training + research MLOps platform: scalable PEFT post-training (LoRA/QLoRA), dataset/label pipelines, data acquisition + ingestion, evaluation automation, experiment tracking, and cost-efficient GPU orchestration (including spot/preemptible strategies). You will also own the research inference layer (model serving, batching/caching, version routing) that closes the loop between training, rollouts, and evaluation. This is a “systems + research acceleration” role: your job is to make research iterations reliable, fast, cost-efficient, and auditable.

What You'll Work On

Training + post-training pipelines (PEFT-first):

  • Reproducible, repeatable LoRA/QLoRA fine-tuning pipelines for SFT, verifier/PRM training, and RLVR-style post-training (where used), optimized for cost and iteration speed
  • Robust checkpointing/resume and failure handling for long-running jobs
  • Artifact management: dataset versions, configs, checkpoints, eval results, and model registry with lineage

Inference serving + rollout collection (research-grade):

  • Operate an LLM serving stack (e.g., vLLM/SGLang) for policy + verifier/PRM models
  • Optimize throughput/cost via batching, caching, scheduling, and profiling
  • Build reliable rollout collection and replay tooling (configs, model versions, artifacts, traces)

GPU orchestration + cost efficiency:

  • Multi-GPU training reliability (single-node initially; scale up over time)
  • Spot/preemptible strategy: interruption-tolerant training, autoscaling, queueing, capacity-aware scheduling
  • Performance tuning: profiling, dataloading, communication overhead reduction, utilization improvements

Data acquisition + ingestion (training/eval):

  • Build ingestion pipelines for code/text/trace datasets, including programmatic collection from select web sources where appropriate
  • Implement deduping, normalization, provenance tracking, and dataset versioning
  • Ensure operational robustness (rate limiting, retries, incremental crawls, change detection) and practical compliance hygiene (respect access policies/ToS where required)

What Success Looks Like (first ~90 days)

  • One-command reproducible pipeline for baseline SFT + verifier/PRM training + evaluation
  • Spot/preemptible training that is interruption-tolerant (checkpoint/resume) and runs without babysitting
  • Clear dataset + model lineage (you can answer: “what data created this model and what changed?”)
  • Automated eval + regression detection integrated into the iteration loop

Requirements (must-haves)

  • Strong systems + ML infra experience: training pipelines, data systems, reliability engineering
  • Strong data engineering fundamentals: building ingestion pipelines, handling messy sources, deduping, and dataset versioning/provenance
  • Experience running LLM inference serving (vLLM/SGLang/TGI), including batching/caching and performance tuning
  • Hands-on experience running multi-GPU training (PyTorch distributed: DDP/FSDP/DeepSpeed/etc.)
  • Strong cloud + IaC skills (AWS/GCP; Terraform/CloudFormation/Pulumi)
  • Track record building reproducible pipelines (artifact/version management, experiment tracking)
  • Performance mindset: profiling, bottleneck identification, cost/perf tradeoffs

Nice-to-have

  • Spot/preemptible fleet orchestration at scale (autoscaling, capacity strategy)
  • RLHF/RLAIF infrastructure (reward models, preference pipelines, rollout collection)
  • LLM serving/inference performance experience (to close the train→serve loop)
  • Experience building reliable crawlers/scrapers and incremental ingestion systems (queueing, rate limits, backoff, change detection)
  • Familiarity with code datasets, build/test tooling, or program analysis signals

Tech Stack (likely)

Linux, Python, PyTorch distributed (DDP/FSDP/DeepSpeed), job orchestration (Kubernetes/ECS/queues), object storage, experiment tracking, IaC, internal eval infrastructure.

Location / Work Model

Remote-first (US/Canada). Strong preference for overlap with Pacific Time. Availability for periodic in-person sprints in SF is a plus.

Compensation

Market-competitive base (location-based) + meaningful founding-level equity.

Tags & focus areas

Full-time · Remote · Machine Learning · MLOps · AI