How does DevFound's AI job matching work?

DevFound's AI Copilot ingests your profile, goals, and live job data to deliver curated matches in seconds. Every match includes a resume variant, suggested referrals, and interview prep so you can act immediately. The more feedback you provide, the sharper the Copilot becomes.

Is AI job searching better than traditional methods?

AI-led job searches shrink the hours spent sifting through boards and formatting resumes. DevFound pairs automation with your personal outreach, so you reserve energy for interviews and negotiation. Traditional networking still matters, but AI gives you a lift before you even send a message.

What skills are needed for AI and ML jobs?

Modern AI roles expect comfort with production-grade code, data fluency, and practical ML tooling. The strongest candidates pair deep technical chops with storytelling: translating model impact for product, go-to-market (GTM), and executive partners. Continuous learning keeps you ahead as stacks evolve.

How do I get matched with AI jobs faster?

DevFound rewards active seekers. Keep your profile fresh, respond to match quality prompts, and enable alerts so you never miss a role. The AI prioritizes companies and teams that align with your feedback, accelerating both introductions and interview invites.

What locations have the most AI job opportunities?

High-density tech hubs continue to host the deepest AI talent pools, yet distributed teams are catching up fast. Use DevFound filters to home in on onsite, hybrid, or fully remote roles and watch openings expand across time zones.

Can DevFound's AI help with remote AI jobs?

DevFound aggregates thousands of remote AI openings and flags the nuances—core hours, async culture, and visa needs—up front. The Copilot also recommends how to position your distributed work experience so hiring managers know you can thrive on a remote team.

Founding ML Platform Engineer (Remote) at Deep Reasoning Labs

Company Description

Deep Reasoning Labs is building a deep reasoning layer for LLMs focused on long-horizon coding. Our iteration loop depends on a fast, reproducible training + evaluation platform: SFT, verifier/PRM training, and RLVR-style post-training, with strong lineage and cost-efficient GPU execution.

Role Description

You will own the internal training + research MLOps platform: scalable PEFT post-training (LoRA/QLoRA), dataset/label pipelines, data acquisition + ingestion, evaluation automation, experiment tracking, and cost-efficient GPU orchestration (including spot/preemptible strategies). You will also own the research inference layer (model serving, batching/caching, version routing) that closes the loop between training, rollouts, and evaluation. This is a “systems + research acceleration” role: success means research iterations become reliable, fast, cost-efficient, and auditable.

What You'll Work On

Training + post-training pipelines (PEFT-first):

Reproducible pipelines for SFT, verifier/PRM training, and RLVR-style post-training
Repeatable LoRA/QLoRA fine-tuning for each of these stages (RLVR-style updates where used), optimized for cost and iteration speed
Robust checkpointing/resume and failure handling for long-running jobs
Artifact management: dataset versions, configs, checkpoints, eval results, and model registry with lineage
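To make the lineage requirement concrete, here is a minimal sketch, assuming a simple content-addressed scheme: a run ID is derived from the dataset version, config, and base checkpoint, so identical inputs always map to the same ID and any change produces a new one. `LineageRecord`, `content_hash`, `diff_configs`, and the example values are hypothetical illustrations, not Deep Reasoning Labs' actual registry.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


def content_hash(obj: Any) -> str:
    """Stable SHA-256 of a JSON-serializable object (keys sorted at every level)."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class LineageRecord:
    # Hypothetical fields: everything that determines what a training run produces.
    dataset_version: str
    config: Dict[str, Any]
    base_checkpoint: str
    parent_run_id: Optional[str] = None

    @property
    def run_id(self) -> str:
        # Identical inputs give an identical ID; any change yields a new one.
        return content_hash(asdict(self))[:16]


def diff_configs(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, tuple]:
    """Answer "what changed?" between two runs: top-level keys whose values differ."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


a = LineageRecord("code-sft@v3", {"lr": 2e-4, "lora_r": 16}, "base-7b")
b = LineageRecord("code-sft@v3", {"lr": 1e-4, "lora_r": 16}, "base-7b", parent_run_id=a.run_id)
print(diff_configs(a.config, b.config))  # {'lr': (0.0002, 0.0001)}
```

Hashing the sorted JSON form means key order in a config file never changes the ID; a production registry would also hash the dataset contents rather than trusting a version label.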

Inference serving + rollout collection (research-grade):

Operate an LLM serving stack (e.g., vLLM/SGLang) for policy + verifier/PRM models
Optimize throughput/cost via batching, caching, scheduling, and profiling
Build reliable rollout collection and replay tooling (configs, model versions, artifacts, traces)

GPU orchestration + cost efficiency:

Multi-GPU training reliability (single-node initially; scale up over time)
Spot/preemptible strategy: interruption-tolerant training, autoscaling, queueing, capacity-aware scheduling
Performance tuning: profiling, dataloading, communication overhead reduction, utilization improvements
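The interruption-tolerant training described above rests on a checkpoint/resume pattern. A minimal stdlib-only sketch, assuming JSON-serializable state: each checkpoint is written to a temp file and atomically renamed, so a preemption mid-write never leaves a corrupt file, and a restarted job resumes from the last completed save. A real trainer would persist model and optimizer state (e.g. with `torch.save`) and also react to the provider's termination signal; `train`, `Preempted`, and the state layout are hypothetical.

```python
import json
import os
import tempfile
from typing import Optional


class Preempted(Exception):
    """Stand-in for a spot/preemptible instance being reclaimed mid-run."""


def save_checkpoint(path: str, state: dict) -> None:
    """Atomically persist state: write a temp file, fsync, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic rename on POSIX when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path: str) -> dict:
    """Return the last saved state, or a fresh one if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"step": 0, "loss_sum": 0.0}
    with open(path) as f:
        return json.load(f)


def train(path: str, total_steps: int, save_every: int, preempt_at: Optional[int] = None) -> dict:
    state = load_checkpoint(path)  # transparently resumes after an interruption
    for step in range(state["step"], total_steps):
        state["loss_sum"] += 1.0 / (step + 1)  # stand-in for a real optimizer step
        state["step"] = step + 1
        if state["step"] % save_every == 0:
            save_checkpoint(path, state)
        if preempt_at is not None and state["step"] == preempt_at:
            raise Preempted(f"simulated preemption at step {state['step']}")
    save_checkpoint(path, state)
    return state
```

A preempted run loses at most `save_every` steps of work, and the resumed run ends in the same state as an uninterrupted one; `save_every` is the knob that trades checkpoint I/O against recomputation.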

Data acquisition + ingestion (training/eval):

Build ingestion pipelines for code/text/trace datasets, including programmatic collection from select web sources where appropriate
Implement deduping, normalization, provenance tracking, and dataset versioning
Ensure operational robustness (rate limiting, retries, incremental crawls, change detection) and practical compliance hygiene (respect access policies/ToS where required)
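For the deduping, normalization, and provenance items above, a minimal sketch of exact deduplication on normalized content: each unique document keeps every source it was collected from, so provenance survives the dedup step. The normalization rules (Unicode NFC, LF line endings, trailing whitespace stripped, indentation kept) are assumptions chosen to be safe for code; near-duplicate detection (e.g. MinHash) is out of scope. `Record`, `DedupedDoc`, and `dedupe` are hypothetical names.

```python
import hashlib
import unicodedata
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Record:
    text: str
    source: str  # where the document was collected from (URL, repo path, ...)


@dataclass
class DedupedDoc:
    content_hash: str
    text: str
    sources: List[str] = field(default_factory=list)  # every source that yielded this content


def normalize(text: str) -> str:
    """Canonicalize before hashing: NFC, LF line endings, no trailing whitespace per line."""
    text = unicodedata.normalize("NFC", text)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = [line.rstrip() for line in text.split("\n")]  # leading indentation is preserved
    return "\n".join(lines).strip("\n")


def dedupe(records: Iterable[Record]) -> List[DedupedDoc]:
    """Exact dedup on normalized content, keeping first-seen order and all provenance."""
    by_hash: Dict[str, DedupedDoc] = {}
    for rec in records:
        norm = normalize(rec.text)
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in by_hash:
            by_hash[digest].sources.append(rec.source)
        else:
            by_hash[digest] = DedupedDoc(content_hash=digest, text=norm, sources=[rec.source])
    return list(by_hash.values())
```

Because the hash is computed on the normalized text, two copies that differ only in line endings or trailing spaces collapse into one document with both sources recorded.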

What Success Looks Like (first ~90 days)

One-command reproducible pipeline for baseline SFT + verifier/PRM training + evaluation
Spot/preemptible training that is interruption-tolerant (checkpoint/resume) and runs without babysitting
Clear dataset + model lineage (you can answer: “what data created this model and what changed?”)
Automated eval + regression detection integrated into the iteration loop
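Automated regression detection can start as simply as comparing a candidate run's eval metrics against a baseline with a per-metric tolerance. A minimal sketch, assuming every metric is higher-is-better and that a missing metric counts as a regression; the metric names in the usage below are hypothetical examples, and `detect_regressions` is not an existing library function.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Regression:
    metric: str
    baseline: float
    candidate: float  # NaN when the candidate run did not report the metric


def detect_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: Dict[str, float],
    default_tolerance: float = 0.01,
) -> List[Regression]:
    """Flag higher-is-better metrics that dropped by more than the allowed tolerance.

    A metric missing from the candidate is also reported, since a silently
    dropped eval is itself a regression in the iteration loop.
    """
    regressions = []
    for metric, base_value in sorted(baseline.items()):
        cand_value = candidate.get(metric)
        if cand_value is None:
            regressions.append(Regression(metric, base_value, float("nan")))
            continue
        allowed = tolerance.get(metric, default_tolerance)
        if base_value - cand_value > allowed:
            regressions.append(Regression(metric, base_value, cand_value))
    return regressions


base = {"pass@1": 0.42, "verifier_auc": 0.81}
cand = {"pass@1": 0.38, "verifier_auc": 0.805}
for r in detect_regressions(base, cand, tolerance={"pass@1": 0.02}):
    print(f"REGRESSION {r.metric}: {r.baseline} -> {r.candidate}")
```

Per-metric tolerances let noisy evals use a wider band than stable ones; a fuller version would derive those bands from run-to-run variance rather than fixed constants.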

Requirements (must-haves)

Strong systems + ML infra experience: training pipelines, data systems, reliability engineering
Strong data engineering fundamentals: building ingestion pipelines, handling messy sources, deduping, and dataset versioning/provenance
Experience running LLM inference serving (vLLM/SGLang/TGI), including batching/caching and performance tuning
Hands-on experience running multi-GPU training (PyTorch distributed: DDP/FSDP/DeepSpeed/etc.)
Strong cloud + IaC skills (AWS/GCP; Terraform/CloudFormation/Pulumi)
Track record building reproducible pipelines (artifact/version management, experiment tracking)
Performance mindset: profiling, bottleneck identification, cost/perf tradeoffs

Nice-to-have

Spot/preemptible fleet orchestration at scale (autoscaling, capacity strategy)
RLHF/RLAIF infrastructure (reward models, preference pipelines, rollout collection)
LLM serving/inference performance experience (to close the train→serve loop)
Experience building reliable crawlers/scrapers and incremental ingestion systems (queueing, rate limits, backoff, change detection)
Familiarity with code datasets, build/test tooling, or program analysis signals

Tech Stack (likely)

Linux, Python, PyTorch distributed (DDP/FSDP/DeepSpeed), job orchestration (Kubernetes/ECS/queues), object storage, experiment tracking, IaC, internal eval infrastructure.

Location / Work Model

Remote-first (US/Canada). Strong preference for overlap with Pacific Time. Periodic in-person sprints in SF are a plus.

Compensation

Market-competitive base (location-based) + meaningful founding-level equity.

Tags & focus areas

Used for matching and alerts on DevFound

Full-time, Remote, Machine Learning, MLOps, AI