Crustdata (YC F24)

Founding ML Engineer

Crustdata (YC F24) · San Francisco, CA

Actively hiring · Posted 3 months ago

About The Role
Skills: Python, PyTorch, NLP, LLMs, Information Retrieval, Entity Resolution, Text Classification

We're building the gateway to the internet for AI agents. Our APIs already power hundreds of customers — and we went from 0 to $7M ARR in our first 12 months. Now we need someone who can push the boundaries of what our ML systems can do.

We're hiring a Founding ML Engineer to own the research and engineering behind our core intelligence layer. Our platform indexes hundreds of millions of professional profiles and company records from across the web. Making that data searchable, matchable, and enriched is an ML problem at its core.

This is not an MLOps role. You will be researching, training, and shipping models, from paper to prototype to production.

Who you are

  • 3+ years building and shipping ML models in production — NLP, information retrieval, or entity resolution
  • Strong with transformer architectures — you've trained and fine-tuned encoder models, not just called APIs
  • You know how to build and evaluate retrieval systems, classifiers, and embedding models
  • Comfortable with contrastive learning, metric learning, and representation learning
  • Experience using LLMs for structured extraction, classification, or data generation at scale
  • Strong Python and PyTorch
  • A true grinder — we work very hard
  • Founder mentality — someone who wants to be a founder in the future OR was a founder earlier

What you'll be doing

You'll own the ML systems that turn messy, multilingual, web-scale data into structured intelligence. Some example problems:

  • A customer searches for "RevOps professionals" — you need to return people titled "Head of Revenue Department," "Revenue Operations Manager," and "VP Sales Operations," across English, French, and German
  • Three different data sources list what looks like three different companies — but it's actually one. You figure out how to resolve that automatically across millions of records
  • Given raw people data, infer the org chart — who reports to whom, what the team structure looks like, how the engineering org differs from sales
  • Detect what technologies a company uses from unstructured signals scattered across the web
  • Classify whether a job change was a promotion, lateral move, demotion, or just a title edit — and do it for millions of transitions
  • Map raw job titles to canonical titles, seniority levels, and job functions — across dozens of languages and naming conventions
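
To make the company-resolution problem above concrete, here is a minimal sketch of one common baseline, not a description of Crustdata's actual system. It assumes each record is a dict with a `name` and an optional `domain` (hypothetical field names), strips a small illustrative list of legal suffixes, and clusters matching records with union-find. The suffix list, the 0.9 similarity threshold, and the sample records are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, deliberately short suffix list; a real one would be far longer.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "gmbh", "sa", "corp", "co"}

def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop trailing legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

def same_company(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Match on a shared domain, else on normalized-name similarity."""
    if a.get("domain") and a.get("domain") == b.get("domain"):
        return True
    ratio = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    return ratio >= threshold

def resolve(records: list[dict]) -> list[list[int]]:
    """Cluster record indices with union-find over pairwise matches."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if same_company(records[i], records[j]):
                parent[find(i)] = find(j)

    clusters: dict[int, list[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Made-up sample records from three imagined sources.
records = [
    {"name": "Acme, Inc.", "domain": "acme.example"},
    {"name": "ACME", "domain": None},
    {"name": "Acme Corp", "domain": "acme.example"},
    {"name": "Globex GmbH", "domain": "globex.example"},
]
clusters = resolve(records)
print(clusters)
```

The pairwise loop is quadratic, so it only works for small batches. At millions of records, a blocking step (comparing only records that share a key such as domain or a name prefix) and a learned similarity model would likely replace the string ratio used here.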

Nice to haves

  • Experience with entity resolution or record linkage at scale
  • Built taxonomy or ontology systems over messy real-world data
  • Background in multilingual NLP or cross-lingual transfer
  • Scaled LLM inference pipelines in production
  • Published research or open-source contributions in NLP/IR
  • Experience with distributed training on GPU clusters
