Computer Vision Engineer

Stealth Mode Startup · Los Angeles, CA · $12k

Actively hiring · Posted 3 months ago

Computer Vision Engineer — Pose Estimation & Scene Understanding

Location: San Francisco Bay Area or Los Angeles, California
Work Style: Hybrid, with regular in-person collaboration

About Us
We are a funded, early-stage technology company building a real-time simulation platform at the intersection of multi-agent AI, machine learning, and large-scale real-world data. Our approach combines deep learning, real-time systems, and advanced modeling to simulate complex, dynamic environments from the ground up.

We're starting with foundational simulation and data infrastructure before scaling fidelity and intelligence layers. The team is small, backed by experienced investors, and led by repeat founders with multiple exits. If you want to build something that hasn't been built before, we'd like to talk.

The Role
We're building AI-powered game simulations driven by large-scale real-world data. Our pipeline detects players, estimates 3D pose from monocular video, and maps that data onto 3D positions and models in the game. We're looking for a computer vision engineer to own and improve this entire upstream extraction pipeline.

What you'll do
You'll own the pipeline that turns raw basketball broadcast footage into clean, accurate 3D pose and trajectory data. That means fine-tuning and improving our HMR 2.0-based pose estimation, optimizing player detection and tracking, building or training specialized models for court keypoint detection, and refining our homography estimation to accurately place 3D pose data onto the court. You'll be responsible for the accuracy and reliability of every piece of training data that feeds our downstream diffusion models. When extraction quality degrades—occlusions, camera cuts, unusual angles—you'll diagnose the problem and fix it, whether that means fine-tuning an existing model, training a new one, or engineering around the issue.

What we're looking for

Strong experience with human pose estimation; ideally you've fine-tuned models like HMR, CLIFF, or similar mesh recovery architectures.
Experience with object detection and tracking in video (YOLO, ByteTrack, or similar).
Solid understanding of camera geometry: homographies, camera calibration, projection between 2D and 3D coordinate systems.
Ability to identify when an off-the-shelf model isn't cutting it and train a specialized model to fill the gap—for example, a court keypoint detector trained on sports broadcast data.
Comfort working with messy real-world video data—broadcast footage is full of overlays, replays, camera cuts, and occlusions.
Strong PyTorch skills and experience training and evaluating vision models.

Nice to have

Experience with sports video analysis specifically.
Familiarity with multi-view geometry or structure from motion.
Experience with temporal pose tracking and smoothing across video sequences.
Background in dataset curation—knowing what clean training data looks like and how to get there.

Why this role matters
Everything downstream depends on the quality of extracted motion data. Our diffusion models learn from it. Our rendered simulations are driven by it. If the pose data is noisy, occluded, or misaligned, nothing else works. You own the foundation of the entire pipeline.

Additional Information
We're a small, brilliant team working in stealth on an ambitious project. We value in-person collaboration and are building something meaningful and one of a kind.
Compensation will depend on experience and contract structure. Candidates must be authorized to work in their respective locations. This is a part-time, contract-to-hire position.

We are an equal opportunity employer and are committed to fostering a diverse, inclusive workplace.

Seniority level: Internship
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Software Development

Tags & focus areas

Computer Vision, AI