XPENG
AI

Research Scientist Intern

XPENG · Santa Clara, CA

Actively hiring · Posted 6 months ago

Role overview

We are actively seeking a full-time Research Scientist Intern to drive the modeling and algorithmic development of XPENG’s next-generation Vision-Language-Action (VLA) Foundation Model, the core brain that powers our end-to-end autonomous driving systems.

You will work closely with world-class researchers, perception and planning engineers, and infrastructure experts to design, train, and deploy large-scale multi-modal models that unify vision, language, and control. Your work will directly shape the intelligence that enables XPENG’s future L3/L4 autonomous driving products.

Responsibilities

  • Conduct research on designing and implementing large-scale multi-modal architectures (e.g., vision–language–action transformers) for end-to-end autonomous driving.
  • Design and integrate cross-modal alignment techniques (e.g., visual grounding, temporal reasoning, policy distillation, imitation and reinforcement learning) to improve model interpretability and action quality.
  • Collaborate closely with researchers and engineers across the modeling and infrastructure teams.
  • Contribute to publications at top-tier AI/CV/ML conferences and present research findings.

Basic qualifications

  • Currently enrolled in a Master's or Ph.D. program in Computer Science, Electrical/Computer Engineering, or a related field, with a specialization in CV, NLP, or ML.
  • Experience in multi-modal modeling (vision, language, or planning), with a deep understanding of representation learning, temporal modeling, and reinforcement learning techniques.
  • Strong proficiency in PyTorch and modern transformer-based model design.

Preferred qualifications

  • Publication record in top-tier AI conferences (CVPR, ICCV, ECCV, NeurIPS, ICLR, ICML, etc.).
  • Prior experience building foundation or end-to-end driving models, or LLM/VLM architectures (e.g., ViT, Flamingo, BEVFormer, RT-2, or GRPO-style policies).
  • Knowledge of RLHF/DPO/GRPO, trajectory prediction, or policy learning for control tasks.
  • Familiarity with distributed training (DDP, FSDP) and large-batch optimization.

Benefits

  • A collaborative, research-driven environment with access to massive real-world data and industry-scale compute.
  • Opportunity to work with top-tier researchers and engineers advancing the frontier of foundation models for autonomous driving.
  • Direct impact on the next generation of intelligent mobility systems.
  • Comprehensive benefits, meals, and team-building activities.
  • A fun, supportive, and engaging environment.
  • Infrastructure and computational resources to support your work.
  • Opportunity to work on cutting-edge technologies with top talent in the field.
  • Opportunity to make a significant impact on the transportation revolution by advancing autonomous driving.
  • Competitive compensation package.
  • Snacks, lunches, dinners, and fun activities.

Tags & focus areas

Internship · AI · Machine Learning · Robotics