AI Research Scientist - Multimodal Intelligence

Apple · Zürich, ZH, CH

Actively hiring Posted 5 months ago

Imagine what you could do here. At Apple, new ideas have a way of becoming extraordinary products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. Multifaceted, amazing people and inspiring, innovative technologies are the norm here. The people who work here have reinvented entire industries with Apple hardware products. The same passion for innovation that goes into our products also applies to our practices, strengthening our commitment to leave the world better than we found it. Join us in this truly exciting era of Artificial Intelligence to help deliver the next groundbreaking Apple products and experiences! We are continuously advancing the state of the art in Computer Vision and Machine Learning, touching all aspects of language and multimodal foundation models, from data collection and data curation to modeling, evaluation, and deployment.

As a member of our dynamic group, you will have the unique and rewarding opportunity to craft upcoming research directions in the field of multimodal foundation models that will inspire future Apple products. You will be working alongside highly accomplished and deeply technical scientists and engineers to develop pioneering solutions for challenging problems. This is a unique opportunity to be part of what forms the future of Apple products that will touch the lives of many people. We, the Multimodal Intelligence Team, are looking for an AI Research Scientist to work in the field of Generative AI and multimodal foundation models.

Our team has an established track record of shipping features that leverage multiple sensors, such as Face ID, RoomPlan, and hand tracking in Vision Pro, as well as a strong research presence in the multimodal AI community. Our publications span multimodal pre-training, vision-language models, video-language models, and multimodal alignment. We are focused on building experiences that demonstrate the power of our sensing hardware as well as large foundation models.

Description

You will work on advancing the capabilities of foundation models and guiding them toward real-world applications in Apple products. This includes researching and developing methods that improve alignment, reasoning, and adaptation of large models to practical use cases, while ensuring they meet Apple’s standards for efficiency, scalability, and privacy. You will focus on creating customized foundation models with targeted capabilities that operate efficiently in constrained environments, supporting the next generation of intelligence across Apple’s ecosystem.

Your work includes staying ahead of emerging research and identifying techniques that are suitable for real-world deployment, helping translate scientific advancements into production-quality solutions. You will design and optimize large-scale data pipelines that support robust training and detailed evaluation of foundation models, working with massive multimodal datasets to push the limits of performance. You will explore new techniques that strengthen focused reasoning, multimodal understanding, and adaptive behavior, enabling models that perform well at large scale while also being tailored for specific Apple experiences, from cloud systems to on-device intelligence.

Collaboration is essential in this role. You will partner with multi-functional teams of engineers and researchers to bring customized and efficient models into Apple products, ensuring smooth integration and enabling intelligent and natural user experiences throughout the ecosystem.

Preferred Qualifications

PhD, or equivalent practical experience, in Computer Science, Machine Learning, or a related technical field.

Demonstrated expertise in a related field with a publication record at relevant conferences (e.g., NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, COLM).

Experience with the full stack of foundation model training (vision-language).

Familiarity with large-scale data pipelines, including data curation, preprocessing, and efficient storage.

Ability to work effectively in a multi-functional, collaborative environment.

Experience with advanced reasoning or reinforcement learning methods.

Experience with model distillation using on-policy or off-policy techniques.

Minimum Qualifications

Proficiency in Python programming and experience with at least one modern deep learning framework (PyTorch, JAX, or TensorFlow).

Experience working with large-scale training pipelines and distributed systems.

MS in Computer Science, Computer Vision, Machine Learning, or a related technical field, and a minimum of 6 years of relevant experience.

At Apple, we’re not all the same. And that’s our greatest strength. We draw on the differences in who we are, what we’ve experienced, and how we think. Because to create products that serve everyone, we believe in including everyone. Therefore, we are committed to treating all applicants fairly and equally. We will work with applicants to make any reasonable accommodations.
