Google
AI

Engineering Manager, ML Infrastructure

Google · Sunnyvale, CA, US · $197k - $291k

Actively hiring · Posted 4 months ago

Responsibilities

  • Lead our new Workload Optimization (WO) team. In this pivotal role, set the team's technical goals and roadmap and drive its key features.
  • Collaborate closely with teams across machine learning (ML) and with our product area customers to ensure successful execution.
  • Shape the team's culture and processes, identify new opportunities, and translate our broader strategy into concrete priorities and projects.
  • Coach and provide career guidance to your reports, improve our engineering practices, and influence technical direction across the organization.
  • Navigate open-endedness and actively contribute to the team's engineering efforts as a technical leader.

Basic qualifications

  • Bachelor’s degree, or equivalent practical experience.
  • 8 years of experience in software development.
  • 3 years of experience with developing infrastructure, distributed systems or networks, or experience with compute technologies, storage or hardware architecture.
  • 3 years of experience in a technical leadership role.
  • 2 years of experience in a people management or team leadership role.

Preferred qualifications

  • Master's degree or PhD in Computer Science or a related technical field.
  • 3 years of experience working in a matrixed organization.
  • Experience with the end-to-end Machine Learning (ML) development lifecycle and infrastructure.
  • Excellent communication and cross-team collaboration skills.

About the company

Like Google's own ambitions, the work of a Software Engineer goes beyond just Search. Software Engineering Managers not only have the technical expertise to take on and provide technical leadership to major projects, but also manage a team of Engineers. You not only optimize your own code but also make sure Engineers are able to optimize theirs. As a Software Engineering Manager, you manage your project goals, contribute to product strategy, and help develop your team. Teams work all across the company, in areas such as information retrieval, artificial intelligence, natural language processing, distributed computing, large-scale system design, networking, security, data compression, and user interface design; the list goes on and is growing every day. Operating with scale and speed, our exceptional software engineers are just getting started - and as a manager, you guide the way.

With technical and leadership expertise, you manage engineers across multiple teams and locations, manage a large product budget, and oversee the deployment of large-scale projects across multiple sites internationally.

In this role, you will provide end-to-end, fleet-wide scheduling for all Alphabet Machine Learning (ML) workloads that is efficient, reliable, and easy to use. You will be responsible for scheduling work on almost all production machines.

The ML, Systems, & Cloud AI (MSCA) organization at Google designs, implements, and manages the hardware, software, machine learning, and systems infrastructure for all Google services (Search, YouTube, etc.) and Google Cloud. Our end users are Googlers, Cloud customers and the billions of people who use Google services around the world.

We prioritize security, efficiency, and reliability across everything we do - from developing our latest TPUs to running a global network, while driving towards shaping the future of hyperscale computing. Our global impact spans software and hardware, including Google Cloud’s Vertex AI, the leading AI platform for bringing Gemini models to enterprise customers.

The US base salary range for this full-time position is $197,000-$291,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.
