Principal PMT-ES - AI/ML Training, Annapurna Labs
Amazon
Responsibilities
Primary Duties
- Training Product Strategy & Roadmap
- Post-Training, RL & Emerging Workflows
- Customer Engagement & Enablement
- Training AI/ML Ecosystem & Delivery
- Launch & Go-to-Market
Experience Requirements
Required
The ideal candidate will have a solid understanding of large-scale model training, distributed training architectures, post-training workflows, and reinforcement learning.
Full Job Description
AWS Trainium is deployed at scale, with millions of chips in production, used for training and inference of frontier models. AWS Neuron is the software stack for Trainium, enabling customers to run deep learning and generative AI workloads with optimal performance and cost efficiency.
AWS Neuron is hiring a Principal Technical Product Manager to define and drive product strategy for training software on Trainium. This includes distributed training libraries, post-training workflows (RLHF, DPO, fine-tuning), reinforcement learning frameworks, and training performance optimization. Your mission is to enable researchers and operators to train frontier models at scale on Trainium, from single-node experimentation to distributed training across thousands of nodes.
You will be the champion inside AWS for frontier model builders pushing the bounds of scale and resilience for current and emerging training paradigms. You will work with customers inside and outside the company to identify key improvements and stay ahead of the training landscape. You will define how Neuron supports the training AI/ML ecosystem and what tools customers will use for their training workflows on Trainium.
To be successful, you will partner with engineering teams building training libraries and distributed training infrastructure, applied scientists developing optimization techniques, and PMs responsible for compiler, runtime, NKI, and infrastructure. You will develop deep knowledge of AI/ML training architectures, distributed training systems, model parallelism strategies, and training performance optimization to effectively define product strategy and make informed technical decisions.
The ideal candidate will have a solid understanding of:
- Large-scale model training
- Distributed training architectures
- Post-training workflows
- Reinforcement learning
They should be able to assess the technical implications of training software stack decisions, understand customer needs, and drive developer experience improvements. The ideal candidate can navigate ambiguity in a fast-moving, early-stage initiative, balance competing priorities across multiple workstreams, and drive alignment across engineering and science stakeholders through excellent written and verbal communication.
About the team: Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we're building an environment that celebrates knowledge sharing and mentorship. We operate with startup-like velocity, prioritizing talent acquisition, hands-on leadership, and flexible organization. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help you develop your engineering expertise so you feel empowered to take on more complex tasks in the future.
Diverse Experiences: AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn't followed a traditional path, or includes alternative experiences, don't let it stop you from applying.
Inclusive Team Culture: Here at AWS, it's in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon conferences, inspire us to never stop embracing our uniqueness.
Work/Life Balance: We value work-life harmony. Achieving success at work should never require sacrifices at home, which is why we strive for flexibility as part of our working culture.
Mentorship & Career Growth: We're continuously raising our performance bar as we strive to become Earth's Best Employer. That's why you'll find endless knowledge sharing, mentorship, and other career-advancing resources here to help you develop into a better-rounded professional.
About Amazon Annapurna Labs: The Amazon Annapurna Labs team (our organization within AWS UC) is responsible for building innovation in silicon and software for our AWS customers. We are at the forefront of innovation by combining cloud scale with the world's most talented engineers.
About AWS Utility Computing (UC): AWS Utility Computing (UC) provides product innovations that continue to set AWS's services and features apart in the industry. As a member of the UC organization, you'll support the development and management of Compute, Database, Storage, Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Additionally, this role may involve exposure to and experience with Amazon's growing suite of generative AI services and other cloud computing offerings across the AWS portfolio.
About AWS: Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating. That's why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.