AI Engineer - Multimodal Intelligence
Apple
Sunnyvale, California
Posted 1 week ago
Qualifications
Education
Master's degree, or equivalent practical experience, in Computer Science, Computer Vision, Machine Learning, or a related technical field.
Responsibilities
Primary Duties
- Develop, train, and fine-tune multimodal LLMs across image, video, text, and audio modalities, from data curation through deployment.
- Design and build video/audio encoders, tokenizers, and generative models for multimodal understanding and generation.
- Design and implement agentic AI systems that enable reliable reasoning for natural, proactive, and personalized human interactions.
- Architect end-to-end ML systems that transition from research prototypes to production-grade technologies at scale.
- Collaborate across HW, SW, and ML teams to influence sensor and silicon roadmaps and deliver pioneering on-device experiences.
- Critically evaluate and improve ML codebases, ensuring correctness, efficiency, and maintainable engineering quality.
- Contribute to the team's research direction, identify opportunities for innovation, and help shape product features.
Experience Requirements
Required
3+ years of relevant academic or industry experience in Machine Learning, Computer Vision, or Artificial Intelligence.
3 years of experience
Benefits & Perks
Benefits Package
- Comprehensive medical and dental coverage
- Retirement benefits
- A range of discounted products and free services
- Reimbursement for certain educational expenses, including tuition
Required Skills
Technical Skills
- Deep learning
- Python
- Modern deep learning framework such as PyTorch or JAX
- Foundation models
- Optimization
- Probability
- Linear algebra
Full Job Description
Are you excited about the amazing potential of foundation models, LLMs, and multimodal LLMs? We are looking for individuals who thrive on collaboration and have a desire to push the boundaries of what is possible today! The VCV org is a centralized applied research and engineering organization responsible for developing real-time on-device Computer Vision and Machine Perception technologies across Apple products. In the Human Intelligence team, we balance research and product to deliver Apple-quality, pioneering experiences, innovating through the full stack and partnering with HW, SW, and ML teams to influence the sensor and silicon roadmap that brings our vision to life. Join us in this truly exciting era of Artificial Intelligence to help deliver the next groundbreaking Apple products & experiences! We are continuously advancing the state of the art in Computer Vision and Machine Learning, touching all aspects of multimodal LLMs, from data collection and curation to modeling, evaluation, and deployment. As a member of our dynamic group, you will have the unique and rewarding opportunity to craft upcoming research directions in the field of multimodal LLMs that will inspire future Apple products.
Pay & Benefits
At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $147,400 and $272,100, and your base pay will depend on your skills, qualifications, experience, and location. Apple employees also have the opportunity to become an Apple shareholder through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation.
Minimum Qualifications
- Master's degree, or equivalent practical experience, in Computer Science, Computer Vision, Machine Learning, or a related technical field.
- 3+ years of relevant academic or industry experience in Machine Learning, Computer Vision, or Artificial Intelligence.
- Experience in deep learning with demonstrated work in multimodal systems (e.g., vision, language, video).
- Proficiency in Python and in a modern deep learning framework such as PyTorch or JAX.
- Experience with foundation models (language or multimodal), including training, fine-tuning, and deployment.
- Experience developing, training, and fine-tuning multimodal LLMs.
- Strong foundations in optimization, probability, and linear algebra as applied to machine learning and computer vision.
Preferred Qualifications
- PhD, or equivalent practical experience, in Computer Science, Machine Learning, Computer Vision, or a related technical field with a focus on AI, machine learning, or computer vision.
- Demonstrated expertise in developing, training, and fine-tuning multimodal LLMs at scale and in developing industry-scale agentic products.
- Proven track record of technical leadership, including architecting complex ML systems and leading projects from conception to product deployment.
- Experience applying foundation models to build autonomous or semi-autonomous agents, including planning, task decomposition, and multi-step reasoning.
- Strong publication record in top-tier venues such as NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, and COLM.
- Experience with large-scale distributed training and model parallelism.
- Strong communication skills and ability to present research findings to both technical and non-technical audiences.
Apple pays a median of $238 per hour for Software Engineer roles in Sunnyvale, California, with most salaries ranging from $162 to $364 per hour. Pay can vary based on role, experience, and local cost of living.
Figures represent approximate ranges and may vary based on experience, location, and other factors. For the most accurate information, please consult the employer directly.





