GenAI Optimization Engineer, Modular

$135-242k

The estimated base salary range for this role, performed anywhere in Canada regardless of province, is $127,500.00 - $228,500.00 CAD, plus annual target bonus and equity for all locations.

Python
TensorFlow
C++
CUDA
PyTorch
TUNE
Mid and Senior level
Remote in Canada, US
San Francisco Bay Area
Modular

AI infrastructure for developers

Job no longer available

201-500 employees

B2B · Artificial Intelligence · Machine Learning

Company mission

To have real, positive impact in the world by reinventing the way AI technology is developed and deployed into production with a next-generation developer platform.

Role

Who you are

  • In-depth knowledge of the Python programming language
  • 3+ years of work experience in Machine Learning, Deep Learning, or Generative AI
  • Experience implementing framework-level optimizations for Generative AI use cases
  • Experience profiling and optimizing GenAI applications
  • Deep interest in machine learning technologies and use cases
  • Creativity and curiosity for solving complex problems, a team-oriented attitude that enables you to work well with others, and alignment with our culture

Desirable

  • Experience using Machine Learning frameworks such as PyTorch, TensorFlow, etc.
  • CUDA/GPU Programming and Optimization experience
  • Experience with LLVM/MLIR/Compilers
  • Experience working with distributed/parallel programming models and an understanding of parallel hardware

What the job involves

  • ML developers today face significant friction in taking trained models into deployment
  • They work in a highly fragmented space, with incomplete, patchwork solutions that require significant performance tuning and non-generalizable, model-specific enhancements
  • At Modular, we are building the next-generation AI platform (MAX) that will radically improve the way developers build and deploy AI models
  • We’re continuously working to improve the performance and scalability of MAX by extending existing features and adding new features for users to try
  • The E2E Optimizations team is responsible for working cross-functionally across the entire Modular tech stack to implement cutting-edge optimizations and research for auto-regressive text generation, image generation, and beyond
  • Think things like Speculative Decoding, LoRA, Quantization, Chunked Prefill, Distributed Inference, etc. (a conceptual sketch of speculative decoding follows this list)
  • Design, scope, implement, and tune features for Generative AI use cases in the MAX framework
  • Plan and lead cross-functional projects spanning multiple teams and domains
  • Collaborate with subject matter experts within Modular to enable features across different parts of the stack
  • Contribute to the MAX tech stack across multiple languages, including Mojo, Python, and C++
  • Monitor the latest research channels and identify potential opportunities for the MAX framework
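
To make the list above concrete, here is a minimal, self-contained Python sketch of greedy speculative decoding, one of the techniques named. It uses toy stand-in models and hypothetical names rather than Modular's MAX APIs, so treat it as an illustration of the draft/verify control flow under those assumptions, not an implementation.

    # Minimal conceptual sketch of greedy speculative decoding.
    # The "models" here are toy stand-ins (plain callables), not Modular's MAX APIs.
    from typing import Callable, List

    Model = Callable[[List[int]], int]  # maps a token prefix to its greedy next token


    def speculative_decode_step(target: Model, draft: Model,
                                prefix: List[int], k: int = 4) -> List[int]:
        """Propose k tokens with a cheap draft model, keep the longest prefix
        the expensive target model agrees with, then append one target token."""
        # 1. Draft model proposes k tokens autoregressively.
        proposed: List[int] = []
        ctx = list(prefix)
        for _ in range(k):
            tok = draft(ctx)
            proposed.append(tok)
            ctx.append(tok)

        # 2. Target model verifies each proposed position. In a real system this
        #    is a single batched forward pass, which is where the speedup comes from.
        accepted: List[int] = []
        ctx = list(prefix)
        for tok in proposed:
            target_tok = target(ctx)
            if target_tok != tok:
                accepted.append(target_tok)  # disagreement: take the target's token, stop
                return accepted
            accepted.append(tok)             # agreement: accept the drafted token
            ctx.append(tok)

        # All k proposals accepted: the target's own next token comes along for free.
        accepted.append(target(ctx))
        return accepted


    if __name__ == "__main__":
        # Toy deterministic models: the target repeats a pattern, the draft mostly agrees.
        pattern = [1, 2, 3, 4, 5, 6, 7, 8]
        target = lambda seq: pattern[len(seq) % len(pattern)]
        draft = lambda seq: pattern[len(seq) % len(pattern)] if len(seq) % 5 else 0

        out = [pattern[0]]
        while len(out) < 16:
            out.extend(speculative_decode_step(target, draft, out, k=4))
        print(out[:16])  # each step emits between 1 and k + 1 tokens

In a production framework the draft and target are neural networks, verification happens in a batched pass on the accelerator, and the sampling-based variant accepts tokens probabilistically; the sketch only shows the accept/reject logic such a feature has to get right.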

Company

Company benefits

  • Leading medical, dental and vision packages
  • Equity packages
  • Generous maternity & paternity leave
  • 401K Plan
  • Work wherever you want
  • Unlimited Vacation & PTO
  • Corporate perks & epic team fun
  • Great setup

Funding (2 rounds)

  • Aug 2023: $100m (Late VC)
  • Jun 2022: $30m (Early VC)

Total funding: $130m

Our take

Modular has been developing a programming language optimized for building AI software, aiming to help developers complete their AI projects faster and more effectively.

Developers typically write AI models in Python because of its relatively simple and concise syntax. That simplicity, however, comes with performance limitations that can slow programs down, a problem Modular hopes to address.

The company's language, named Mojo, combines the usability of Python with the performance of C, which, the company says, brings unparalleled programmability of AI hardware and extensibility of AI models.

With significant funding at its back, the company plans to enhance Mojo moving forward, as well as invest in its other product, a software tool called AI Engine that is designed to make companies’ neural networks faster.

Kirsty, Company Specialist at Welcome to the Jungle