Software Engineer, Modular

Cloud Inference

$157-299k

Salary for candidates based in Canada is $158,000 - $270,000 CAD, plus annual target bonus and equity

Kubernetes
C++
PyTorch
Junior, Mid and Senior level
Remote in Canada, US
San Francisco Bay Area
Modular

AI infrastructure for developers

Open for applications

201-500 employees

B2B · Artificial Intelligence · Machine Learning


Company mission

To have a real, positive impact on the world by reinventing the way AI technology is developed and deployed into production with a next-generation developer platform.

Role

Who you are

  • If you’re passionate about shaping the future of AI and creating tools that make a real difference in people’s lives, we want you on our team
  • We evaluate candidates on the breadth and depth of their experience in backend engineering, AI inference, and distributed systems development
  • If this sounds exciting, we invite you to join our world-leading AI infrastructure team and help drive our industry forward!
  • Experience working on high-scale ML inference infrastructure (traditional AI or genAI)
  • Experience with Kubernetes
  • Familiarity with the HuggingFace API and workflows for using community models
  • Ability to create durable, reusable software tools and libraries that are leveraged across teams and functions
  • Experience with machine learning technologies and use cases
  • Creativity and curiosity for solving complex problems, a team-oriented attitude that enables you to work well with others, and alignment with our culture
  • Strong identification with our core company cultural values

Desirable

  • Experience building ML models in PyTorch
  • Familiarity with modern C++
  • Some experience with and interest in Mojo, our AI programming language!

What the job involves

  • In the Cloud Inference team, we are focused on building end-to-end distributed LLM inference deployments that are fully vertically integrated with the MAX stack
  • Work with Product and partner engineering teams to design and ship new inference server features
  • Collaborate with our kernels and genAI teams to achieve SOTA performance at the serving layer
  • Help design and develop Helm charts and cloud services for scaled LLM inference (intelligent routing, distributed KV cache management, disaggregated inference, etc.)

Company

Company benefits

  • Leading medical, dental and vision packages
  • Equity packages
  • Generous maternity & paternity leave
  • 401K Plan
  • Work wherever you want
  • Unlimited Vacation & PTO
  • Corporate perks & epic team fun
  • Great setup

Funding (2 rounds)

  • Aug 2023: $100m (Late VC)
  • Jun 2022: $30m (Early VC)

Total funding: $130m

Our take

Modular has been developing a programming language optimized for building AI software, aiming to help developers complete their AI projects faster and more effectively.

Most of the time, developers write AI models in Python because of its relatively simple and concise syntax. However, that simplicity comes with performance costs that can slow programs down, a problem that Modular hopes to address.

The company's language, named Mojo, combines the usability of Python with the performance of C, which, the company says, brings unparalleled programmability of AI hardware and extensibility of AI models.

With significant funding behind it, the company plans to enhance Mojo, as well as invest in its other product, a software tool called AI Engine that is designed to make companies’ neural networks faster.

Kirsty

Company Specialist at Welcome to the Jungle