GenAI Optimization Engineer, Modular

$135-242k

The estimated base salary range for this role, performed anywhere in Canada regardless of province, is $127,500.00 - $228,500.00 CAD, plus annual target bonus and equity for all locations.

Python
TensorFlow
C++
CUDA
PyTorch
TUNE
Mid and Senior level
Remote in Canada, US
San Francisco Bay Area
Modular

AI infrastructure for developers

Job no longer available

201-500 employees

B2B · Artificial Intelligence · Machine Learning

Company mission

To have real, positive impact in the world by reinventing the way AI technology is developed and deployed into production with a next-generation developer platform.

Role

Who you are

  • In-depth knowledge of the Python programming language
  • 3+ years of work experience in Machine Learning, Deep Learning, or Generative AI
  • Experience implementing framework-level optimizations for Generative AI use cases
  • Experience profiling and optimizing GenAI applications
  • Deep interest in machine learning technologies and use cases
  • Creativity and curiosity for solving complex problems, a team-oriented attitude that enables you to work well with others, and alignment with our culture

Desirable

  • Experience using Machine Learning frameworks such as PyTorch, TensorFlow, etc.
  • CUDA/GPU Programming and Optimization experience
  • Experience with LLVM/MLIR/Compilers
  • Experience working with distributed/parallel programming models and an understanding of parallel hardware

What the job involves

  • ML developers today face significant friction in taking trained models into deployment
  • They work in a highly fragmented space, with incomplete, patchwork solutions that require significant performance tuning and non-generalizable, model-specific enhancements
  • At Modular, we are building the next-generation AI platform (MAX) that will radically improve the way developers build and deploy AI models
  • We’re continuously working to improve the performance and scalability of MAX by extending existing features and adding new features for users to try
  • The E2E Optimizations team is responsible for working cross-functionally across the entire Modular tech stack to implement cutting-edge optimizations and research for auto-regressive text generation, image generation, and beyond
  • Think things like Speculative Decoding, LoRA, Quantization, Chunked Prefill, Distributed Inference, etc. (a conceptual sketch of speculative decoding follows this list)
  • Design, scope, implement, and tune features for Generative AI use cases in the MAX framework
  • Plan and lead cross-functional projects spanning multiple teams and domains
  • Collaborate with subject matter experts within Modular to enable features across different parts of the stack
  • Contribute to the MAX tech stack across multiple languages, including Mojo, Python, and C++
  • Monitor the latest research channels and identify potential opportunities for the MAX framework
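
To make the list above concrete, here is a minimal, self-contained Python sketch of greedy speculative decoding, one of the techniques named. It uses toy stand-in models and hypothetical names rather than Modular's MAX APIs, so treat it as an illustration of the draft/verify control flow under those assumptions, not an implementation.

    # Minimal conceptual sketch of greedy speculative decoding.
    # The "models" here are toy stand-ins (plain callables), not Modular's MAX APIs.
    from typing import Callable, List

    Model = Callable[[List[int]], int]  # maps a token prefix to its greedy next token


    def speculative_decode_step(target: Model, draft: Model,
                                prefix: List[int], k: int = 4) -> List[int]:
        """Propose k tokens with a cheap draft model, keep the longest prefix
        the expensive target model agrees with, then append one target token."""
        # 1. Draft model proposes k tokens autoregressively.
        proposed: List[int] = []
        ctx = list(prefix)
        for _ in range(k):
            tok = draft(ctx)
            proposed.append(tok)
            ctx.append(tok)

        # 2. Target model verifies each proposed position. In a real system this
        #    is a single batched forward pass, which is where the speedup comes from.
        accepted: List[int] = []
        ctx = list(prefix)
        for tok in proposed:
            target_tok = target(ctx)
            if target_tok != tok:
                accepted.append(target_tok)  # disagreement: take the target's token, stop
                return accepted
            accepted.append(tok)             # agreement: accept the drafted token
            ctx.append(tok)

        # All k proposals accepted: the target's own next token comes along for free.
        accepted.append(target(ctx))
        return accepted


    if __name__ == "__main__":
        # Toy deterministic models: the target repeats a pattern, the draft mostly agrees.
        pattern = [1, 2, 3, 4, 5, 6, 7, 8]
        target = lambda seq: pattern[len(seq) % len(pattern)]
        draft = lambda seq: pattern[len(seq) % len(pattern)] if len(seq) % 5 else 0

        out = [pattern[0]]
        while len(out) < 16:
            out.extend(speculative_decode_step(target, draft, out, k=4))
        print(out[:16])  # each step emits between 1 and k + 1 tokens

In a production framework the draft and target are neural networks, verification happens in a batched pass on the accelerator, and the sampling-based variant accepts tokens probabilistically; the sketch only shows the accept/reject logic such a feature has to get right.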

Company

Company benefits

  • Leading medical, dental and vision packages
  • Equity packages
  • Generous maternity & paternity leave
  • 401K Plan
  • Work wherever you want
  • Unlimited Vacation & PTO
  • Corporate perks & epic team fun
  • Great setup

Funding (2 rounds)

  • Aug 2023: $100m (Late VC)
  • Jun 2022: $30m (Early VC)

Total funding: $130m

Our take

Modular has been developing a programming language optimized for building AI software, aiming to help developers complete their AI projects faster and more effectively.

Developers typically write AI models in Python because of its relatively simple and concise syntax. That simplicity, however, comes with performance limitations that can slow programs down, a problem Modular hopes to address.

The company's language, named Mojo, combines the usability of Python with the performance of C, which, the company says, brings unparalleled programmability of AI hardware and extensibility of AI models.

With significant funding at its back, the company plans to enhance Mojo moving forward, as well as invest in its other product, a software tool called AI Engine that is designed to make companies’ neural networks faster.

Kirsty, Company Specialist at Welcome to the Jungle