Machine Learning Research Engineer, Etched

Salary not provided
Python
Rust
CUDA
PyTorch
Junior, Mid and Senior level
San Francisco Bay Area

Office located in Cupertino, CA

Etched

Designing specialized chips for language model inference

Open for applications

21-100 employees

B2B, Artificial Intelligence, Manufacturing, Deep Tech

Company mission

To create the world’s most powerful servers for transformer inference.

Role

Who you are

  • An ML Research background with interests in HW co-design
  • Experience with Python, PyTorch, and/or JAX
  • Familiarity with transformer model architectures and/or inference serving stacks (vLLM, SGLang, etc.) and/or experience working in distributed inference/training environments
  • Experience working cross-functionally in diverse software and hardware organizations

Desirable

  • ML Systems Research and HW Co-design backgrounds
  • Published inference-time compute research and/or efficient ML research
  • Experience with Rust
  • Familiarity with GPU kernels, the CUDA compilation stack and related tools, or other hardware accelerators

What the job involves

  • Propose and conduct novel research to achieve results on Sohu that are unviable on GPUs
  • Translate core mathematical operations from the most popular Transformer-based models into maximally performant instruction sequences for Sohu
  • Develop deep architectural knowledge informing best-in-the-world software performance on Sohu HW, collaborating with HW architects and designers
  • Co-design and finetune emerging model architectures for highest efficiency on Sohu
  • Guide and contribute to the Sohu software stack, performance characterization tools, and runtime abstractions by implementing frontier models using Python and Rust
  • Propose and implement a novel test-time compute algorithm that leverages Sohu’s unique capabilities to unlock a product that could never be achieved on a typical GPU
  • Implement diffusion models on Sohu to achieve GPU-impossible latencies that allow for real-time image generation
  • Tune model instructions and scheduling algorithms to optimize for utilization, latency, throughput, or a mix of these metrics
  • Implement model-specific inference-time acceleration techniques such as speculative decoding, tree search, KV cache sharing, and priority scheduling by interacting with the rest of the inference serving stack

Company

Company benefits

  • Competitive Salary and Equity - We offer a competitive salary and equity package that aligns with industry standards
  • Benefits - Your wellbeing is important to us. We offer unlimited PTO, life insurance, and a 401(k)
  • Full Insurance Coverage - Our insurance includes full medical, dental, and vision with 100% of premiums covered

Funding (2 rounds)

  • Jun 2024: $120m (Series A)
  • May 2023: $5.4m (Seed)

Total funding: $125.4m

Our take

As AI develops rapidly, so do the chips that power it. The computing requirements of technologies such as ChatGPT will increasingly call for customised, hard-wired chips called ASICs, designed specifically to execute AI models. Whilst GPUs are currently used for AI computing, Etched is betting that AI-hardwired chips will become the predominant force powering AI.

It’s a gamble: the founders themselves know that if transformers become obsolete, so will their product. The Sohu chip that Etched builds differs from GPUs in that it has the transformer architecture etched into the chip itself. GPUs are general-purpose and costly, so Etched is betting on transformer-specialised chips taking their place in the near future.

Investors are drawn to Etched’s boldness. Founded in 2022, the company raised $120 million in Series A funding just two years later. Etched plans to use this funding to become a main competitor of GPU maker Nvidia, and claims its chip is 20 times faster than Nvidia’s general-purpose AI processors, and significantly less costly too. Time will tell if Etched’s exciting and potentially revolutionary bet on tailored AI chips and transformers pays off.

Steph

Company Specialist at Welcome to the Jungle