Machine Learning Research Engineer, Etched

Salary not provided

Python

Rust

CUDA

PyTorch

Junior, Mid and Senior level

San Francisco Bay Area

Office located in Cupertino, CA

Designing specialised chips for language model inference

Job no longer available

21-100 employees

B2B, Artificial Intelligence, Manufacturing, Deep Tech

Company mission

To create the world’s most powerful servers for transformer inference.

Role

Who you are

An ML Research background with interests in HW co-design
Experience with Python, PyTorch, and/or JAX
Familiarity with transformer model architectures and/or inference serving stacks (vLLM, SGLang, etc.) and/or experience working in distributed inference/training environments
Experience working cross-functionally in diverse software and hardware organizations

Desirable

ML Systems Research and HW Co-design backgrounds
Published inference-time compute research and/or efficient ML research
Experience with Rust
Familiarity with GPU kernels, the CUDA compilation stack and related tools, or other hardware accelerators

What the job involves

Propose and conduct novel research to achieve results on Sohu that are unviable on GPUs
Translate core mathematical operations from the most popular Transformer-based models into maximally performant instruction sequences for Sohu
Develop deep architectural knowledge informing best-in-the-world software performance on Sohu HW, collaborating with HW architects and designers
Co-design and fine-tune emerging model architectures for highest efficiency on Sohu
Guide and contribute to the Sohu software stack, performance characterization tools, and runtime abstractions by implementing frontier models using Python and Rust
Propose and implement a novel test-time compute algorithm that leverages Sohu’s unique capabilities to unlock a product that could never be achieved on a typical GPU
Implement diffusion models on Sohu to achieve GPU-impossible latencies that allow for real-time image generation
Tune model instructions and scheduling algorithms to optimize utilization, latency, throughput, or a mix of these metrics
Implement model-specific inference-time acceleration techniques such as speculative decoding, tree search, KV cache sharing, and priority scheduling by interacting with the rest of the inference serving stack

Our take

With AI developing rapidly, the chips that power it are developing rapidly too. The computing requirements of technologies such as ChatGPT will increasingly call for customised, hard-wired chips called ASICs, designed specially for the execution of AI models. Whilst GPUs are currently used for AI computing, Etched is making a bet on AI-hardwired chips becoming the predominant powering force of AI.

It’s a gamble: the founders themselves know that if transformers become obsolete, so will their AI-powering product. The Sohu chip that Etched builds differs from GPUs in that it has the transformer architecture etched into the chip itself. GPUs are general-purpose and costly, so Etched is betting on specialised transformer chips taking their place in the near future.

Investors are drawn to Etched’s boldness. The 2022-founded company raised $120 million in Series A funding just two years into its life, and received a further $500 million at the start of 2026. Etched plans to use this to step up its competition with GPU company Nvidia, and claims its chip is 20 times faster than Nvidia’s general-purpose processors for AI workloads, and significantly less costly too. Time will tell if Etched’s exciting and potentially revolutionary bet on tailored AI chips and transformers pays off.

Steph

Company Specialist at Welcome to the Jungle

Company

Funding (last 2 of 3 rounds)

Jan 2026: $500m (Late VC)

Jun 2024: $120m (Series A)

Total funding: $625.4m

Company benefits

Competitive Salary and Equity - We offer a competitive salary and equity package that aligns with industry standards
Benefits - Your wellbeing is important to us. We offer unlimited PTO, life insurance, and a 401(k)
Full Insurance Coverage - Our insurance includes full medical, dental, and vision with 100% of premiums covered

Company HQ

Horace Mann and SJSU, San Jose, CA

Leadership

Robert Wachen

(Co-Founder & President)

Also the Co-Founder of Prod and Mentor Labs, Birthday Cakes 4 Free Maryland and Generate Sales Online.

Gavin Uberti

(Co-Founder & CEO)

Previously a Software Engineer at OctoML, an Algorithms and Backend Engineer at Coursedog, and a Software Engineer Intern at Xnor.ai.

Chris Zhu

(Co-Founder)

Was a Teaching Fellow at Harvard and a Software Engineering Intern at Amazon and AvantStay. Was an Analyst at Harvard Undergraduate Capital Partners.
