1-2+ years of experience in site reliability engineering, DevOps, and system administration
CS (preferred) or other technical degree, or equivalent practical experience
Experience working with DevOps tools such as Kubernetes, Terraform, Ansible, Puppet and Chef
Proficiency with scripting languages such as Python and bash
Experience managing cloud infrastructure in AWS, GCP, and/or Azure
Expertise in Linux administration, configuration, and networking protocols
Desirable
Experience with on-prem deployment architectures
Experience running a 24x7 SaaS platform with defined SLIs, SLOs, and SLAs
Familiarity with operating machine learning & AI applications
What the job involves
Work hands-on with the infrastructure that supports our distributed & highly scalable services in both SaaS and on-prem offerings
Gather requirements from customers and adapt manifests and software to support new environments
Use and augment monitoring tools to observe platform health and ensure performance and reliability
Interact with the product team to test new features and package new on-prem releases
Automate and optimize the release pipeline to make it as frictionless as possible
Exhibit continuous curiosity for emerging technology that could solve our challenges
Our take
Machine learning is rapidly becoming a fundamental and mainstream automation tool used across many industries. The issue, however, is that assessing and improving ML models is costly, and especially tricky without dedicated monitoring capabilities. This has created market demand for ML monitoring tools, and Arize’s platform offers a rich suite of them.
The Arize platform gives developers and operators comprehensive oversight of their models with tools including performance tracing, drift detection, and integration and deployment. This provides a crucial guardrail: as more decision-making is handed over to ML in areas like insurance and loans, poor ML model performance can have a catastrophic effect on individuals and customers alike.
Remarkably, given this development, ML observability platforms aren’t yet in huge supply. Arize’s startup rivals like Censius and WhyLabs are making strong headway, but a clear winner has by no means yet emerged. Arize has, however, netted some major customers: Spotify, Instacart, Uber, and P&G are among the clients that will help it lay a strong foundation in what is anticipated to be a highly lucrative market.
Kirsty
Company Specialist at Welcome to the Jungle
Meet the team
Meet Andy, Product Design Team Lead
Our mission
Collaborate to create functional and innovative design solutions.
Do > Say - We believe in building, not just talking. Execution matters more than meetings, and progress beats process. Instead of endless debate, we prototype. Instead of bureaucracy, we ship. AI moves fast—so do we.
N+1 - Good enough isn’t good enough. We believe in continuous improvement—whether it’s improving AI-powered systems, our product, or ourselves, there’s always a better version to build, iterate, and level up.
Bring Solutions - Spot a problem? Own it. We don’t waste time on complaints or excuses—we focus on fixing, improving, and building. Whether it’s a blocker in a project or a gap in how we work, the expectation is simple: push for a better way forward.
Has held Software Engineer roles at both Dote and Apple, and worked on Machine Learning at TubeMogul and Uber. Also co-founded and served as CEO at Monitor ML, and is additionally a columnist at Forbes.