Data Science & Machine Learning Intern, Insitro

Summer 2025

$35 - $65 per hour

SQL
AWS
GCP
Python
R
Linux
NumPy
Pandas
Scikit-Learn
PyTorch
Git
San Francisco Bay Area
Insitro

Machine learning drug discovery & development

Open for applications

201-500 employees

B2B
Biology
Machine Learning

Company mission

To bring better drugs faster to the patients who can benefit most, through machine learning and data at scale.

Role

Who you are

  • Working towards a BS, MS, or Ph.D. in engineering, computational biology, systems biology, computer science, mathematics, statistics, life science, chemistry, physics, or a related field
  • Proficiency in one or more general-purpose programming languages. We primarily use Python
  • Interest in using and developing brand new statistical and machine learning methods inspired by real problems
  • Curiosity about human physiology or disease biology
  • Commitment to writing high-quality, well-commented code and documentation
  • Ability to communicate effectively and collaborate with people of diverse backgrounds and job functions
  • Passion for making a difference in the world

Desirable

  • First-hand experience with biological data, preferably using computational approaches
  • Passion for learning how to work with diverse functional genomic assays (RNA-seq, DNase-seq, ATAC-seq, ChIP-seq, etc.)
  • Interest in learning how to analyze single-cell RNA-seq data
  • Solid understanding of computational chemistry, including virtual screening (classic QSAR modeling, structure-based drug discovery), library design, etc.
  • Demonstrated ability to use and develop cutting-edge statistical and machine learning methods inspired by real problems
  • Experience with machine learning and deep learning frameworks (e.g., scikit-learn, PyTorch)
  • Demonstrated ability to write high-quality, production-ready code (readable, well-tested, with well-designed APIs)
  • Experience with Linux environments, database languages (e.g., SQL, NoSQL), and version control practices and tools such as Git
  • Publications of high-quality work in relevant computational biology, bioinformatics, systems biology, life sciences, or biomedical venues, including journals and conferences
  • Passion for solving problems, asking questions, and learning independently
  • Familiarity with the SciPy/PyData ecosystem (NumPy, pandas, SciPy, Dask, etc.)
  • Familiarity with cloud computing services (AWS or GCP)
  • Familiarity with statistical analysis software, e.g., R

What the job involves

  • We are looking for highly motivated interns who want to work at the intersection of machine learning and life sciences to join the data science & machine learning (DSML) team for our Summer 2025 cohort
  • You will partner directly with a DSML team mentor to develop and/or apply ML methods that process and analyze large-scale datasets from multiple modalities over the course of the summer (11-12 weeks)
  • The DSML team is a diverse group that works across the company, spanning imaging, omics, statistical genetics, small molecule discovery, clinical research, and research software engineering
  • Computational Biology:
      • Leverage publicly available single-cell transcriptomics resources to extract insights about disease mechanisms relevant to the therapeutic areas
  • Methods for Omics & Imaging data modalities:
      • Develop, productionize, and deploy cutting-edge ML approaches to integrate large-scale multi-modal phenotypic datasets
  • Statistical and Translational Genetics:
      • Develop workflows to enable post-GWAS (Genome-Wide Association Study) analysis of results, e.g., fine-mapping
      • Translational genetics deep dives: enable higher-throughput annotation and exploration of candidate genes from our discovery efforts
      • Design statistical methods to improve rare variant burden tests, and methods to improve power in longitudinal phenotypes
  • Integrative Phenotyping:
      • Develop ML models for imputing disease-relevant phenotypes from high-content clinical imaging datasets, e.g., MRI, PET-CT
      • Develop ML methods for disentangling and genetically interpreting axes of variation in complex phenotypes
      • Use LLMs to extract disease-relevant information from medical records
  • Molecular Machine Learning:
      • Build rich embedding models using DNA-Encoded Library (DEL) data on the scale of billions of compounds, and use these representations for downstream drug discovery tasks such as hit discovery
      • Explore generative models of small molecules in various data modalities, such as 2D and 3D representations, for hit-to-lead drug discovery efforts
      • Develop new geometric deep learning methods to better characterize nuanced molecular properties and relationships
  • Computational Microscopy:
      • Identify and prototype novel microscopy-driven phenotyping workflows, including hardware acquisition, post-processing, and featurization
      • Develop robust software tooling to support the deployment of new and existing methods for general use by insitro scientists
      • Optimize existing microscopy acquisition methods in both hardware and software, using ML feature outputs to benchmark improvements
  • In the course of the internship, you will learn diverse machine learning techniques, rigorously analyze complex datasets, and design metrics to ensure the robustness of our methods
  • You can expect to develop and prototype solutions that enable ML-based decisions in our workflows
  • You will work closely with machine learning engineers and scientists, biologists, chemists, microscopy experts, and automation engineers
  • You will be mentored by one of our senior researchers, who has significant experience in machine learning and/or data science
  • You will also attend our machine learning team meetings and be exposed to a diverse set of novel technologies and machine learning concepts that tackle various biological questions

Insights

Led by a woman
Top investors

28% employee growth in 12 months

Company

Company benefits

  • Complete Healthcare Coverage: Take your health seriously without worrying about the costs. insitro offers comprehensive coverage with low premiums for you and your family
  • Wellness Benefits: Prioritize your well-being beyond medical care. insitro offers a comprehensive range of benefits tailored to support all aspects of wellness
  • Mental Health Support: Our generous healthcare plans include access to digital mental health platforms with a range of services for you and your loved ones
  • Supportive Time Off Policies: Flexible vacation time, generous paid parental and bereavement leaves, and a year-end shutdown week
  • Support for Hybrid Work: Balance work and your personal commitments with in-person, hybrid and remote options based on the team and role. We support you with home office setup reimbursement as well as co-work space stipends for remote workers
  • Formal Learning and Training Programs: Partake in formal programs to help you grow and develop, from day 1 onboarding to new manager training to our annual in vivo week
  • Self Development: Annual personal budget for books, external training, and conferences that nurture your professional passions and development interests
  • Access to Knowledge Leaders: Engage with in-house and visiting scholars and experts throughout the year at special events designed exclusively for insitrocytes
  • Extensive HQ Benefits: Our insitro HQ offers free daily meals via an onsite cafe and coffee shop, a free fitness facility, an onsite childcare center, and free commuter shuttle buses
  • Social Connection: Get to know your colleagues at our weekly HQ happy hours, celebrations for special cultural events, and casual meet-ups for sports, music, outdoor activities, and more

Funding (last 2 of 3 rounds)

Mar 2021

$400m

SERIES C

May 2020

$143m

SERIES B

Total funding: $643m

Our take

The synthetic biology market is rapidly growing, projected to increase from $10.7B in 2021 to over $74.7B by 2031. This growth is fueled by advances in biotechnology and expanding applications in healthcare, agriculture, and energy.

With these rapid advancements, insitro is seizing the opportunity to lead the way. The company is transforming drug discovery by combining machine learning with high-throughput biology. By using AI, insitro aims to overcome the bottlenecks of traditional methods, driving down costs and enabling the development of more effective treatments.

To advance its mission, insitro recently secured strategic agreements with Eli Lilly for metabolic disease treatments and expanded its leadership in AI/ML. These moves position the company to accelerate drug development, with the promise of faster and more effective treatments for patients.

Freddie

Company Specialist at Welcome to the Jungle