Full-time scientist at Finch Therapeutics

Finch June 2018 - present
Manager: Rotem Gura Sadovsky, PhD
Team Lead: Sonia Timberlake, PhD

I began my role as a Bioinformatics Research Associate on Finch Therapeutics’ Data Science team in June of 2018. As part of the Data Science team, I work in a highly collaborative environment, combining insights from my teammates, bench scientists and clinicians at Finch to perform high-throughput analysis of multi-omics microbiome data from patients in a variety of disease indications. My role, which was expanded with a promotion to Bioinformatics Research Associate II in June of 2020, also means that I assist in computational analysis method development, perform end-to-end implementation and validation of bioinformatics pipelines, and orchestrate meta-analysis projects as part of the Ulcerative Colitis, Crohn’s Disease, Autism Spectrum Disorder and Clostridium difficile (CP101) programs.

Analyzing the human microbiome in IBD

Ulcerative colitis (UC) and Crohn’s disease (CD) collectively form the class of heterogeneous gastrointestinal inflammatory disorders known as inflammatory bowel disease (IBD). As part of the microbial candidate discovery process for Finch’s Rationally Selected Microbiota therapy in IBD, I have processed NGS data from dozens of datasets constituting thousands of samples and billions of reads for meta-analysis. This involved building and validating version-controlled, scalable bioinformatics pipelines within an HPC environment to quality-filter DNA sequences, remove contaminant DNA, and quantify strain abundances using both public and custom databases. After processing was complete, I used a robust statistical meta-analysis framework to identify microbes that strongly distinguished IBD and non-IBD cohorts. Using clinical diagnosis and demographic information from the patients in each dataset, I also discovered microbial populations specific to disease location and behavior, presenting this information in the phylogenetic and functional context of the microbial populations of the gut. In addition to building strong technical skills and experience working with high volumes of real-world data under tight deadlines, these projects require that I work independently while effectively communicating my decisions and hypotheses to align my work with each project’s timelines. As I assessed data quality, consulted literature for best practices in pipeline construction, and executed multiple complex projects, I gained critical insights into how to manage scientific projects and generate actionable outcomes.

Computational and scientific skills

With a small group of users and a large number of use cases for Finch’s HPC resources, my team and I are the managers and developers of our own infrastructure. As a result, I have become responsible for version-controlled management of multiple pipelines using both Amazon Web Services and Google Cloud Platform architecture. I have spent thousands of hours developing code in Python and bash in Linux server environments, as well as constructing pipelines using HPC-specific tools like WDL and SLURM. These technical skills have allowed me to find and generate the large volumes of NGS data that I need to answer complex biological questions in heterogeneous patient populations. As I have examined and manipulated NGS sequence data from bacterial and non-bacterial organisms, I have also developed an understanding of the features and caveats of each data type, and how to ask questions with the specific limitations of these data in mind. Gaining this volume of technical expertise has been one of the most valuable components of my work at Finch. As a result of my experience at Finch, I am extremely experienced with common data manipulation and visualization libraries like pandas, numpy, seaborn, scipy, and BioPython. At Finch I have worked with hundreds of pieces of software, including but not limited to QIIME, DADA2, mothur, the bioBakery ‘omics tools, Robert C. Edgar’s tools and their derivatives, bowtie2, KMA, and other mapping software, WDL/Cromwell, phylogenetic manipulation software and packages like SEPP and ete, statistical analysis frameworks and more.

Other projects

In addition to the IBD and computational platform development projects, I participate in a variety of projects for optimizing experimental protocols, exploring strain diversity in human patients to identify potential sources for isolation, and investigating new ways of intelligently characterizing the gut microbiome’s interconnected biology. I’ve helped optimize experimental procedures by assessing compositions of mock and actual microbial samples under a variety of experimental protocols. To identify and help isolate candidate strains, I have assessed the specificity of 16S rRNA sequencing regions for strains of interest, identified samples where these strains are abundant, and analyzed data to identify which media conditions aid microbial growth in isolation attempts. The wide variety of projects that I have participated in at Finch has broadened my understanding of how to design protocols and ask questions about intelligent experimental and assay design.

Carleton College

2014 - 2018
Bachelor of Arts in Biology
Labs: Dr. Anderson, Dr. Oesper

My education at Carleton College involved a multidisciplinary combination of biology, chemistry, and computer science. In addition to research with professors in the Computer Science and Biology departments, I was active in the Biology department as a Student Departmental Advisor, tutor, and Biology Stockroom Assistant.

Anderson lab:

May 2017 - June 2018
PI: Professor Rika Anderson
Dr. Anderson’s lab studies the co-evolution of microbes and viruses in Earth’s oceans. My project focused on relating the evolution of Methanothermococcus species to site-specific geochemical dynamics and interactions with viruses. I annotated open reading frames and prophage/CRISPR loci in five single-cell Methanothermococcus genomes extracted from the Von Damm ultramafic vent field to identify the differences in functional potential and viral pressure on the strains represented by each genome. Although the core genome of the five strains comprised metabolic functions previously identified in Methanothermococcus archaea, metabolic functions like nitrogen fixation via the nif genes were identified in only a subset of the genomes. Hypothesizing that these “accessory” metabolic functions were non-essential and therefore undergoing neutral selection, I aligned metagenomic samples and metatranscriptomic samples to each genome and identified the number of amino acid variants and single nucleotide variants in each gene. By examining the number of RNA transcripts aligned to each gene alongside the ratio of non-synonymous to synonymous variation, we identified specific functional elements of the accessory genome experiencing variable selection. In addition to introducing me to a variety of computational methods used for analyzing these complex data, this self-directed research project allowed me to develop and test my own hypotheses based on the complex ecological interactions in the subseafloor using a combination of environmental and sequence data.

The second part of this project involved relating the geochemistry of specific sites to the transcript and variant abundances in each genome. By considering the specific environmental features of each site, we were able to merge our analysis of variable selective pressures with our ideas about how viral predation and vent geochemistry are influencing Methanothermococcus species. By dissecting specific metabolic functions, identifying the subset of genomes in which they are present, and identifying the number of transcripts at different vent sites, we began to reconstruct the evolutionary history and ecological relationships in the subseafloor.

Oesper lab:

May 2016 - September 2016
PI: Professor Layla Oesper
In Dr. Oesper’s lab, I worked on designing an interface and analysis method for cancerous tumor phylogenies reconstructed from ultra-deep sequencing data. I designed this project to enable biological interpretation of the relationships between genes with mutations responsible for the development of blood tumors. Computational tools like AncesTree reconstruct the progression of clonal states in a tumor’s development using single nucleotide mutations and ultra-deep sequencing data. After seeing a computational model perform tumor phylogeny inference that would be infeasible through any other means, I began to wonder if computational tools could similarly advance our ability to identify causative biological features from these results. After visualizing the evolution of advanced blood cancers with phylogenetic trees, I laid the groundwork for these trees to be integrated with known databases of protein interactions in order to develop hypotheses about the biological mechanisms motivating the specific sequence of observed clonal states. This multidisciplinary research project spurred my interest in the intersection between biology and computer science, an interest that persists to this day.

Teaching and work-study

Carleton Prefect Program

Beginning my junior year, I was a prefect in the Carleton Academic Support Center’s Prefect Program. Prefects are students who attend a class they’ve already taken in order to provide extracurricular support for students taking the class. As part of the prefect program, I was selected based on an extensive list of guidelines and trained to facilitate student engagement with course materials via discussion-based prefect sessions I hosted multiple times per week. I orchestrated discussions in groups of up to 50 students to review course content with accompanying study guides, practice sheets, notes, and best practices for reviewing course material. Classes supported:

Biology Stockroom

I worked in Carleton’s Biology stockroom for my first two years of work-study. I prepared media and reagents for biology labs, washed too many beakers to count, and performed other activities necessary to keep the Biology department running smoothly.