Publications · MarthLab

SpeedSeq: ultra-fast personal genome analysis and interpretation

Chiang C, Layer RM, Faust GG, Lindberg MR, Rose DB, Garrison EP, Marth GT, Quinlan AR, Hall IM. 2015.

Ultra-fast genome analysis.

Sequence Analysis and Characterization of Active Human Alu Subfamilies Based on the 1000 Genomes Pilot Project

Konkel MK, Walker JA, Hotard AB, Ranck MC, Fontenot CC, Storer J, Stewart C, Marth GT; 1000 Genomes Consortium, Batzer MA. 2015.

Toolbox for mobile-element insertion detection on cancer genomes

Lee WP, Wu J, Marth GT. 2015.

Extending reference assembly models

Church DM, Schneider VA, Steinberg KM, Schatz MC, Quinlan AR, Chin CS, Kitts PA, Aken B, Marth GT, Hoffman MM, Herrero J, Mendoza ML, Durbin R, Flicek P. 2015.

SubcloneSeeker: a computational framework for reconstructing tumor clone structure for cancer variant interpretation and prioritization

Qiao Y, Quinlan AR, Jazaeri AA, Verhaak RG, Wheeler DA, Marth GT. 2014.

bam.iobio: a web-based, real-time, sequence alignment file inspector

Miller CA, Qiao Y, DiSera T, D'Astous B, Marth GT. 2014.

An open-source dashboard web application providing an insightful overview of large, non-human-readable BAM files and enabling users to further analyze their alignments, all in real time.

Scribl: an HTML5 Canvas-based graphics library for visualizing genomic data over the web

Miller CA, Anthony J, Meyer MM, Marth G. 2013.

Using recent advances in core web technologies (HTML5), we developed Scribl, a flexible genomic visualization library specifically targeting coordinate-based data such as genomic features, DNA sequence and genetic variants.

Scotty: a web tool for designing RNA-Seq experiments to measure differential gene expression

Busby MA, Stewart C, Miller CA, Grzeda KR, Marth GT. 2013.

Scotty is an interactive web-based application that helps biologists design an experiment with an appropriate sample size and read depth to satisfy user-defined experimental objectives.

The 1000 Genomes Project: data management and community access

Clarke L, Zheng-Bradley X, Smith R, Kulesha E, Xiao C, Toneva I, Vaughan B, Preuss D, Leinonen R, Shumway M, Sherry S, Flicek P; 1000 Genomes Project Consortium. 2012.

The 1000 Genomes Project was launched as one of the largest distributed data collection and analysis projects ever undertaken in biology, and members of the project data coordination center have developed and deployed several tools to enable widespread data access.

ART: a next-generation sequencing read simulator

Huang W, Li L, Myers JR, Marth GT. 2012.

ART is a set of simulation tools that generate synthetic next-generation sequencing reads.

Targeted proteomic dissection of Toxoplasma cytoskeleton sub-compartments using MORN1

Lorestani A, Ivey FD, Thirugnanam S, Busby MA, Marth GT, Cheeseman IM, Gubbels MJ. Cytoskeleton (Hoboken) 2012.

This study significantly contributes to the annotation of the unique cytoskeleton of Apicomplexa.

An integrated map of genetic variation from 1,092 human genomes

1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA. 2012.

By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing.

Copy Number Variation detection from 1000 Genomes project exon capture sequencing data

Wu J, Grzeda KR, Stewart C, Grubert F, Urban AE, Snyder MP, Marth GT. 2012.

This study demonstrates that exonic sequencing datasets, collected in both population-based and medical sequencing projects, will be a useful substrate for detecting genic CNV events, particularly deletions.

A DOC2 protein identified by mutational profiling is essential for apicomplexan parasite exocytosis

Farrell A, Thirugnanam S, Lorestani A, Dvorin JD, Eidell KP, Ferguson DJ, Anderson-White BR, Duraisingh MT, Marth GT, Gubbels MJ. 2012.

The phenotype of a Toxoplasma gondii conditional mutant impaired in host cell invasion and egress was pinpointed to a defect in secretion of the micronemes, an apicomplexan-specific organelle that contains adhesion proteins.

The functional spectrum of low-frequency coding variation

Marth GT, Yu F, Indap AR, Garimella K, Stewart C, Ward A, Yu J, Xue Y; 1000 Genomes Project. 2011.

This study represents a large step toward detecting and interpreting low frequency coding variation, clearly lays out technical steps for effective analysis of DNA capture data, and articulates functional and population properties of this important class of genetic variation.

The variant call format and VCFtools

Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R; 1000 Genomes Project Analysis Group. 2011.

VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparison, and also provides a general Perl API.

A comprehensive map of mobile element insertion polymorphisms in humans

Stewart C, Kural D, Strömberg MP, Stütz AM, Urban AE, Grubert F, Lam HY, Lee WP, Busby M, Indap AR, Garrison E, Korbel JO, Marth GT; 1000 Genomes Project. 2011.

Here we present a comprehensive map of 7,380 MEI polymorphisms from the 1000 Genomes Project whole-genome sequencing data of 185 samples in three major populations detected with two detection methods.

Demographic history and rare allele sharing among human populations

Gravel S, Henn BM, Indap AR, Marth GT; 1000 Genomes Project, Bustamante CD. 2011.

We examined the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data.

Variation in genome-wide mutation rates within and between human families

Conrad DF; 1000 Genomes Project. 2011.

We present the first direct comparative analysis of male and female germline mutation rates from the complete genome sequences of two parent-offspring trios.

BamTools: a C++ API and toolkit for analyzing and managing BAM files

Barnett D, Garrison E, Quinlan A, Strömberg M, Marth G. 2011.

Introduction of a software suite for research analysis and data management using BAM files.

Expression divergence measured by transcriptome sequencing of four yeast species

Busby MA, Gray JM, Costa AM, Stewart C, Stromberg MP, Barnett D, Chuang JH, Springer M, Marth GT. 2011.

We provide an improved methodology for measuring gene expression changes in evolutionarily diverged species using RNA-Seq, where experimental artifacts can mimic evolutionary effects.

DNA as supramolecular scaffold for functional molecules: progress in DNA nanotechnology

Bandy TJ, Brewer A, Burns JR, Marth G, Nguyen T, Stulz E. 2011.

This tutorial review focuses on recent progress in DNA nanotechnology, a highly active field of research, with an emphasis on covalent modifications of DNA.

Mapping copy number variation by population-scale genome sequencing

Mills RE, Marth GT, Hurles ME, Lee C, McCarroll SA, Korbel JO; 1000 Genomes Project. 2011.

Map of unbalanced SVs based on whole genome DNA sequencing.

Genome Variation Format (GVF) and the 10Gen dataset

Reese MG, Moore B, Batchelor C, Salas F, Cunningham F, Marth GT, Stein L, Flicek P, Yandell M, Eilbeck K. 2010.

A standard variation file format for human genome sequences.

A map of human genome variation from population-scale sequencing

1000 Genomes Project Consortium, Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME, McVean GA. 2010.

This pilot phase of the 1000 Genomes Project is designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms.

Diversity of human copy number variation and multicopy genes

Sudmant PH, Kitzman JO, Antonacci F, Alkan C, Malig M, Tsalenko A, Sampas N, Bruhn L, Shendure J; 1000 Genomes Project, Eichler EE. 2010.

Our approach makes ~1000 genes accessible to genetic studies of disease association.

The Sequence Alignment/Map format and SAMtools

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. 2009.

The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms.

Population genomic inferences from sparse high-throughput sequencing of two populations of Drosophila melanogaster

Sackton TB, Kulathinal RJ, Bergman CM, Quinlan AR, Dopman EB, Carneiro M, Marth GT, Hartl DL, Clark AG. 2009.

Application of the Roche/454 platform to survey natural variation in strains of Drosophila melanogaster.

Rapid whole-genome mutational profiling using next-generation sequencing technologies

Smith DR, Quinlan AR, ... Marth GT, Richardson PM. 2008.

Mutational profiling with next-generation DNA sequencers.

EagleView: A genome assembly viewer for next-generation sequencing technologies

Huang W, Marth G. 2008.

A next-generation sequence assembly viewer program.

Whole-genome sequencing and variant discovery in C. elegans

Hillier LW, Marth GT, Quinlan AR, Dooling D, Fewell G, Barnett D, Fox P, Glasscock JI, Hickenbotham M, Huang W, Magrini VJ, Richt RJ, Sander SN, Stewart DA, Stromberg M, Tsung EF, Wylie T, Schedl T, Wilson RK, Mardis ER. 2008.

Whole-genome SNP calling in Illumina reads.

PYROBAYES: An improved base-caller for SNP discovery in pyrosequences

Quinlan AR, Stewart DA, Strömberg MP, Marth GT. 2008.

A base caller program for 454 reads.

Primer-site SNPs mask mutations

Quinlan AR, Marth GT. 2007.

Missing SNPs because of heterozygosity in PCR primer binding sites.

Analysis of concordance of different haplotype block partitioning algorithms

Indap AR, Marth GT, Struble CA, Tonellato P, Olivier M. 2005.

We simulated 1000 haplotypes using the standard coalescent for three world populations and applied three classes of block partitioning algorithms, assessing algorithm differences in number, size, and coverage of blocks inferred under different conditions of SNP density, allele frequency, and sample size.

Reconstruction of demographic history from the SNP allele frequency spectrum of three world populations

Marth GT, Czabarka E, Murvai J, Sherry ST. 2004.

The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations.

SNP discovery in overlapping sections of BAC clones sequenced by the Human Genome Project. Population genetic inference from polymorphism density distributions

Marth GT, et al. 2003.

Sequence variations in the public human genome data reflect a bottlenecked population history.

A high-density, high-quality microsatellite map of the human genome

Ghebranious N, Vaske D, Yu A, Zhao C, Marth GT, Weber JL. 2003.

STRP screening sets for the human genome at 5 cM density.

Discovery and characterization of short diallelic insertions and deletions

Weber JL, David D, Heil J, Fan Y, Zhao C, Marth GT. 2002.

Human diallelic insertion/deletion polymorphisms. American Journal of Human Genetics.

A review of SNP mining methods and data sources

Marth GT. Single Nucleotide Polymorphisms: Methods and Protocols 2002.

Computational SNP discovery in DNA sequence data.

Validation and population-specific allele frequency estimation for hundreds of SNPs found by The SNP Consortium

Marth GT, Yeh R, Minton M, Donaldson R, Li Q, Duan S, Davenport R, Miller RD, Kwok PY. 2001.

Single-nucleotide polymorphisms in the public domain: how useful are they?

The first high-density SNP map of the human genome

Sachidanandam R, ... Marth G, ... Altshuler D. 2001.

A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms.

The PolyBayes SNP discovery algorithm

Marth GT, Yandell MD, Korf I, Gu Z, Yeh RT, Zakeri H, Stitziel NO, Hillier L, Kwok PY, Gish W. 1999.

A general approach to single-nucleotide polymorphism discovery.

The Common Assembly Format (CAF)

Dear S, Durbin R, Hillier L, Marth G, Thierry-Mieg J, Mott R. 1998.

Sequence assembly with CAFTOOLS.