JCVI Viral Finishing Pipeline: a Winning Combination of Advanced Sequencing Technologies, Software Development and Automated Data Processing

JCVI viral projects are supported by the NIAID Genomic Sequencing Center for Infectious Disease (GSCID). The viral sequencing and finishing pipeline at JCVI combines next-generation sequencing technologies with automated data processing. This allowed us to complete over 1,800 viral genomes in the last 12 months, and almost 8,800 genomes since 2005.

Viral Projects at JCVI

JIRA Viral Sample Tracking Workflow

Our NextGen pipeline, which uses SISPA-generated libraries with Roche/454 and Illumina sequencing, enables us to complete a wide variety of viral genomes, including challenging samples. Our automated assembly pipeline employs CLCbio command-line tools and JCVI cas2consed, a tool that converts assemblies from the CAS format to the ACE format. Our complementary Sanger pipeline software is currently being integrated with the NextGen pipeline. This will improve our data processing and will allow us to use validation software (autoTasker) more efficiently.

Assembly of Repetitive Viral Genomes

Genome Organization of Varicella-Zoster

Assembly of Novel Viral Genomes

CLC Assembly Viewer Representation

Promoter of Bat Genome

During the past year we have found that novel viruses, repetitive genomes, and mixed infection samples could not be easily integrated with our high-throughput assembly pipeline. We have developed an assembly and finishing process that utilizes components of the high-throughput pipeline and combines them with manual reference selection and editing. Using this approach we completed novel adenovirus genomes and mixed-infection avian influenza genomes, and improved assemblies of previously unknown arbovirus genomes. We are currently working on optimizing and automating this new pipeline.

Assembly of Mixed Viral Genomes

Consed Representation of Mixed Viral Sample

Repetitive genomes have long been known to present great challenges during assembly and finishing. We present a new approach to the assembly and finishing of the repetitive varicella-zoster genome that is based on separating it into overlapping PCR amplicons and then merging the sequenced amplicons during assembly.
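The merging step can be illustrated with a minimal Python sketch. It assumes that amplicon consensus sequences are already ordered along the genome and that neighbouring amplicons share an exact, error-free overlap; the function names and the `min_overlap` default are hypothetical and are not taken from the JCVI pipeline, which works on read assemblies rather than finished strings.

```python
def merge_pair(left: str, right: str, min_overlap: int = 20) -> str:
    """Join two sequences where a suffix of `left` exactly matches a prefix of `right`.

    The longest qualifying overlap is tried first. Inside a repeat a long
    spurious overlap could exist; this sketch assumes amplicon ends were
    placed in unique sequence so that the true overlap is unambiguous.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError(f"no exact overlap of >= {min_overlap} bases")


def merge_amplicons(amplicons: list[str], min_overlap: int = 20) -> str:
    """Merge amplicons that are already ordered 5'->3' along the genome."""
    if not amplicons:
        raise ValueError("need at least one amplicon")
    merged = amplicons[0]
    for amplicon in amplicons[1:]:
        merged = merge_pair(merged, amplicon, min_overlap)
    return merged


# Toy example: three overlapping "amplicons" (overlaps of 6 and 5 bases).
print(merge_amplicons(["ACGTACGGTTCA", "GGTTCAAGCTT", "AGCTTGGA"], min_overlap=4))
# ACGTACGGTTCAAGCTTGGA
```

Requiring exact overlaps keeps the sketch short; real amplicon data would need tolerance for sequencing errors and a check that each overlap falls in the designed primer region.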

To streamline our viral pipelines, we have fully integrated them with JCVI’s LIMS and JIRA Workflow Management to create a semi-automated tracking interface that follows the progress of viral samples from acquisition through to NCBI submission. This allows us to process a large volume of samples with limited manual interaction and, at the same time, gives us flexibility to work on challenging and novel genomes.

Acknowledgements

The JCVI Viral Genomics Group is supported by federal funds from the National Institute of Allergy and Infectious Disease, the National Institutes of Health, and the Department of Health and Human Services under contract no. HHSN272200900007C.

The bat coronavirus project is a collaboration with Kathryn Holmes and Sam Dominguez, University of Colorado Medical Center.

The authors would like to thank members of the Viral Genomics and Informatics group at JCVI.

Note

This post is based on a poster by Nadia Fedorova, Danny Katzel, Tim Stockwell, Peter Edworthy, Rebecca Halpin, and David E. Wentworth.

Biowalk of Fame

There is a new “Biowalk of Fame” in Maryland, and our own Craig Venter was one of the first honorees to receive a plaque, which is there for all to see as you stroll through lovely Silver Spring.

Etching of Dr. J. Craig Venter on Biowalk of Fame

Other honorees include Dr. Martin Rodbell and Ben Carson. The event to honor the awardees was held on April 22, which also happened to be Earth Day. Although it rained heavily throughout the event, a large number of people attended, including several local government officials such as Council member Valerie Ervin and Chairman Ike Leggett. Dr. Martine Rothblatt, CEO of United Therapeutics, emceed the event.

Biowalk of Fame tour sign

The idea behind the BioWall and the Biowalk is very innovative. The BioWall is a live, movie-like screen that continuously airs science-related videos submitted by students and the public. A student observing a paramecium under the microscope, for example, can send the clip to United Therapeutics, and it will be available for all to see. The Biowalk also has plaques dedicated to those who have made the most outstanding contributions to the State of Maryland in the sciences, hence Craig.

Dr. J. Craig Venter's plaque on the Biowalk of Fame

Biowalk of Fame

The take-home message is: if you are wandering through Silver Spring, do not be surprised if you see Craig’s name on a plaque at 1040 Spring Street. Congratulations!

Moving dirt at JCVI La Jolla

After celebrating the groundbreaking of JCVI La Jolla, McCarthy Building Companies immediately got to work preparing the land for construction. First, the crew set up a work area to house the staff and equipment needed for the project. The site was cleared and stabilized for construction trailers, and a temporary road was built for construction vehicles and equipment. Water trucks were used to control dust, and special shaker plates were installed at the site entrance to minimize loose dirt and stones on nearby roads.

With basic infrastructure in place, the team next moved to save three large Torrey Pines growing within the construction zone. The trees had been identified during the design process and were flagged for relocation to protected areas on the site, where they will remain as part of the natural landscape. Big Trees of California, a firm specializing in the relocation of large tree species, began the process of “boxing” these trees in custom-built structures 14’ wide by 14’ long and 5’ deep. Wooden lifting beams were installed underneath to provide connection points for the vertical lift. A large crane was then used to “fly” the full-grown Torrey Pines to their new homes.

Relocating Torrey Pine trees at JCVI La Jolla

With the Torrey Pines now safe, preparations for the building pad began. Several pieces of heavy earthmoving equipment arrived to begin the dig for the building’s foundation. However, the crew soon discovered a local soil condition known as the “Lindavista formation,” which proved to be a challenge for even some of the largest machinery. Fortunately, the team was prepared, and with a few equipment modifications they reached the designed grading levels required for the excavation in only slightly more time than expected. The earthen building pad was moisture-conditioned, compacted, and surveyed by geotechnical engineers. The first major construction milestone was met! The team is now focused on installing the concrete foundation and underground utilities.

Caterpillar D-9 at Work

While most of the team was focused on moving dirt, McCarthy’s concrete team began assembling 4’x8’ mock-ups of the architectural concrete that will be used for the building’s exterior. The mock-ups help determine the best method for constructing future concrete forms, while also giving the architects a glimpse of the appearance of the finished product. In this first series of concrete mock-ups, the team is particularly focused on two important decisions:

1) How much recycled fly-ash material can be added to the concrete while still maintaining the desired look and strength characteristics? Adding fly ash to a concrete mix design increases the amount of recycled building material used and can count toward LEED credits. The JCVI building is intended to be one of the few LEED Platinum-certified laboratory buildings in the US, so every step counts.

2) How will the concrete forms be constructed to produce the desired finish and house the necessary structural steel elements?


Building forms for architectural concrete mock-ups

Stay tuned for updates on the progress of the concrete mock-ups and other design elements of the building. Also, if you haven’t seen it already, check out the webcam image here, which provides hourly updates from the job site.

Scientist Spotlight: Meet David Wentworth

During the height of the H1N1 flu pandemic, David Wentworth was running a microbial genetics laboratory at the Wadsworth Center, New York State Department of Health (NYSDOH), where he was instrumental in developing a method to amplify influenza genomes regardless of strain using “universal primers,” short strands of DNA that recognize conserved segments across the genomes of many different flu strains. This amplification process was developed to generate recombinant influenza A viruses (the most common flu type affecting humans and animals) that could be used to produce new vaccines. Starting from a clinical swab, his team took 9-12 days to develop vaccine seed stocks. It was this work that first brought Dave to JCVI’s attention.

Several years ago Dave began collaborating with JCVI scientists to sequence human and avian influenza viruses. The collaborations intensified two years ago, when all pandemic flu samples (or suspected flu samples) were first sent to Dave’s lab so the virus could be amplified in sufficient quantities for sequencing using his new amplification pipeline. The amplification took only a day, and the isolated, non-infectious DNA was then sent to JCVI for sequencing. JCVI was the natural choice for this work since we host the government-funded “Influenza Genome Sequencing Project,” whose goal is to sequence large numbers of viral genomes to help scientists worldwide understand how flu viruses evolve and cause disease. JCVI researchers then deposited influenza sequences into GenBank within two days of receiving DNA from Dave’s lab, enabling researchers worldwide to track which strains are circulating and how they are evolving. JCVI has sequenced over 75% of the influenza genomes in GenBank, the NIH public repository for sharing genetic sequence data.

Influenza Genome Amplification Directly From Clinical Specimens

Influenza Genome Amplification Directly From Clinical Specimens (Zhou, B., M. E. Donnelly, D. T. Scholes, K. St. George, M. Hatta, Y. Kawaoka, and D. E. Wentworth. 2009. J. Virol. 83:10309-10313).

Dave was soon invited to give a talk at JCVI. “The opportunities at JCVI were to help build the [viral genomics] program. And already good, quality people are here studying viruses with a focus on viral evolution and sequencing analysis,” Dave remarked. “Being part of generating that information, I think makes you have a better feel for the biology.” The capabilities for viral sequencing, combined with IFX strengths and the interest in viral evolution at JCVI, were a draw for Dave, and he soon joined the team. Moreover, there are opportunities at JCVI to work with collaborators who send specimens from various regions of the world for sequencing so that we can “more deeply understand the mutations that contribute to virulence,” he said. He is particularly interested in antigenic drift (how viruses escape immunity), which contributes to the “annual influenza escape” and is critical in developing vaccine strains.

New Live Attenuated Vaccine Approaches

New Live Attenuated Vaccine Approaches. Figure shows influenza RNA polymerase activity (GFP) at various temperatures. Mutations engineered into the genome (PB1-Mut3, PB2-Mut4) synergize and inhibit replication at the higher temperatures of the lung (37 °C) or fever (39 °C).

The need for new and improved methods to develop vaccines, coupled with the advances in synthetic genomics developed at JCVI, led JCVI and the company Synthetic Genomics Inc. to form a new company last year, Synthetic Genomics Vaccines Inc. (SGVI). JCVI scientists, through SGVI, are working on a three-year collaboration agreement with Novartis to apply synthetic genomics tools and technologies to accelerate the production of the influenza seed strains required for vaccine manufacturing. The agreement, supported by an award from the U.S. Biomedical Advanced Research and Development Authority (BARDA), could ultimately lead to a more timely and effective response to seasonal and pandemic influenza outbreaks. The idea is to create viruses de novo, or to synthesize the genes critical for their antigenicity and put these into normal vaccine strains for production. The goal of the work at SGVI is to synthesize a virus, or rather a seed stock, in one week; the seed stock still needs to be amplified in big fermenters. New seed stocks currently take 3-4 weeks to produce, which is a rate-limiting step.

You don’t hear too many people singing its praises and saying “I love the flu!” as Dave has remarked, but put in context, his enthusiasm for his work shines through best when talking about his love of teaching. He gets excited teaching young scientists about virology, especially helping them to understand the important areas to study, and where the research will lead to solve a major problem. “The rewarding part of being a mentor is to see all of the people who have found their niche – it might not be bench research but they are still carrying knowledge with them.”

David Wentworth checking a hive in the late spring.

Aside from spending time with his family, Dave enjoys a hobby started by his dad – to cultivate honey bees. A community gardens group at a middle school in Albany, NY was looking for bees to pollinate their plants. Dave spearheaded the effort and used it as a learning tool for kids, who helped feed honey to caterpillars and moths. He also used to give lectures on bee cultivation and has taught college courses in animal science. Dave’s enthusiasm for science among his students and peers could be considered infectious, just like the subject of his research!

2012 JCVI Internship Program Is Now Accepting New Applications

Wow! Another year has gone by. It’s hard to believe it is November, almost December, with the warm weather we have been enjoying. However, it did not start that way.

Halloween Snow in Maryland!

The 2012 JCVI Internship Program is open to accept spring and summer applications. The application process includes the submission of a resume, essay and transcripts as one PDF file via our online application site. We no longer require letters of recommendation.

Information about the 2012 program can be found at http://www.jcvi.org/cms/education/internship-program/

For summer 2011, we received 544 applicants. Of these applicants, 30 interns were selected (10 in San Diego and 20 in Rockville):

  • 7 high school students
  • 9 undergraduate students
  • 13 graduate students
  • 1 secondary teacher

The intern projects ranged across the Institute:

  • A lethal set of virulence factors in uropathogenic E. coli?
  • Expanding genome transplantation: Streptococcus thermophilus
  • Random Assembly for Use in Swapping as a Tool for Genome Minimization
  • Assembling terminators and promoters
  • Developing Galaxy Tools for the Ordination Analysis of Meta-genomic samples

Good luck to all the applicants this year!

JCVI La Jolla Breaks Ground

It is official! On Tuesday, September 20th, JCVI officially broke ground on a new sustainable lab in La Jolla, California, to be located directly on the campus of the University of California, San Diego. Craig Venter, JCVI Founder and President, along with UCSD Chancellor Marye Anne Fox; Vice Chancellor for Health Sciences and Dean of the School of Medicine, David Brenner; Director of Scripps Institution of Oceanography and Vice Chancellor for Marine Sciences, Tony Haymet; and San Diego Mayor Jerry Sanders wielded bamboo-handled shovels and wore green hard hats at a formal event attended by 120 guests to kick off the construction of JCVI La Jolla. [PRESS RELEASE] The ceremony marked a new chapter in the development of JCVI La Jolla and bodes well for the exciting times ahead as this one-of-a-kind facility is constructed. During the course of construction and through occupancy of the building, we will use this blog to keep you abreast of the building’s progress and discuss many of the unique features that will be incorporated into JCVI’s future home. To learn more about the building program, architecture, and sustainable features, click here. Otherwise, let’s start with a little background.

More than six years in the making, the JCVI La Jolla project has developed a reputation for pushing the envelope in sustainability and energy-efficient laboratory design. The project aspires to achieve carbon neutrality while also demonstrating the benefits of sustainable building practices. Both are lofty goals, yet we felt they were achievable.

From the beginning, UCSD Chancellor Marye Anne Fox was an enthusiastic supporter of the project, encouraging scientific collaborations with JCVI and sharing JCVI’s sustainability goals.  UCSD’s Resource Management and Planning team, including Vice Chancellor Gary Matthews, Nancy Kossan, Boone Hellmann, Brian Gregory, and many others offered invaluable advice and assistance.

In 2006, we began a nationwide search for an architect to lead the project. We came to a startling conclusion: while the entire industry talked the “green building” talk, very few had actually walked the walk. Fortunately, there was one firm that had both designed some great laboratory buildings and had a passion for sustainability that matched our own. Zimmer Gunsul Frasca Architects (ZGF) in Los Angeles officially joined the team in February 2007 under the leadership of Ted Hyman and Doss Mabe.

But having the best architects wasn’t enough; we needed a highly efficient mechanical, electrical, and plumbing (MEP) design to allow us to meet our “net zero” energy goals, given the energy-intensive nature of laboratories. While interviewing another architectural firm, we met Peter Rumsey, a mechanical engineer who was rethinking energy efficiency in buildings and promoting novel alternatives to traditional MEP design. I recall Peter describing one project where he had replaced all of the 90-degree piping bends in a building with swept-radius connectors, reducing the system’s overall resistance and eliminating the need for dozens of pumps and their associated energy consumption. As if this approach to engineering wasn’t enough of a selling point, we were delighted to discover that his firm had just completed the first LEED Platinum-certified laboratory at UC Davis’s Tahoe Center. Rumsey Engineers (now Integral Group) joined ZGF and began assembling the world-class design team that would bring JCVI La Jolla to life on paper.

The first several months of the design process were beyond exciting. The hand-picked team of green building experts exuded a noticeable sense of excitement every time we sat around the table to discuss ideas for the project. The creativity floodgate had been opened, and ideas were materializing in all aspects of the building’s design, from lighting systems to on-site water treatment. Sustainability became the team’s mantra, not only from an environmental perspective but also for the research that would take place in the laboratories. The labs had to support not only the science of today, but the science of tomorrow. As a result, we focused on flexibility to ensure the building could adapt to the occupants’ needs over time. Late in 2007, ZGF presented an architecturally stunning building that not only met the original design intent of achieving carbon-neutral operation through “net zero” energy use, but also employed a constructed wetland to treat wastewater for reuse, and exceeded the USGBC’s LEED Platinum rating criteria.

2007 Rendering of JCVI La Jolla

Fast forward to today. The design team has spent the past 10 months working relentlessly to incorporate hundreds of pages of detail into the original design and develop a set of construction documents from which the project will be built. Coordination meetings have been held on a regular basis to ensure every element of the building meets JCVI’s operational needs. In many cases, the original design was improved after the design team took a second look. For example, the original lab layout was modified to provide more bench area and increase the flexibility of the support areas. What started as one large open lab became a series of “neighborhoods” separated by highly configurable rooms that can adapt to a wide array of equipment configurations.

While the architects prepared drawings, we began checking off the long list of items needed for ground breaking.  Applying the same level of scrutiny used in selecting the architect and engineers, we began interviewing builders to join the team.  McCarthy Building Companies came onboard early in 2011 and immediately began providing valuable input about constructability and delivering the highest levels of quality throughout the construction process.

In parallel with our design efforts, we worked with many UCSD offices and individuals to complete numerous environmental studies, perform plan reviews, and provide community outreach about the project.  The entire UCSD community has been a great supporter of both JCVI and the sustainability goals of the project.  We are grateful for UCSD’s support and guidance throughout the multi-year development process.

Today we stand poised to begin mass excavation for the foundation in late November.  Until then, we are busy working to mobilize work trailers, install temporary power, water, and data at the site, and construct a temporary road between the construction site and Expedition Way.

2011 Rendering of JCVI La Jolla

Evaluating Strain-level Variation of Key Acidogenic Species in Dental Plaque Biofilms

The characterization of the dental plaque microbiome using traditional 16S rDNA profiling strategies illustrates both the strengths and the limitations of this method. The central limitation of the 16S rDNA methodology is its inability to resolve strain-level variation within a microbiome. Why is this important? It is becoming a common theme in microbiome research that microbiomes associated with the human host are distinct from those that inhabit the environment. The species present in distinct human microbiomes represent only a small number of taxa, and within these taxa are relatively few genera that have massive representation of member species. This structure has been referred to as the deep fan structure. When comparing microbiomes from healthy and diseased subjects, important strain-level variations may be commonplace, and in many instances they may be causally related to the health of the human host. The dental plaque microbiome illustrates this point strongly. Oral microbiologists have isolated strains from species including S. mitis, S. sanguinis, S. mutans, S. gordonii and others that differ dramatically in their acid production and acid tolerance characteristics. The genes encoding these activities are not part of the core genome but reflect functions encoded in the strain-variable portion of the genome (~10-30% of the genome’s coding capacity). Important aspects of human disease etiology may be missed if we fail to address this possibility.

Summary of Progress: Dental plaque samples from human subjects with and without dental caries were used to isolate S. mutans and S. sobrinus colonies using enrichment culturing procedures. Most colonies were subjected to 2-3 rounds of replating to obtain pure colonies. The individual clones were then grown in liquid media to isolate genomic DNA for RFLP-based strain fingerprinting. This allowed us to collapse positive strains that appeared identical or highly similar, leaving a set of strains of apparently maximal diversity that encode the largest number of unique gene sequences. We further characterized the individual strains using primer pairs specific for either S. mutans or S. sobrinus. Several isolates were negative by PCR; these corresponded to isolates with unusual RFLP patterns and were excluded from further analysis. Some isolates tested positive for one of the two primer pairs used for screening; these were marked as such but retained for further analysis by genome sequencing. The isolates were multiplexed into two lanes of the Solexa GA IIx at a theoretical depth of coverage of 50X. Previous comparative analyses indicate that strain-specific regions of the S. mutans genome are not randomly distributed but are present at discrete locations. The breadth of these regions is not fully characterized, but our analyses should greatly improve our understanding of it. To date, no reference genome sequence is available for S. sobrinus, a potentially important contributor to dental caries.
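The "theoretical depth of coverage" above follows from the standard Lander-Waterman relation: N reads of length L over a genome of G bases give a depth c = N·L/G, and the expected fraction of the genome covered at least once is 1 - e^(-c). The sketch below budgets reads per isolate and isolates per lane; the genome size (~2 Mb, roughly that of S. mutans UA159), read length, and per-lane yield are placeholder values chosen for illustration, not figures from this study.

```python
import math


def reads_needed(genome_size_bp: int, read_len_bp: int, target_depth: int) -> int:
    """Reads required so that N * L / G reaches the target depth (rounded up)."""
    return -(-target_depth * genome_size_bp // read_len_bp)  # integer ceiling division


def expected_fraction_covered(depth: float) -> float:
    """Lander-Waterman expectation: fraction of bases hit by at least one read."""
    return 1.0 - math.exp(-depth)


def isolates_per_lane(reads_per_lane: int, genome_size_bp: int,
                      read_len_bp: int, target_depth: int) -> int:
    """How many barcoded isolates fit in one lane at the target depth."""
    return reads_per_lane // reads_needed(genome_size_bp, read_len_bp, target_depth)


# Placeholder values (assumptions, not measured yields):
GENOME_BP = 2_000_000      # ~2 Mb streptococcal genome
READ_BP = 100              # read length
LANE_READS = 30_000_000    # hypothetical reads per lane

print(reads_needed(GENOME_BP, READ_BP, 50))                    # 1000000
print(isolates_per_lane(LANE_READS, GENOME_BP, READ_BP, 50))   # 30
print(expected_fraction_covered(50) > 0.999999)                # True
```

Real coverage is uneven (library and GC bias, pooling imbalance between barcodes), which is one reason a nominal depth well above what the formula strictly requires is commonly targeted.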

Each genome to be sequenced was uniquely barcoded using the EpiBio Nextera DNA sample prep kit, and sequencing was performed on an Illumina Genome Analyzer IIx. The sequenced reads were then searched against the GenBank non-redundant nucleotide database for quality assessment and to determine the top hit for each genome. As shown in Table 1, 76 isolates generated best hits to S. mutans genomes and 47 to S. sobrinus genomes. For the 17 isolates that do not appear to be either S. mutans or S. sobrinus, it is somewhat puzzling how they grew on the media used. We believe these colonies were impure and were predominantly composed of the organism whose genome was sequenced.

Top BLAST hit            Number of isolates
S. sobrinus              47
S. parasanguinis         1
E. faecalis              1
Lactobacillus spp.       1
S. mutans                76
Chryseobacterium gleum   1
S. aureus                8
S. epidermidis           1
S. caprae                4

Table 1. Summary of the top hits of the reads from each isolate sequenced.
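Calling an isolate's top hit from read-level searches, as summarized in Table 1, amounts to a vote over reads. The sketch below assumes each read has already been assigned a single top-hit organism (or `None` when it had no hit); the 80% agreement threshold is an arbitrary illustrative choice, and isolates that fall below it are flagged as unresolved, which is one way impure colonies like those described above might be detected.

```python
from collections import Counter
from typing import Iterable, Optional


def call_isolate(read_top_hits: Iterable[Optional[str]],
                 min_fraction: float = 0.8) -> tuple[Optional[str], float]:
    """Return (organism, support) for the most common read-level top hit.

    organism is None when no read had a hit, or when the leading organism
    accounts for less than `min_fraction` of the reads that did have a hit.
    """
    counts = Counter(hit for hit in read_top_hits if hit is not None)
    if not counts:
        return None, 0.0
    organism, n = counts.most_common(1)[0]
    support = n / sum(counts.values())
    return (organism if support >= min_fraction else None), support


def summarize(calls: dict[str, Optional[str]]) -> Counter:
    """Tally isolates per called organism, in the spirit of Table 1."""
    return Counter(org if org is not None else "unresolved" for org in calls.values())


reads = ["S. mutans"] * 9 + ["S. aureus", None]
print(call_isolate(reads))  # ('S. mutans', 0.9)
```

Reporting the support value alongside the call keeps borderline isolates visible instead of silently assigning them to the majority organism.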

We used Newbler to assemble the sequence reads from each genome. For S. mutans we used mapping assembly against the S. mutans UA159 sequence, and we performed de novo assembly of the S. sobrinus reads because no reference genome sequence is available. Overall, the sequencing of isolates was successful with one exception. The remaining 75 S. mutans isolates assembled with an average coverage of 91% with respect to the reference genome. Given what is known about strain-specific gene content in S. mutans, one expects 90% coverage to be equivalent to complete coverage, since ~10% of UA159’s genome sequence is not likely to be shared with these isolates. The average number of contigs per isolate is 215, with an average length of 10,842 bp. Based on this outcome, it is highly likely that we will identify sequence reads from essentially all strain-specific genes for each isolate; determining to what extent full-length gene sequences have been generated, and to what extent those sequences display genomic context, is part of our current efforts.
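Coverage "with respect to the reference genome" can be read as breadth of coverage: the fraction of reference bases spanned by at least one aligned contig. The sketch assumes contig alignments to UA159 are already available as (start, end) reference coordinates, for example parsed from an alignment file; the function names are illustrative. The second helper makes the argument above explicit: if only ~90% of UA159 is expected to be shared with an isolate, an observed breadth of about 0.9 already corresponds to essentially complete coverage of the shared sequence.

```python
def breadth_of_coverage(intervals: list[tuple[int, int]], ref_len: int) -> float:
    """Fraction of reference bases covered by at least one aligned contig.

    Intervals are 0-based, half-open (start, end) positions on the reference;
    overlapping intervals are merged so shared bases are counted only once.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        start, end = max(0, start), min(ref_len, end)
        if start >= end:
            continue  # empty interval, or entirely off the reference
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # extend the current merged block
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / ref_len


def coverage_of_shared_genome(breadth: float, shared_fraction: float = 0.9) -> float:
    """Rescale breadth by the fraction of the reference expected to be shared."""
    return min(1.0, breadth / shared_fraction)


print(breadth_of_coverage([(0, 40), (30, 60), (80, 100)], 100))  # 0.8
print(coverage_of_shared_genome(0.91))                             # 1.0
```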

Ongoing Efforts. We are currently identifying strain-specific sequences from each isolate to determine the extent that these sequences might be shared among newly characterized isolates and their association with either caries-free or caries-active subjects. We will also identify the set of core gene sequences that appear to be present in all S. mutans and S. sobrinus genomes respectively. Ultimately we have demonstrated the use of high throughput sequencing technology as a means for characterizing oral pathogens of interest. Suggested applications for this type of research effort include the generation of strain-specific oligonucleotides to be added to existing DNA microarray content to enhance analysis using standard CGH methods. Another powerful use of this data can be obtained via the application of a variety of selection schemes that reveal the fitness of individual strains among the groups sequenced. The identification of strain-specific sequence signatures allows us to design primer pairs that can be used to measure the abundance and growth characteristics of that strain by qPCR. Potentially more interesting is the measurement of strains’ growth characteristics in competition with other sequenced strains. We have created mixtures of all of the sequenced S. mutans and S. sobrinus strains as independent pools and also generated a super pool including all sequenced strains. We have subjected these pools to a number of selective growth conditions including oxidative stress, low pH and growth on a variety of sugar substrates. In each case we envision that the generation of gene expression data and/or qPCR data detailing the abundance of each strain before and after selection will reveal individual strains that display high and low resistance to low pH, oxidative stress etc. 
This experimental procedure is analogous to phenotypic screens involving pools of single gene KO strains that have been uniquely barcoded to allow highly parallel analysis using DNA microarrays as popularized by the S. cerevisiae community. The variation performed here is to make use of the strain-specific gene sequences as a surrogate for the molecular barcode. Each strain will have at least one and probably hundreds of unique sequence identifiers that may be exploited for this purpose.
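The barcode analogy suggests a simple readout: count the reads (or qPCR signal) attributable to each strain's unique sequences before and after selection, then compare relative abundances. The sketch below is hypothetical throughout: it assumes strain-specific signatures are short exact strings, matches them by plain substring search rather than alignment, and scores strains by a log2 fold change with a pseudocount, which is one common convention for pooled fitness screens rather than a method stated in this post.

```python
import math


def count_signature_hits(reads: list[str], signatures: dict[str, list[str]]) -> dict[str, int]:
    """Count reads containing at least one signature sequence of each strain."""
    counts = {strain: 0 for strain in signatures}
    for read in reads:
        for strain, sigs in signatures.items():
            if any(sig in read for sig in sigs):
                counts[strain] += 1
    return counts


def log2_enrichment(before: dict[str, int], after: dict[str, int],
                    pseudocount: float = 1.0) -> dict[str, float]:
    """log2 change in each strain's relative abundance after selection.

    Positive values suggest a strain tolerated the condition (e.g. low pH)
    better than the pool as a whole; negative values suggest it was depleted.
    """
    strains = set(before) | set(after)
    before_total = sum(before.values()) + pseudocount * len(strains)
    after_total = sum(after.values()) + pseudocount * len(strains)
    return {
        s: math.log2(((after.get(s, 0) + pseudocount) / after_total)
                     / ((before.get(s, 0) + pseudocount) / before_total))
        for s in strains
    }


# Hypothetical counts for a two-strain pool before and after a low-pH challenge.
scores = log2_enrichment({"A": 99, "B": 99}, {"A": 199, "B": 49})
print({s: round(v, 3) for s, v in sorted(scores.items())})  # {'A': 0.678, 'B': -1.322}
```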

It is our hope that this demonstration will provide the dental research community with a blueprint for how genome sequence data can be exploited to become more than a simple GenBank record for reference purposes. The experimental process described above provides a novel way to relate genotypic and phenotypic information on collections of strains derived from healthy and diseased human subjects. The sequence data for all assemblies have been placed in the public domain, and we are currently awaiting accession number assignments. If you have ideas for negative selection, let me know; I am happy to share the strains/pools and, funding permitting, aliquots of primer pairs targeting specific strains in the pools.

The projects described above were supported by NIAID via a contract to JCVI under the Pathogen Functional Genomics Resource Center (N01-AI15447) and funds from NIDCR to PFGRC in an attempt to enable the HMP research community to exploit genomic and metagenomic methods. The work pertaining to the oral cavity was done in collaboration with Dr. Walter Bretz at NYU, and the efforts pertaining to the gut microbiome were done in collaboration with Dr. Cynthia Sears at JHU.

Cataloguing the Gene Expression Patterns of Dental Plaque Biofilms: A Reference Dental Plaque Transcriptome

The RNA-Seq method has been widely adopted as an alternative to the use of DNA microarrays. In most contexts, the RNA-Seq method is implemented when a single reference organism is being studied. Our project endeavored to establish working methods to enable the generation of cDNA libraries that were depleted of contaminating human mRNA and host/microbiome rRNA sequences that would otherwise represent over 95% of the total sequence reads obtained. We have also made significant efforts to define bioinformatics procedures that allow RNA-Seq data to be assigned to appropriate species such that global gene expression analyses can be routinely conducted by the dental research community and those involved in HMP research objectives.

We have established a catalogue of expressed genes in dental plaque by using the Solexa sequencing platform to apply RNA-Seq to a collection of 19 twin pairs that are concordant for dental health (caries-free concordant twin pairs), concordant for dental caries (caries-active concordant twin pairs), or discordant for dental caries (one twin caries-free and the other caries-active). Our analysis of the data established that the ten most abundant species in each sample vary significantly from subject to subject. This greatly complicates the mapping of reads to reference genomes. Another significant conceptual challenge was how to map transcripts to genomes of interest with high specificity. Genes evolve at substantially different rates: some differ by 2-5% across species boundaries, whereas others differ by 25-30%. Consequently, no single cut-off for mapping a transcript to a reference genome can be reliably employed. We therefore reasoned that by creating an oral cavity reference genome database, we could map each transcript according to reasonable specificity criteria while imposing a best-hit criterion on the data to minimize mis-mapping.
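The combination of a permissive specificity filter with a best-hit rule can be sketched as follows. It assumes hits have already been produced by some aligner against the oral reference genome database and reduced to (read, genome, percent identity) triples; the 80% identity floor and the decision to leave ties unassigned are illustrative assumptions, not the thresholds used in this project.

```python
from collections import defaultdict


def assign_best_hits(hits, min_identity: float = 80.0) -> dict[str, str]:
    """Assign each read to the single genome with its highest percent identity.

    hits: iterable of (read_id, genome, percent_identity) tuples.
    Hits below `min_identity` are discarded (the specificity filter); reads
    whose top identity is shared by two or more genomes stay unassigned,
    which trades some sensitivity for fewer mis-mapped reads.
    """
    by_read = defaultdict(list)
    for read_id, genome, pid in hits:
        if pid >= min_identity:
            by_read[read_id].append((pid, genome))
    assignments = {}
    for read_id, candidates in by_read.items():
        best = max(pid for pid, _ in candidates)
        top = {genome for pid, genome in candidates if pid == best}
        if len(top) == 1:
            assignments[read_id] = top.pop()
    return assignments


hits = [
    ("r1", "S_mutans", 98.0), ("r1", "S_sobrinus", 91.0),  # clear best hit
    ("r2", "S_mitis", 95.0), ("r2", "S_oralis", 95.0),     # tie: left unassigned
    ("r3", "S_mutans", 70.0),                              # below identity floor
]
print(assign_best_hits(hits))  # {'r1': 'S_mutans'}
```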

Based on the data generated (38 samples × ~32.8 million reads/sample, i.e. ~1.25 billion reads and over 100 Gb of sequence), we have met our goal of establishing a robust RNA-Seq procedure and cataloguing the transcripts expressed in dental plaque biofilms. These sequences, together with the SOPs developed for effective microbial RNA enrichment, have been made available through the DACC (http://www.hmpdacc.org/RSEQ/). In addition, as an independent means of exploiting RNA-Seq data, we have devised a strategy for mapping reads to particular functional or biochemical pathways, such as those related to acid/base production. In this scheme, which species express a given function is not considered important; what matters is the sum total of expressed sequences related to acid/base production. The approach is similar to that described above: a database is built from all sequence data derived from particular biochemical pathways and used to recruit reads of appropriate sequence identity to annotated genes. Over- or under-representation of the expressed genes constituting discrete pathways may then be evaluated.
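The pathway-centric view can be illustrated with a small sketch that pools read counts over every gene annotated to a pathway, regardless of species, and then compares pathway shares between two samples as a log2 ratio. Gene identifiers, the gene-to-pathway table, and the pseudocount are hypothetical, and the log2 ratio is a simple stand-in for whatever statistical test the project applies; a real comparison would also need replicates and multiple-testing control.

```python
import math
from collections import Counter


def pathway_totals(gene_counts, gene_to_pathway):
    """Sum read counts over all genes annotated to each pathway.

    Species of origin is deliberately ignored; genes without a pathway
    annotation are skipped.
    """
    totals = Counter()
    for gene, count in gene_counts.items():
        pathway = gene_to_pathway.get(gene)
        if pathway is not None:
            totals[pathway] += count
    return totals


def log2_enrichment(totals_a, totals_b, pseudocount=1.0):
    """log2 ratio of each pathway's share of pathway-assigned reads, A vs. B.

    Positive values mean over-representation in A, negative values
    under-representation. The pseudocount avoids division by zero.
    """
    pathways = set(totals_a) | set(totals_b)
    n_a = sum(totals_a.values()) + pseudocount * len(pathways)
    n_b = sum(totals_b.values()) + pseudocount * len(pathways)
    return {
        p: math.log2(((totals_a.get(p, 0) + pseudocount) / n_a)
                     / ((totals_b.get(p, 0) + pseudocount) / n_b))
        for p in pathways
    }
```

Because counts from different species' genes are simply added, two subjects with entirely different acid-producing taxa can still show the same pathway-level signal, which is the point of this species-agnostic view.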

The projects described above were supported by NIAID via a contract to JCVI under the Pathogen Functional Genomics Resource Center (PFGRC; N01-AI15447) and by funds from NIDCR to the PFGRC, with the aim of enabling the HMP research community to exploit genomic and metagenomic methods. The work on the oral cavity was done in collaboration with Dr. Walter Bretz at NYU, and the work on the gut microbiome was done in collaboration with Dr. Cynthia Sears at JHU.

Surrogate Methods for Profiling Species of the Oral and Gut Microbiome

We engaged in an effort to remove a substantial barrier facing the human microbiome research community. While powerful, the 16S rDNA gene is insufficiently divergent to discriminate many species, and essentially no strains, present within communities. The increasing costs of Sanger sequencing have forced most investigators to adopt the Roche/454 sequencing platform to address the question, “who’s there?” The benefits of 454 sequence data are clear: investigators obtain deep data sets with excellent statistical power. A major drawback is that the read length of the 454 platform does not capture enough “informative bases” for species-level identification, so the data generally reveal only the genera present in the microbiome. While much can be gained from large-scale genus-level comparisons, species- and even strain-level resolution is highly desirable. Much of the difference between healthy and diseased human microbiomes may lie at the species and strain level, which makes it important to develop strategies for measuring species abundance in large human cohorts in a cost-effective manner. We used capture array technology iteratively to establish a comprehensive sequence database for seven conserved genes. We performed a proof of concept using two model systems: the oral (dental plaque) microbiome and the fecal microbiome. We designed capture oligonucleotides that tiled the GenBank sequences of each of seven universally conserved genes from genera known to be present in the gut and the oral cavity, respectively. We refer to these oligonucleotides as “seed sequences”, used to capture orthologous sequences present in stool and in dental plaque biofilms and saliva.
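A minimal sketch of tiling seed probes across gene sequences follows. The source does not give the probe length, spacing, or any filtering rules; the 60-nt probes stepped every 30 nt are hypothetical values, and the sketch ignores practical design constraints such as melting-temperature balancing and low-complexity masking. Identical probes contributed by closely related GenBank entries are placed only once.

```python
def tile_probes(sequence, probe_len=60, step=30):
    """Return overlapping probes of length probe_len tiling the sequence.

    Probes start every `step` bases; if the last regular probe stops short of
    the end, one extra probe is anchored at the 3' end so every base is covered.
    """
    if probe_len <= 0 or step <= 0:
        raise ValueError("probe_len and step must be positive")
    seq = sequence.upper()
    if len(seq) <= probe_len:
        return [seq]
    last_start = len(seq) - probe_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [seq[s:s + probe_len] for s in starts]


def design_seed_probes(genes, probe_len=60, step=30):
    """Tile every input gene sequence, placing each distinct probe only once.

    `genes` maps an identifier (e.g. a (gene, accession) pair) to its sequence.
    """
    seen, design = set(), {}
    for key, seq in genes.items():
        probes = []
        for probe in tile_probes(seq, probe_len, step):
            if probe not in seen:
                seen.add(probe)
                probes.append(probe)
        design[key] = probes
    return design
```

Overlapping probes mean each base is targeted by more than one oligonucleotide, which gives some tolerance to divergence in the captured orthologs.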

We next prepared complex mixtures of dental plaque and saliva from several individuals and, separately, a similar stool mixture representing a diversity of subjects. DNA from these microbiome samples was applied to the capture array. We refer to the captured DNAs as “cloud sequences”: related sequences (phylogenetic clades) surrounding the original seed sequences. We repeated the capture array process three times, adding sequences newly identified relative to the original seeds to each subsequent capture array design. Our goal is to establish a taxonomic representation of these microbiomes based on detailed DNA sequence data from seven housekeeping genes, reminiscent of long-standing MLST (multilocus sequence typing) approaches. We are leveraging existing and future reference genome sequences to annotate the capture array sequence data. The HMP research community can subsequently add further species to this framework simply by sequencing the relevant loci from defined species available from ATCC or BEI, or from the strain collections held by hundreds of investigators worldwide. The power of this approach lies in providing DNA sequences that can be used to design qPCR primer pairs capable of highly discriminatory amplification and abundance measurement of species and strains of potential interest.
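The iterative expansion of the capture design can be pictured as follows: after each round, captured sequences that are sufficiently different from every current seed are added to the next design. The source does not state how novelty was judged; the k-mer containment test, k = 15, and the 80% threshold are assumptions chosen for illustration.

```python
def kmers(seq, k=15):
    """Set of all k-length substrings of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def next_round_seeds(seeds, captured, k=15, max_shared=0.8):
    """Return the seed list for the next capture design.

    A captured ('cloud') sequence is added when, against every sequence
    already in the design, fewer than `max_shared` of its k-mers are shared.
    Sequences shorter than k are ignored.
    """
    updated = list(seeds)
    ref_kmers = [kmers(s, k) for s in updated]
    for seq in captured:
        cand = kmers(seq, k)
        if not cand:
            continue
        if all(len(cand & ref) / len(cand) < max_shared for ref in ref_kmers):
            updated.append(seq)
            ref_kmers.append(cand)
    return updated
```

Comparing each candidate against the growing list, not just the original seeds, keeps two near-identical novel sequences from both entering the next design.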

Although the efficiency of capturing orthologs fluctuated among the seven target genes, we generated substantial depth of coverage for three genes in the oral cavity (pyrG, pgi and recA) and four genes in the gut (pyrG, dnaG, pgi and recA). We have been analyzing the total gene sequence data obtained from the capture arrays, including four 454 runs each for the oral and fecal microbiomes. Because the data represent highly related sequences derived from tens or hundreds of strains belonging to the same species, we were pessimistic that assembling the reads would be fruitful. Our attempt at de novo assembly with Newbler confirmed these concerns and was not successful. Instead, we have defined an in silico approach that organizes the sequence data using a microbiome reference genome database populated with relevant genomes from the oral cavity and gut. In addition to the original genes collected from GenBank, we added the seven targeted gene sequences from 134 oral-related genomes and 162 gut-related genomes. This database allows us to map each gene sequence to reference genomes and so increase the specificity of each assignment. We are mapping the reads from our sequencing data to these genomes using high-stringency cut-offs. Reads that map to reference genomes will be used to generate multiple sequence alignments, from which we will derive consensus sequences and identify exploitable polymorphisms for qPCR primer design. Beyond the multiple sequence alignments themselves, we will also compare the alignment for each individual species with those of other species within the same major clade (common genera). This will allow us to determine which sequences have the highest probability of being unique to the species of interest.
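The alignment-based steps above, deriving a consensus and finding positions that set one species apart from others in its clade, can be sketched as below. The sketch assumes the sequences have already been aligned to equal length by an external tool; the strict-majority consensus rule and the definition of a diagnostic position (invariant within the target species and absent from every other species in the clade) are simplifications, and actual qPCR primer design would also weigh primer length, melting temperature, and where the discriminating bases fall relative to the primer's 3' end.

```python
from collections import Counter


def consensus(aligned, min_freq=0.5):
    """Majority-rule consensus of equal-length aligned sequences.

    A column gets its most common character ('-' gaps included) only if that
    character's frequency exceeds min_freq; otherwise it becomes 'N'.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for i in range(length):
        base, count = Counter(s[i] for s in aligned).most_common(1)[0]
        out.append(base if count / len(aligned) > min_freq else "N")
    return "".join(out)


def diagnostic_positions(target, others):
    """Columns where the target species is invariant and no other species in
    the clade carries the same base.

    `target` is a list of aligned sequences for the species of interest;
    `others` is a list of such lists, one per other species in the clade.
    """
    positions = []
    for i in range(len(target[0])):
        target_bases = {s[i] for s in target}
        if len(target_bases) != 1 or "-" in target_bases:
            continue
        other_bases = {s[i] for group in others for s in group}
        if target_bases.isdisjoint(other_bases):
            positions.append(i)
    return positions
```

For a toy alignment in which every target sequence carries G at column 2 while its relatives carry C, `diagnostic_positions` returns `[2]`; such columns are candidate anchors for species-discriminating primers.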
Preliminary assessment of the DNA sequence data has shown promising results: using recA gene sequences, we are able to recapitulate phylogenetic clades such as the viridans group of Streptococci. This suggests that genes from species known to be present in the oral cavity were effectively captured. Clade- or sub-clade-level primer design will be based on all sequences reliably mapped to genomes.

Our goal is to design useful primer pairs with species-level resolution. This will be achievable in many cases, but not all. We are seeking funds to create a repository of primer pairs to share with the HMP community. Note that, initially, none of the primer designs will be experimentally validated, so users will need to evaluate them carefully in the context of their experimental goals. We plan to continue this project and conduct validations to the extent that funding permits. Results will be added to the repository as primer designs are validated or deemed unsuitable for experimental use.

The projects described above were supported by NIAID via a contract to JCVI under the Pathogen Functional Genomics Resource Center (PFGRC; N01-AI15447) and by funds from NIDCR to the PFGRC, with the aim of enabling the HMP research community to exploit genomic and metagenomic methods. The work on the oral cavity was done in collaboration with Dr. Walter Bretz at NYU, and the work on the gut microbiome was done in collaboration with Dr. Cynthia Sears at JHU.

The Mobile Lab Is Going to Sunny San Diego

Late one evening in January 2006, the mobile lab pulled into the parking lot at 9704 Medical Center Drive. It was such an exciting evening!! Within a few days, we had stocked it with lab supplies and begun visiting students. The first school in the Washington Area was Patapsco Middle School in Howard County. The other inaugural participating schools were Ron Brown Middle School, Hines Junior High School, and Eliot Junior High School in Washington, DC. Since then, we have brought the mobile lab to thousands of students over the past 5 years.
First Class on the DG! Mobile Lab, January 2006

Today, the mobile lab began its journey across the US to San Diego. Let us know if you see it on the highway!

As you may have seen in September, we just broke ground on our new facility in San Diego. We began offering education programming in San Diego at our temporary facility in 2007 and have worked with over 30 teachers. Through these relationships, we look forward to bringing San Diego students the same opportunities we have offered students in the Washington Area.
Students on the DG! Mobile Lab

In the current economic environment, keeping this program rolling is challenging. Yet it is needed more and more in the classroom. We need your help! To find out how you can help keep this science program rolling, visit our Giving Page.