Evaluating Strain-level Variation of Key Acidogenic Species in Dental Plaque Biofilms

The characterization of the dental plaque microbiome using traditional 16S rDNA profiling illustrates both the strengths and the limitations of this method. The central limitation of 16S rDNA methodology is its inability to resolve strain-level variation within a microbiome. Why is this important? It is becoming a common theme in microbiome research that microbiomes associated with the human host are distinct from those that inhabit the environment. The species present in distinct human microbiomes represent only a small number of taxa. Within these taxa are relatively few genera that have massive representation of member species. This structure has been referred to as the deep fan structure. When comparing microbiomes from healthy and diseased subjects, important strain-level variations may commonly exist, and in many instances these may be causally related to the health of the human host. The dental plaque microbiome illustrates this point strongly. Oral microbiologists have isolated strains of species including S. mitis, S. sanguinis, S. mutans, S. gordonii and others that differ dramatically in their acid production and acid tolerance characteristics. The genes encoding these activities are not part of the core genome but reflect functions encoded in the strain-variable portion of the genome (~10-30% of the genome’s coding capacity). Important aspects of human disease etiology may be missed if we fail to address this possibility.

Summary of Progress: Dental plaque samples from human subjects with and without dental caries were used to isolate S. mutans and S. sobrinus colonies using enrichment culturing procedures. Most colonies were subjected to 2-3 rounds of replating to obtain pure colonies. The individual clones were then grown in liquid media, and genomic DNA was isolated for fingerprinting of strains by RFLP analysis. This allowed us to collapse positive strains that appeared identical or highly similar into a set of strains of apparently maximal diversity, encoding the largest number of unique gene sequences. We further characterized the individual strains using primer pairs that are specific for either S. mutans or S. sobrinus. Several of the isolates were negative by PCR; these corresponded to isolates with unusual RFLP patterns and were excluded from further analysis. Some isolates tested positive for one of the two primer pairs used for screening and were marked as such but retained for further analysis using genome sequencing. The isolates obtained were multiplexed into two lanes of the Illumina (Solexa) Genome Analyzer IIx at a theoretical depth of coverage of 50X. Previous evidence based on comparative analyses indicates that strain-specific regions of the S. mutans genome are not randomly distributed but rather are present at discrete locations. The breadth of these regions is not fully characterized, but our understanding of them will be greatly enhanced by our analyses. To date no reference genome sequence is available for S. sobrinus, a potentially important contributor to dental caries.

Each genome to be sequenced was uniquely barcoded using the EpiBio Nextera DNA sample prep kit, and sequencing was performed using an Illumina Genome Analyzer IIx. The sequenced reads were then searched against the GenBank non-redundant nucleotide database for quality assessment and to determine the top hit of each genome. As shown in Table 1, 76 isolates generated best hits to S. mutans and 47 to S. sobrinus genomes. For the 17 isolates that do not appear to be either S. mutans or S. sobrinus, it is somewhat puzzling how they were cultivated on the media used. We believe these colonies were impure and that the genome sequenced was that of the predominant organism.

Top BLAST hit (genome)      # of isolates
S. sobrinus                 47
S. parasanguinis            1
E. faecalis                 1
Lactobacillus spp.          1
S. mutans                   76
Chryseobacterium gleum      1
S. aureus                   8
S. epidermidis              1
S. caprae                   4

Table 1. Summary of the top BLAST hits of the reads from each isolate sequenced.
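As a quick sanity check on Table 1, the counts can be tallied in a few lines of Python. This is only an illustrative sketch: the dictionary is transcribed by hand from the table, and the variable names are ours, not part of any pipeline described here.

```python
# Counts transcribed from Table 1 (top BLAST hit per sequenced isolate).
top_hits = {
    "S. sobrinus": 47,
    "S. parasanguinis": 1,
    "E. faecalis": 1,
    "Lactobacillus spp.": 1,
    "S. mutans": 76,
    "Chryseobacterium gleum": 1,
    "S. aureus": 8,
    "S. epidermidis": 1,
    "S. caprae": 4,
}

# The two species the enrichment culturing was meant to recover.
targets = {"S. mutans", "S. sobrinus"}

total = sum(top_hits.values())
on_target = sum(n for species, n in top_hits.items() if species in targets)
off_target = total - on_target

print(f"isolates sequenced: {total}")
print(f"S. mutans or S. sobrinus: {on_target}")
print(f"other top hits: {off_target}")
```

The off-target total matches the 17 isolates mentioned in the text.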

We used Newbler to assemble the sequence reads from each genome. For S. mutans we used mapping assembly against the S. mutans UA159 sequence, and for S. sobrinus we performed de novo assembly because no reference genome sequence is available. Sequencing of the S. mutans isolates was successful with one exception. The remaining 75 isolates assembled with an average coverage of 91% with respect to the reference genome. Given what is known about strain-specific gene content in S. mutans, one expects 90% coverage to be equivalent to complete coverage, since ~10% of UA159’s genome sequence is not likely to be shared with these isolates. The average number of contigs per isolate is 215, with an average length of 10,842 bp. Based on this outcome it is highly likely that we will identify sequence reads from essentially all strain-specific genes for each isolate. The extent to which full-length gene sequences have been generated, and the extent to which those sequences display genomic context, are part of our current efforts.

Ongoing Efforts. We are currently identifying strain-specific sequences from each isolate to determine the extent that these sequences might be shared among newly characterized isolates and their association with either caries-free or caries-active subjects. We will also identify the set of core gene sequences that appear to be present in all S. mutans and S. sobrinus genomes respectively. Ultimately we have demonstrated the use of high throughput sequencing technology as a means for characterizing oral pathogens of interest. Suggested applications for this type of research effort include the generation of strain-specific oligonucleotides to be added to existing DNA microarray content to enhance analysis using standard CGH methods. Another powerful use of this data can be obtained via the application of a variety of selection schemes that reveal the fitness of individual strains among the groups sequenced. The identification of strain-specific sequence signatures allows us to design primer pairs that can be used to measure the abundance and growth characteristics of that strain by qPCR. Potentially more interesting is the measurement of strains’ growth characteristics in competition with other sequenced strains. We have created mixtures of all of the sequenced S. mutans and S. sobrinus strains as independent pools and also generated a super pool including all sequenced strains. We have subjected these pools to a number of selective growth conditions including oxidative stress, low pH and growth on a variety of sugar substrates. In each case we envision that the generation of gene expression data and/or qPCR data detailing the abundance of each strain before and after selection will reveal individual strains that display high and low resistance to low pH, oxidative stress etc. 
This experimental procedure is analogous to phenotypic screens involving pools of single gene KO strains that have been uniquely barcoded to allow highly parallel analysis using DNA microarrays as popularized by the S. cerevisiae community. The variation performed here is to make use of the strain-specific gene sequences as a surrogate for the molecular barcode. Each strain will have at least one and probably hundreds of unique sequence identifiers that may be exploited for this purpose.
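To make the "strain-specific sequence as barcode" idea concrete, here is a minimal Python sketch that finds k-mers present in exactly one strain of a collection. It is an illustration under stated assumptions, not the analysis actually used: the function names, the toy sequences, and the choice of k are all hypothetical, and for simplicity it ignores reverse complements and sequencing errors.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def strain_specific_kmers(genomes, k):
    """Map each strain name to the k-mers found in that strain only.

    genomes: dict of strain name -> sequence string (hypothetical input).
    """
    per_strain = {name: kmers(seq, k) for name, seq in genomes.items()}
    # Count in how many strains each k-mer occurs.
    occurrences = Counter(km for kset in per_strain.values() for km in kset)
    return {
        name: {km for km in kset if occurrences[km] == 1}
        for name, kset in per_strain.items()
    }


# Toy example with two made-up "strains" sharing a common prefix.
toy = {"strainA": "ACGTACGTTT", "strainB": "ACGTACGAAA"}
unique = strain_specific_kmers(toy, k=4)
print(sorted(unique["strainA"]))
print(sorted(unique["strainB"]))
```

In practice one would use much longer k-mers and whole assemblies, and then choose primer pairs within the strain-specific regions.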

It is our hope that this demonstration will provide the dental research community a blueprint for how genome sequence data can be exploited and become more than a simple GenBank record for reference purposes. The experimental process described above provides a novel way to relate genotypic and phenotypic information on collections of strains derived from healthy and diseased human subjects. The sequence data for all assemblies has been placed in the public domain and we are currently awaiting accession number assignments. If you have ideas for negative selection, let me know. I am happy to share the strains/pools and, funding permitting, primer pair aliquots targeting specific strains in the pools.

The projects described above were supported by NIAID via a contract to JCVI under the Pathogen Functional Genomics Resource Center (N01-AI15447) and funds from NIDCR to PFGRC in an attempt to enable the HMP research community to exploit genomic and metagenomic methods. The work pertaining to the oral cavity was done in collaboration with Dr. Walter Bretz at NYU and the efforts pertaining to the gut microbiome were done in collaboration with Dr. Cynthia Sears at JHU.

Cataloguing the Gene Expression Patterns of Dental Plaque Biofilms: A Reference Dental Plaque Transcriptome

The RNA-Seq method has been widely adopted as an alternative to the use of DNA microarrays. In most contexts, the RNA-Seq method is implemented when a single reference organism is being studied. Our project endeavored to establish working methods to enable the generation of cDNA libraries that were depleted of contaminating human mRNA and host/microbiome rRNA sequences that would otherwise represent over 95% of the total sequence reads obtained. We have also made significant efforts to define bioinformatics procedures that allow RNA-Seq data to be assigned to appropriate species such that global gene expression analyses can be routinely conducted by the dental research community and those involved in HMP research objectives.

We have established a catalogue of expressed genes in dental plaque by turning to the Solexa sequencing platform and applying RNA-Seq to a collection of 19 twin pairs that are either concordant for dental health (caries-free concordant twin pairs), concordant for dental caries (caries-active concordant twin pairs) or discordant for dental caries (one twin caries-free and the other member of the twin pair caries-active). Based on our analysis of the data we have established that the ten most abundant species in each sample vary significantly from subject to subject. This fact greatly complicates the mapping of reads to reference genomes. Another significant conceptual challenge we faced was how to conduct highly specific mapping of transcripts to genomes of interest. We know that genes in genomes evolve at substantially different rates; some genes may differ by 2-5% across species boundaries whereas others may differ by 25-30%. The consequence of this is that no single cut-off for mapping a transcript to a reference genome may be reliably employed. We therefore reasoned that by creating an oral cavity reference genome database we could map each transcript according to reasonable specificity criteria but impose a best-hit criterion on the data to ensure minimal mis-mapping.
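The best-hit idea can be sketched in a few lines of Python. This is a simplified illustration, not our production pipeline: the `Hit` record, the `best_hits` function, and the 90% identity floor are assumptions made for the example, and a real implementation would read alignment output from a mapper rather than a hand-built list.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One alignment of a read to a reference genome (hypothetical record)."""
    read_id: str
    genome: str
    identity: float  # percent identity of the alignment
    score: float     # alignment score; higher is better


def best_hits(hits, min_identity=90.0):
    """Assign each read to the genome of its single best-scoring hit.

    Hits below min_identity are discarded first (the specificity filter);
    among the rest, only the top-scoring hit per read is kept. On a tied
    score the hit seen first is retained, which is a simplification.
    """
    best = {}
    for hit in hits:
        if hit.identity < min_identity:
            continue
        current = best.get(hit.read_id)
        if current is None or hit.score > current.score:
            best[hit.read_id] = hit
    return {read_id: hit.genome for read_id, hit in best.items()}


example = [
    Hit("r1", "S. mutans UA159", 99.0, 180.0),
    Hit("r1", "S. sobrinus", 92.0, 150.0),
    Hit("r2", "S. sanguinis", 85.0, 120.0),  # fails the identity floor
]
print(best_hits(example))
```

Because every read is compared against the whole reference database before assignment, a moderately good hit to the wrong species loses out to a better hit to the right one, even when both pass the identity floor.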

Based upon the data generated (38 samples × ~32.8 million reads/sample, or more than 1 billion reads and over 100 Gb of sequence data), we have fulfilled the goal of establishing a robust procedure for RNA-Seq and a catalogue of the specific transcripts expressed in dental plaque biofilms. These sequences and the associated SOPs developed for effective microbial RNA enrichment have been made available through the DACC (http://www.hmpdacc.org/RSEQ/). In addition, we have devised a strategy for mapping reads to particular functional or biochemical pathways, such as those related to acid/base production, as an independent means of exploiting RNA-Seq data. In this scheme the identity of the species expressing a function is not considered important; what matters is the sum total of expressed sequences related to acid/base production. The approach is similar to that described above in that a database is created from all sequence data derived from particular biochemical pathways and is used to recruit reads of appropriate sequence identity that map to annotated genes. Over- or under-representation of expressed genes constituting discrete pathways may then be evaluated.
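A pathway-level summary of this kind amounts to summing read counts over genes regardless of species. The Python sketch below shows one way it might look; the gene labels, the gene-to-pathway map, and the read counts are invented for illustration, and the per-million scaling is just one reasonable normalization.

```python
from collections import defaultdict


def pathway_totals(gene_counts, gene_to_pathway):
    """Sum read counts per pathway, ignoring which species a gene came from."""
    totals = defaultdict(int)
    for gene, count in gene_counts.items():
        pathway = gene_to_pathway.get(gene)
        if pathway is not None:  # genes outside the pathway database are skipped
            totals[pathway] += count
    return dict(totals)


def per_million(totals, total_reads):
    """Scale pathway totals to reads per million mapped reads."""
    return {pathway: n * 1_000_000 / total_reads for pathway, n in totals.items()}


# Hypothetical counts for one sample; labels are illustrative only.
gene_counts = {
    "ldh_S_mutans": 300,
    "ldh_S_sanguinis": 100,
    "arcA_S_gordonii": 50,
    "recA_S_mutans": 20,
}
gene_to_pathway = {
    "ldh_S_mutans": "acid production",
    "ldh_S_sanguinis": "acid production",
    "arcA_S_gordonii": "base production",
}

totals = pathway_totals(gene_counts, gene_to_pathway)
print(totals)
print(per_million(totals, total_reads=1_000_000))
```

Comparing the normalized totals between caries-free and caries-active samples is then a matter of ordinary statistics on a handful of pathway values per sample.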

The projects described above were supported by NIAID via a contract to JCVI under the Pathogen Functional Genomics Resource Center (N01-AI15447) and funds from NIDCR to PFGRC in an attempt to enable the HMP research community to exploit genomic and metagenomic methods. The work pertaining to the oral cavity was done in collaboration with Dr. Walter Bretz at NYU and the efforts pertaining to the gut microbiome were done in collaboration with Dr. Cynthia Sears at JHU.

Surrogate Methods for Profiling Species of the Oral and Gut Microbiome

We engaged in an effort focused on alleviating a substantial barrier facing the human microbiome research community. While powerful, the 16S rDNA gene is insufficiently divergent to allow discrimination of many species and essentially no strains present within communities. The cost of Sanger sequencing has forced most investigators to adopt the Roche 454 sequencing platform to address the question, “who’s there?” The benefits of the 454 sequence data are clear, as investigators enjoy deep data sets with excellent statistical power. A major drawback is that the read length of the 454 platform does not allow the acquisition of a sufficient number of “informative bases” for species-level identification, so the data generally depict only the genera present in the microbiome. While there is much to be gained by large-scale analysis of genus-based comparisons, it is highly desirable to have species and even strain-level resolution. Much of the difference between healthy and diseased human microbiomes may lie at the species and strain level, making it important to develop strategies that allow species abundance measurements to be made on large human cohorts in a cost-effective manner. We used capture array technology in an iterative fashion to establish a comprehensive sequence database of seven conserved gene sequences. We performed a proof of concept using two model systems: the oral (dental plaque) microbiome and the fecal microbiome. We designed capture oligonucleotides that tiled each of seven universally conserved gene sequences present in GenBank belonging to genera known to be present in the gut and oral cavity, respectively. We refer to these oligonucleotides as “seed sequences” for use in capturing orthologous sequences present in stool, dental plaque biofilms and saliva.

We next prepared complex mixtures of dental plaque and saliva from several individuals and separately also prepared a similar stool mixture representing a diversity of subjects. The DNAs generated from these microbiome samples were used in conjunction with the capture array. We refer to the captured DNAs as “cloud sequences” that represent related sequences (phylogenetic clades) surrounding the original seed sequences. We repeated the capture array process three times such that novel identified sequences relative to the original seeds were added to subsequent capture array designs. Our goal is to establish a taxonomic representation of these microbiomes based on detailed DNA sequence data of seven housekeeping genes, reminiscent of long-standing MLST approaches. We are leveraging existing and future reference genome sequences to annotate the sequence data obtained from capture array data. Additional species may be subsequently added to this framework by the HMP research community simply by sequencing the relevant loci from defined species available via ATCC, BEI or from the strain collections held by hundreds of investigators world-wide.  The power of this approach lies in the provision of DNA sequences that can be used to design qPCR primer pairs capable of highly discriminatory amplification and abundance measurements of species and strains of potential interest.

Despite the fluctuation in the efficiency of capturing orthologs among the seven target genes, we were able to generate a substantial depth of coverage for three genes in the oral cavity (pyrG, pgi and recA) and four genes in the gut (pyrG, dnaG, pgi and recA). We have been analyzing the total gene sequence data obtained from capture arrays, including four 454 runs each for the oral and fecal microbiomes. Because the sequence data represent highly related sequences derived from tens or hundreds of strains belonging to the same species, we were pessimistic that assembly of sequence reads would be fruitful. Our attempt at de novo assembly using Newbler confirmed our concerns and was not successful. We have defined an in silico approach to organize the sequence data that involves generating a microbiome reference genome database populated with relevant genomes derived from the oral cavity and gut. In addition to the original genes collected from GenBank, we added the 7 targeted gene sequences from 134 oral-related genomes and 162 gut-related genomes. By creating this database we will be able to map each gene sequence to the reference genome to enhance the specificity of each assignment. We are mapping the reads from our sequencing data to genomes using high-stringency cut-offs. Those reads mapping to reference genomes will be used to generate multiple sequence alignments, from which we will derive a consensus sequence and identify exploitable polymorphisms for qPCR primer design. For this we will not only rely on the multiple sequence alignments but will also compare alignments for any individual species to others within a major clade (common genera). This will allow us to determine the sequences with the highest probability of being unique to the species of interest.
Preliminary assessment of the DNA sequence data has shown promising outcomes, as we are able to recapitulate phylogenetic clades such as the viridans group of Streptococci using gene sequences derived from recA. This supports the idea that genes from species known to be present in the oral cavity were effectively captured. The clade or sub-clade primer design will be based on all the sequences reliably mapped to genomes.
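The consensus-and-polymorphism step can be illustrated with a short Python sketch. It assumes the reads have already been aligned to equal length; the function name and toy sequences are hypothetical, gaps are treated like any other character, and majority-rule consensus is just one simple choice among several.

```python
from collections import Counter


def consensus_and_variable_sites(aligned):
    """Return (majority-rule consensus, list of 0-based variable columns).

    aligned: list of equal-length strings from a multiple sequence alignment.
    """
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")

    consensus = []
    variable = []
    for i in range(length):
        column = Counter(seq[i] for seq in aligned)
        # Most frequent character in this column becomes the consensus base.
        consensus.append(column.most_common(1)[0][0])
        if len(column) > 1:  # more than one character seen: a polymorphic site
            variable.append(i)
    return "".join(consensus), variable


toy_alignment = ["ACGTAC", "ACGTTC", "ACGTAC"]
cons, sites = consensus_and_variable_sites(toy_alignment)
print(cons, sites)
```

Columns that are invariant within a species but differ from the rest of its clade are the natural candidates for placing the 3' ends of discriminating primers.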

It is our goal to design useful primer pairs representing species-level resolution. This will be achievable in many cases but not all. We are seeking funds to create a repository of primer pairs to share with the HMP community. It should be noted that initially none of the primer designs will be experimentally validated, and as such users will need to carefully evaluate their usage in the context of their experimental goals. It is our plan to continue efforts associated with this project to conduct validations to the extent that funding permits. These results will be added to the primer designs as they are validated or deemed unsuitable for experimental use.

The projects described above were supported by NIAID via a contract to JCVI under the Pathogen Functional Genomics Resource Center (N01-AI15447) and funds from NIDCR to PFGRC in an attempt to enable the HMP research community to exploit genomic and metagenomic methods. The work pertaining to the oral cavity was done in collaboration with Dr. Walter Bretz at NYU and the efforts pertaining to the gut microbiome were done in collaboration with Dr. Cynthia Sears at JHU.

The Mobile Lab Is Going to Sunny San Diego

Late one evening in January 2006, the mobile lab pulled into the parking lot at 9704 Medical Center Drive. It was such an exciting evening! Within a few days, we had all the lab supplies on it and began visiting students. The first school in the Washington Area was Patapsco Middle School in Howard County. The other inaugural participating schools were Ron Brown Middle School, Hines Junior High School, and Eliot Junior High School in Washington, DC. Since then, we have had the opportunity to bring the mobile lab to thousands of students over the past 5 years.
First Class on the DG! Mobile Lab, January 2006

Today, the mobile lab began its journey across the US to San Diego.  Let us know if you see it on the highway!
As you may have seen in September, we just broke ground on our new facility in San Diego. We began offering education programming in San Diego at our temporary facility in 2007 and have worked with over 30 teachers. Building on these relationships, we look forward to bringing San Diego students the same opportunities we have offered in the Washington Area.
Students on the DG! Mobile Lab

 With the current economic environment, keeping this program rolling is challenging.  Yet, it is needed more and more in the classroom.  We need your help!  To find out how you can help keep this science program rolling, visit our Giving Page.

The Hill School: Day 2

The day started early Tuesday with first period. Thirty eager students arrived on the bus to determine the results of the amplification of the DNA they extracted the day before. The PCR ran overnight, copying part of a conserved plant gene for RuBisCO that can be used to identify species of land plants.

Loading Gels at the Hill School

Using gel electrophoresis, we were able to load gels and run them quickly to see the results. Most students successfully had amplicons, which was great since they had never done DNA extraction or electrophoresis before. The samples have been brought back to Rockville for sequencing and will be available for the students to analyze in about two weeks.
Loading Gels like a Professional at the Hill School

We had a great visit with the students and are curious to see what plants they brought from around campus.

We look forward to working with them again in the future!

To support our Education program visit http://www.jcvi.org/cms/giving/overview

The Hill School: Day 1

DiscoverGenomics! Mobile Laboratory at the Hill School

The day started early with reagent and lab preparation before we even left for school OR had coffee. We expected to do over 100 DNA extractions as the first step in the DNA Barcoding. We arrived on campus as first period was starting, though we didn’t have class until after 9:00.

Grinding samples at the Hill School

 It was a full house (bus) most of the day and busy getting through the DNA extraction.  Various specimens were brought in from around campus to determine their species.  It will be interesting to see the diversity of plants on campus.

Moving through the protocol at The Hill School

The Hill School

The Mobile Laboratory Hits the Road

After a hiatus this summer, the Mobile Laboratory hit the road again today for a trip to Pottstown, Pennsylvania.  Driving through the rolling hills of northern Maryland into southeastern Pennsylvania, it passed small towns and beautiful foliage.  Tomorrow and Tuesday, we will be working with students from the Hill School. 

The students will be exploring their campus by determining the species of plants they collect.  This process is often called “DNA Barcoding.”  DNA Barcoding is a standardized procedure using PCR, sequencing and bioinformatic analysis to determine the various species of plants, bacteria, etc. based on conserved genes.

Stay posted for more updates tomorrow!

Sequencing of high yield influenza reassortants at JCVI

As part of the Influenza Genome Sequencing Project, JCVI will be sequencing a large number of high yield influenza reassortants created in the lab of Dr. Doris Bucher at New York Medical College. Dr. Bucher’s lab has prepared the type A H3N2 high yield reassortants (hyrs) for the influenza vaccine for the past several years, both within the US and worldwide.
The Bucher lab continues the tradition of preparing the hyrs as developed by preeminent influenza virologist Dr. Edwin D. Kilbourne (1920-2011). Dr. Kilbourne developed and applied the technology to produce the first genetically engineered influenza vaccines; these vaccines, which typically change yearly, have been in use for over 40 years.
JCVI will be sequencing approximately 46 hyrs from Dr. Kilbourne’s collection, which was assembled as part of the Kilbourne/New York Medical College Archive of Influenza Virus Reassortants, Mutants, and Antisera. Detailed information on every virus stored in the archive is provided at the archive website (www.flu-archive.org). The assembly of the archive was sponsored by the NIAID, and viruses in the archive are available through BEI Resources (www.beiresources.org). All sequence data and metadata associated with the hyrs sequenced at JCVI will be made publicly available in the Influenza Research Database (www.fludb.org).
Dr. Kilbourne passed away on February 21, 2011 at the age of 90. A eulogy in remembrance of Dr. Kilbourne and his pioneering work in the field of influenza virology can be found at: http://jid.oxfordjournals.org/content/204/2/185.full

What Happened to Sorcerer II?!?!

The last time I wrote a Sorcerer II blog was in November when we set sail from Spain to cross the Atlantic Ocean.  For all of you that have been worried that we have been at sea for 8 months, relax we made it!!  Over the next few days I will update everyone on what has happened and the upcoming plans for Sorcerer II.

First off, the Atlantic Crossing……….On November 13th we left Gibraltar and on November 17th arrived in the Canary Islands.

Canary Islands

Lanzarote Island

After a day on Lanzarote Island we sailed overnight to Las Palmas on Gran Canaria, collecting two samples on the way. We stayed in Las Palmas waiting for a good weather window for the big crossing. During that time we traded out crew members and fully stocked the boat with food, supplies and fuel. We also had time to meet our collaborators from the University of Las Palmas. I gave a group from the University a tour of the boat and showed them the sampling equipment.

Giving a demo of the sample gear

Folks from University of Las Palmas

On November 22nd we took off for the USA USA USA! There was one problem, a huge storm in the Northern Atlantic. To avoid the storm we had to go much farther south than originally planned. This storm also sucked all the wind up north, so we had very little wind to sail with. With no wind and a much longer sail than planned, we couldn’t make it directly to Florida…………well we could have but we would have run out of fuel and food! So on December 8th we arrived in St. Thomas USVI. For two and a half weeks we motored our way from the Canary Islands to the USVI, sampling and fishing. Total fish count was 8 Mahi Mahi, 3 Wahoo and 2 Yellow Fin Tuna.

Route, Canary Islands to USVI

Les and Me with the catch of the day

Atlantic Ocean Sunset

John BBQ'ing dinner at sea

Crew Sampling

One more thing, during this crossing the following things all happened, the strange part is they all happened within 24 hours of each other!

1.  Little generator broke (not a big deal we have 2 generators), both were up and running in a few hours

2.  Auto-pilot went out……….could have been  a real pain  because we would have  had to hand steer 24 hours a day for the rest of the trip, this was on day 8, but it got fixed in a few hours as well.

3.  Water maker went down………no water for showers, dishes and oh yeah no drinking water……once again got fixed in a few hours

4.  Mainsail ripped, this was fixed a few days later, but for  those days we were pretty rolly out there with no mainsail to keep us steady.

5.  Busted pipe in my head (bathroom)………funny thing is I woke up around 4 am dreaming about waves then I woke up and still heard the waves; it was the water going back and forth in my bathroom floor!

6.  And the big daddy of them all…….. a full out engine room fire!!  Not just smoke or steam…….we are talking flames and extinguishers!  It was taken care of and the engine was up and running about 4 hours later.

On December 20th 2010 Sorcerer II arrived in Florida. This wrapped up the 2009/2010 Europe Expedition. During this time we collected 213 samples and filtered over 51,000 liters of water from 13 countries. DNA from all 3 size fractions of the 213 samples has been extracted. Although not all will be sequenced right away, a majority have been sequenced or are scheduled to be sequenced this year. I will write a future blog on the sequencing status of these 213 samples and how we are working with many collaborators from Europe to work up this data set. I will also write about what has been going on with Sorcerer II since December 2010 and future plans for her.

Podcast on Human Genomics

The Festival of Ideas, themed in 2011 “The Pursuit of Identity, Landscape, History, and Genetics,” is held every other year in Melbourne, Australia to inspire scholars and citizens alike in topics ranging from literature and art to science and foreign policy. JCVI Professor of Genomic Medicine Vanessa Hayes participated as a speaker at the festival and was interviewed for a podcast on “Out of Africa: What human genomics is revealing about us.” The podcast and transcript provide an excellent discussion of modern genomics for a non-technical audience, including a glimpse of the exciting directions in the field and implications for human health.

A video of the session Vanessa Hayes participated in:  The Genetic Revolution I: Health and Human Identity can be downloaded here.