Posts in category Bioinformatics

2015 Advanced Genomics, Metagenomics, and Bioinformatics Workshop Wrap-up

I was lucky enough to help set up and plan a workshop covering genomics, metagenomics, proteomics and bioinformatics at the University of the West Indies campus in St. Augustine, Trinidad & Tobago on February 19th and 20th. The workshop was sponsored by the National Institute of Allergy and Infectious Diseases through the Genomic Center for Infectious Diseases cooperative agreement. UWI was a co-sponsor and a gracious host. Participants included 60 individuals from Trinidad, England, Guyana and Barbados. On-line participants were from all over the world including Gambia, Ethiopia, Kenya, India, USA, and the Caribbean.

Workshop Slides (PDF – 29MB)

The team of presenters from the JCVI included Karen Nelson, Bill Nierman, Andrey Tovchigrechko, Rembert Pieper, and Shibu Yooseph.  Presenters from UWI included Drs. Christine Carrington and Adesh Ramsubhag.

Karen opened the workshop with a welcome message and overview. She has been a driving force behind the growing relationship between UWI and JCVI. Bill delivered very interesting talks on the history of research on the human microbiome and on currently emerging infectious diseases. Rembert handled a presentation and tutorial on proteomic analysis strategies, which was a big hit. If time had not been a factor, the question and answer period could have lasted longer than his talk. Finally, Andrey and Shibu presented and gave lessons on statistics, UNIX, and bioinformatics analyses for genomics, metagenomics, and microbiome work.

Dr. Carrington’s presentation on infectious diseases in Trinidad focused on dengue fever and chikungunya, and dovetailed quite nicely with Bill’s presentation on emerging infectious diseases.

Dr. Ramsubhag described the results of his work examining the bacterial diversity of the Nariva Swamp in Trinidad, which uncovered many unique bacterial strains. Perhaps the most important portion of his talk described how valuable this type of workshop and collaboration is for UWI. Lessons from subject matter experts are invaluable to the undergraduates, graduate students, and faculty members who attended the workshop as students. In addition, Dr. Ramsubhag described how a relationship that started through a workshop has given UWI access to cutting-edge technologies and data analysis strategies that would be unavailable without the collaboration with the JCVI.

The students who attended the workshop were all very enthusiastic and eager to learn. They seemingly hung on every word from the presenters, and paid very close attention during the presentations and hands-on informatics sessions. A few attendees even asked us to make the lunch break shorter so that the workshop time could be lengthened, but we needed that time to break down the video equipment, haul it to another building, and set it up for the afternoon classes. It was a pleasure to help make this learning experience possible for the workshop students!

The workshop marked the second time that staff from the JCVI have presented at the St. Augustine Campus of UWI. Tim Stockwell held an 8-hour workshop focused on viruses in 2013. We look forward to working together in the future.

Special thanks to Tim Stayeas for handling all of the technology associated with on-line broadcasts of the meeting.

Watch all four training sessions below:

Day 1, AM Session

Day 1, PM Session

Day 2, AM Session

Day 2, PM Session

International Bioinformatics Workshop

20th International Bioinformatics Workshop on Virus Evolution & Molecular Epidemiology (VEME) on behalf of the International Centre for Genetic Engineering and Biotechnology

The International Bioinformatics Workshop on VEME is recognized as one of the best virus bioinformatics courses in the world and has so far been organized in Belgium, Brazil, Finland, Greece, Portugal, the USA, South Africa, The Netherlands, Serbia and Italy. The 20th edition will be held 9 – 14 August 2015 at the University of the West Indies (UWI) in Trinidad and Tobago. The workshop is co-organized by the UWI, the J. Craig Venter Institute (JCVI) and the University of Leuven.

The workshop will provide intensive training in the mathematical principles and computer applications used in the study of virus evolution and for conducting detailed molecular epidemiological investigations. The workshop will include lectures and computer practical sessions where students will have the opportunity to analyze their own research data. Teachers will include 24 world-renowned researchers (including Richard Scheuermann, Tim Stockwell and Karen E. Nelson from JCVI).

Note: the application deadline has been extended to March 29, 2015.

Detailed information and online applications may be accessed at:


H3Africa Update

The National Institutes of Health (NIH) and the UK-based Wellcome Trust, in partnership with the African Society of Human Genetics, developed a program to foster genomic and epidemiological research in African scientific institutions. The laboratory and computational infrastructure available to most scientists on the African continent is currently insufficient to keep up with the rapid developments in DNA sequencing technologies and the need to use advanced computationally intensive methods to analyze this data.

Through the H3Africa Consortium, a partnership between NIH and Wellcome Trust, funding has become available to support knowledge development and implementation of genomics-centered research in several African academic institutions. The first scientific paper to come from this effort, Enabling the Genomic Revolution in Africa, was published in the journal Science in June 2014.

H3Africa Efforts at J. Craig Venter Institute (JCVI)

One of the main initiatives of H3Africa is to foster scientific exchange between US-based partners and their African-based consortium members. JCVI is involved in a number of such partnerships through training and research collaborations.

Tuberculosis Research with Addis Ababa University

Addis Ababa University is the only Ethiopian institution to receive a primary award from NIH under H3Africa. It is based on a collaboration with JCVI. Professor Gobena Ameni of Addis Ababa University and Dr. Rembert Pieper of JCVI developed a proposal on Systems Biology for Molecular Analysis of Tuberculosis in Ethiopia which was initiated earlier this year. The research focuses on genomic variability in M. tuberculosis strains in Ethiopian pastoralist societies and also has an oral microbiome and proteomic biomarker discovery component.

Bioinformatics Training for African Scientists

As part of H3Africa, JCVI is leveraging its recent GCID award, where appropriate, for training of African scientists. As part of this effort, Dr. Andrey Tovchigrechko taught microbiome analysis to graduate students in Ibadan, Nigeria. The workshop was organized by the local H3Africa Bioinformatics Network node. It took place in July 2014 and was attended by students from Nigeria and other West and Central African countries.

Symposium presenters.

Workshop student participants.

The workshop was held at IITA.

During the three-day workshop, Dr. Tovchigrechko taught the students how to launch and control computing instances on the Amazon cloud, the basics of Python and R programming, the MG-RAST web interface, the MG-RAST R package matR, and the JCVI-developed R code MGSAT. MG-RAST tutorials were provided by one of its developers, Andreas Wilke (ANL).

Dr. Tovchigrechko also gave a talk, along with a dozen other speakers, at a one-day symposium at the University of Ibadan that preceded the workshop and included approximately 200 participants. Special thanks go to Nash Oyekanmi, the organizer and manager of the whole event, for his relentless efforts.

Collaborations with University of Cape Town

Also as part of the H3Africa Consortium, Dr. William Nierman from JCVI and Dr. Mark Nicol from the University of Cape Town, South Africa are collaborating to study the nasopharyngeal microbiome and respiratory disease in African children. Dr. Nierman’s group has conducted a month-long in-house microbiome training workshop with students from Dr. Nicol’s group.

The focus of the training was to teach students JCVI’s complete microbiome pipeline (including sample preparation, sequence generation, and final association analysis). The aim of the training collaboration is to ensure that this complete pipeline can be performed at the University of Cape Town, to help build independent and sustainable capacity in this field within South Africa.


J. Craig Venter at Recent Google Zeitgeist Conference [VIDEO]

Dr. J. Craig Venter recently spoke at a Google Zeitgeist conference in Arizona on advances in genomics, synthetic biology, and DNA as the software of life.

Understanding Complex Data through Better Visualization

Recently, researchers at JCVI reported on the Rhizoctonia solani mitochondrial genome, which was the largest fungal mitochondrial genome to be sequenced to date. We showed that its unusually large size was probably due to the expansion of multiple genetic elements that populated the genome in a somewhat ‘parasitic’ relationship. The visualization was meant to convey the number and variety of these repetitive genetic elements, and was selected in a commentary in FEMS Microbiology Letters as an example of how to summarize molecular data in order to obtain an overall view of the results.

The outermost circle represents the chromosome and repetitive elements. Other important features (genes, endonucleases, exons, and RNAseq coverage) are represented in the successive concentric circles, in that order. Grey links represent short repeats (< 35 bp) found up to 100 times in the genome; colored links show the location of repeats and follow the coloration in Track 1.

Professional Development Opportunities this Summer

This summer we are offering two professional development workshops: GenomeSolver and Bioinformatics: Unlocking Life through Computation. Both explore bioinformatics, microbial diversity, and their implementation in undergraduate or high school classrooms.

The GenomeSolver workshop trains faculty on genome analysis. Workshop attendees will learn about general methodologies, standards, and processes used to annotate and analyze microbial genomes. The workshop contents will be available to aid the faculty in developing teaching modules. In addition, extensive documentation on methodologies and tools will be available via the online environment created for this project. An online web portal, Genome Solver, will be a virtual space for developing and sustaining a community. Genome Solver will assist faculty with technical issues and curricular design, and will provide an online environment for the ongoing sharing of information, including publication of student work.

Bioinformatics: Unlocking Life through Computation is a new opportunity for high school teachers. Genomics and biotechnology are valuable tools in our quest to understand life and nature. However, introducing the science classroom to the computational and mathematical underpinnings of biology can be challenging. The goal of this workshop is to introduce a curriculum for mathematics and science education in the area of genomics (with a focus on the fascinating world of microbes). Educators will be introduced to the various analysis and computational challenges that arise in this discipline. Workflow examples illustrating comparative genomic analysis will be made available through the JCVI Metagenomics Report (METAREP) software infrastructure. The eventual aim is for the educational material to be integrated with local high school curricula requirements to expose students to both hypothesis-driven and discovery-based science.

JCVI Hosts South African Scientists to Share Microbiome Research Techniques

Two scientists from the University of Cape Town, South Africa have joined Dr. Bill Nierman’s lab for the next month as part of NIH’s Human Heredity and Health in Africa (H3Africa) Initiative, a training program designed to build out technical biological skills in the African research community. This training relates specifically to developing techniques around the area of microbiome analysis, a relatively new field in the biological sciences.

Microbiome analysis for the collaborative study looks at the entire community of microorganisms in the respiratory tract of South African infants to better understand how the microbiome is associated with infant pneumonia and wheezing episodes. The expectation is that the organisms that reside in the infant respiratory tract will either provide protection from, or confer a predisposition to, pneumonia or wheezing episodes.


The Nierman Group

The Nierman group, left to right: Sarah Lucas, Bill Nierman, Shantelle Claassen, Mamadou Kaba and Stephanie Mounaud (not pictured: Jyoti Shankar and Lilliana Losada). The group welcomes visiting scientists Ms. Claassen and Dr. Kaba from the University of Cape Town for a month-long training in microbiome sequencing and analysis.

Mamadou Kaba, MD, PhD and colleague Shantelle Claassen from the University of Cape Town will be working closely under the guidance of JCVI’s Stephanie Mounaud, who is functioning as the project manager and coordinating the laboratory components of a similar project at JCVI studying the microbiomes of infants in the Philippines and also in South Africa. These studies are sponsored by the Bill and Melinda Gates Foundation. The training will focus initially on preparing samples for DNA sequencing on a modern DNA sequencing platform, the Illumina MiSeq instrument. Once the sequence reads are off the sequencer, the instructional focus will shift to analysis of the reads by means of an informatics pipeline that develops phylogenies, or family trees, of the microbes obtained from the infant respiratory tract so that the abundance and relatedness of the microbes can be established. The bioinformatics training will be provided by Jyoti Shankar, the statistical analyst working on the Gates Foundation Project.

Mamadou Kaba is a Wellcome Trust Fellow working in the Division of Medical Microbiology, Faculty of Health Sciences, University of Cape Town. Mamadou’s research interests include the molecular epidemiology of infectious diseases and the study of the human microbiome in health and disease. He has contributed to establishing a new research group conducting studies on how the composition of the upper respiratory tract, gastrointestinal, and house dust microbial communities influences the development of respiratory diseases.

Prior to joining the University of Cape Town, Mamadou worked as Research Associate at the Laboratory of Medical Microbiology, Timone University Hospital, Marseille, France, where he studied the epidemiological characteristics of infection with hepatitis E virus in South-eastern France.

Shantelle Claassen is pursuing a Master’s degree in the Division of Medical Microbiology at the University of Cape Town. She has completed a BSc (Med) Honours degree in Infectious Diseases and Immunology at the University of Cape Town, during which she examined the relative efficacy of extracting bacterial genomic DNA from human faecal samples using five commercial DNA extraction kits. The DNA extraction kits were evaluated based on their ability to efficiently lyse bacterial cells, cause minimal DNA shearing, produce reproducible results and ensure broad-range representation of bacterial diversity.

Mamadou and Shantelle are currently involved in an additional prospective, longitudinal study whose primary objective is to investigate the association between fecal bacterial communities and recurrent wheezing during the first two years of life.

Plant Bioinformatics Workshop

JCVI recently held its 3rd Annual Plant Bioinformatics Workshop from July 15 to 19. During the week-long workshop, 20 scientists from the plant research community visited JCVI and learned many aspects of bioinformatics from the members of Chris Town’s Plant Genome group. Attendees included undergraduate and graduate students, post-doctoral fellows, research scientists and faculty at various universities throughout the United States, as well as staff from a biotech company. In addition to the on-site participants, 5 participants attended the workshop via WebEx. The virtual participants had the opportunity to sit in on the lectures and complete the hands-on exercises by logging into an Amazon Cloud instance, which was set up specifically for this purpose.

The topics covered during the workshop included UNIX tools for bioinformatics, genome assembly, structural and functional annotation, RNA-seq assembly and analysis, and SNPs. In addition to JCVI’s instructors, external instructors covered further sections: Eric Lyons (University of Arizona and iPlant) presented on comparative genomics and the iPlant infrastructure, and Ann Loraine (UNC Charlotte) presented on the Integrated Genome Browser. All sessions contained a hands-on component so the students would have the opportunity to use the tools discussed during the lecture portion. Watch our website for future offerings!


JCVI Viral Finishing Pipeline: a Winning Combination of Advanced Sequencing Technologies, Software Development and Automated Data Processing

JCVI viral projects are supported by the NIAID Genomic Sequencing Center for Infectious Disease (GSCID). The viral sequencing and finishing pipeline at JCVI combines next generation sequencing technologies with automated data processing. This allowed us to complete over 1,800 viral genomes in the last 12 months, and almost 8,800 genomes since 2005.

Viral Projects at JCVI

JIRA Viral Sample Tracking Workflow

Our NextGen pipeline, which utilizes SISPA-generated libraries with Roche/454 and Illumina sequencing, enables us to complete a wide variety of viral genomes, including challenging samples. The automated assembly pipeline employs CLCbio command-line tools and JCVI cas2consed, a cas-to-ace assembly format conversion tool. Our complementary Sanger pipeline software is currently being integrated with the NextGen pipeline. This will improve our data processing and will allow us to use validation software (autoTasker) more efficiently.

Assembly of Repetitive Viral Genomes

Genome Organization of Varicella-Zoster

Assembly of Novel Viral Genomes

CLC Assembly Viewer Representation

Promoter of Bat Genome

During the past year we have found that novel viruses, repetitive genomes, and mixed infection samples could not be easily integrated with our high-throughput assembly pipeline. We have developed an assembly and finishing process that utilizes components of the high-throughput pipeline and combines them with manual reference selection and editing. Using this approach we completed novel adenovirus genomes and mixed-infection avian influenza genomes, and improved assemblies of previously unknown arbovirus genomes. We are currently working on optimizing and automating this new pipeline.

Assembly of Mixed Viral Genomes

Consed Representation of Mixed Viral Sample

Repetitive genomes have long been known to present great challenges during assembly and finishing. We are presenting a new approach to assembly and finishing of the repetitive varicella genome that is based on separating it into overlapping PCR amplicons and then merging the sequenced amplicons during assembly.
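
As a rough illustration of the amplicon-merging idea, here is a minimal Python sketch. It is not the JCVI pipeline: the function name merge_amplicons, the min_overlap parameter, and the toy sequence are all assumptions made for this example, and it assumes error-free amplicons supplied in genome order, which real data would not guarantee.

```python
def merge_amplicons(amplicons, min_overlap=4):
    """Merge ordered, overlapping amplicon sequences into a single contig.

    Hypothetical helper for illustration only: it requires exact
    (error-free) overlaps and amplicons listed in genome order.
    """
    contig = amplicons[0]
    for nxt in amplicons[1:]:
        max_len = min(len(contig), len(nxt))
        # Try the longest candidate overlap first, then shorter ones.
        for k in range(max_len, min_overlap - 1, -1):
            if contig.endswith(nxt[:k]):
                contig += nxt[k:]  # append only the non-overlapping tail
                break
        else:
            raise ValueError("no overlap of at least %d bases" % min_overlap)
    return contig


# Toy "genome" cut into three overlapping amplicons (made-up sequence).
genome = "ACGTACGTTTGACCATGGCATTACGGA"
amplicons = [genome[0:12], genome[8:20], genome[16:27]]
merged = merge_amplicons(amplicons, min_overlap=4)
print(merged == genome)
```

Because each amplicon's approximate position is known from its primers, the merge only has to resolve overlaps between neighbors rather than place every read against the whole repetitive genome.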

To streamline our viral pipelines, we have fully integrated them with JCVI’s LIMS and JIRA Workflow Management to create a semi-automated tracking interface that follows the progress of viral samples from acquisition through to NCBI submission. This allows us to process a large volume of samples with limited manual interaction and, at the same time, gives us flexibility to work on challenging and novel genomes.


The JCVI Viral Genomics Group is supported by federal funds from the National Institute of Allergy and Infectious Diseases, the National Institutes of Health, and the Department of Health and Human Services under contract no. HHSN272200900007C.

The bat coronavirus project is a collaboration with Kathryn Holmes and Sam Dominguez, University of Colorado Medical Center.

The authors would like to thank members of the Viral Genomics and Informatics group at JCVI.


Viral genome sequencing by random priming methods. Djikeng A, Halpin R, Kuzmickas R, Depasse J, Feldblyum J, Sengamalay N, Afonso C, Zhang X, Anderson NG, Ghedin E, Spiro DJ. BMC Genomics. 2008 Jan 7;9:5

A virus discovery method incorporating DNase treatment and its application to the identification of two bovine parvovirus species. Allander T, Emerson SU, Engle RE, Purcell RH, Bukh J.


This post is based on a poster by Nadia Fedorova, Danny Katzel, Tim Stockwell, Peter Edworthy, Rebecca Halpin, and David E. Wentworth.

Summit on Systems Biology, June 15-17, 2011

I attended the Summit on Systems Biology hosted by Virginia Commonwealth University in Richmond, VA June 15-17.  So, judging from the talks given, what is systems biology?

  • Systems biology is non-linear and/or multi-step.  Heavy math does not make something systems biology if it’s directly solvable.  Taking a big gene expression matrix, using principal component analysis on it, and coming up with a linear equation for the contributions of a list of biomarker genes, is not systems biology.  The same microarray expression experiment, coupled with pathway analysis to reduce the number of candidate genes, which permits a less stringent multiple-hypothesis-testing correction and therefore fewer false negatives, is.  So is a non-linear model of how just a few genes interact over time.
  • Standard bioinformatic analysis seeks correlations.  Systems biology goes beyond that to seek cause and effect.  Thus, most systems biology work involves time series, and sometimes simulation.
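
The point about pathway analysis and multiple-testing correction can be made concrete with a small Python sketch. The talks did not specify a correction method; Bonferroni is assumed here purely because it is the simplest to show, and the gene names, p-values, and gene counts are made up for illustration.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the genes whose p-value passes a Bonferroni cutoff.

    The cutoff is alpha divided by the number of tests, so testing
    fewer candidate genes gives a less stringent threshold.
    """
    cutoff = alpha / len(p_values)
    return {gene for gene, p in p_values.items() if p < cutoff}


# Hypothetical genome-wide experiment: 20,000 genes, two with real signal.
all_genes = {"gene%d" % i: 0.5 for i in range(19998)}
all_genes["geneA"] = 1e-4   # moderate signal
all_genes["geneB"] = 1e-7   # strong signal

# Hypothetical pathway filter that keeps 200 candidate genes.
candidates = {g: all_genes[g] for g in list(all_genes)[:198]}
candidates["geneA"] = all_genes["geneA"]
candidates["geneB"] = all_genes["geneB"]

genome_wide_hits = bonferroni_significant(all_genes)    # cutoff 2.5e-6
pathway_hits = bonferroni_significant(candidates)       # cutoff 2.5e-4
print(sorted(genome_wide_hits), sorted(pathway_hits))
```

In this toy setup geneA is a false negative in the genome-wide test but is recovered once the candidate list is narrowed, which is the effect described above.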

What data and techniques do systems biologists use?

  • Large datasets of all types.  Microarray time-series, genomes, SNPs, protein-protein interactions, automated protein annotation – anything that comes in gigabytes instead of kilobytes.
  • There was marked interest in protein-protein interaction networks, and in micro RNAs (which inhibit translation of multiple target mRNAs).
  • There were several papers using reverse-phase protein microarrays.  RPMAs can distinguish phosphorylated (which usually means active) from unphosphorylated proteins, which helps understand protein interaction dynamics.
  • There were several papers using weighted gene co-expression network analysis.  WGCNA analyzes modules of co-expressed genes, rather than individual genes.  This gives more statistical power from sparse data.  Brian Sayre of VSU identified disease-resistance genes in livestock and crop species using single-nucleotide polymorphisms (SNPs) from related species.  We might know about some goats that are resistant to a disease that also affects sheep; but sheep don’t have the same SNPs as goats.  His group categorized the SNPs into genes, and the genes into pathways common across species, then looked for pathways associated with disease resistance in other species, and hypothesized that the same pathways would be involved in disease resistance in the target species.
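
To show what "modules of co-expressed genes" means in practice, here is a heavily simplified Python sketch. It is not the WGCNA algorithm itself, which uses topological overlap and hierarchical clustering; this version only raises absolute Pearson correlations to a power (the soft-thresholding step) and groups genes connected by strong weights. The function names, the beta and cutoff values, and the expression data are all assumptions for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_modules(expr, beta=6, cutoff=0.5):
    """Group genes whose soft-thresholded correlation is at least cutoff.

    Simplified stand-in for WGCNA: edge weight = |r| ** beta, and modules
    are the connected components of the resulting graph (union-find).
    """
    genes = list(expr)
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path compression
            g = parent[g]
        return g

    for a, b in combinations(genes, 2):
        weight = abs(pearson(expr[a], expr[b])) ** beta
        if weight >= cutoff:
            parent[find(a)] = find(b)

    modules = {}
    for g in genes:
        modules.setdefault(find(g), []).append(g)
    return sorted(sorted(m) for m in modules.values())


# Made-up expression values for four genes across five samples.
expr = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 10.5],
    "g3": [5, 3, 4, 1, 2],
    "g4": [9, 7, 8, 5, 6.2],
}
modules = coexpression_modules(expr)
print(modules)
```

Downstream statistics are then computed per module rather than per gene, so there are far fewer tests to correct for.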

What do people do with systems biology?

  • Medical applications predominated.  The main areas of interest were cancer, aging, cell simulation, eukaryotic model organisms, genome-wide association studies, pathway analysis, and immunology.
  • There were no talks about industrial applications or synthetic biology.
  • There were no talks on prokaryotes, except one on host-pathogen interactions.  This struck me as odd, since eukaryotes are more difficult to analyze or simulate than prokaryotes, and we haven’t done these things with prokaryotes yet.
  • There were no talks on metagenomics.  This also struck me as odd; bacterial communities seem like a natural systems biology problem.

What does the future hold for systems biology?

  • Omniomics:  We don’t want just a protein’s sequence – we want to know where and when it is expressed, what regulates it, what it interacts with, and what parameters describe those interactions. Soon, annotating a genome will not mean producing a list of genes and their functions – it will mean producing a simulation.
  • We need to learn to think at a higher level of abstraction.  If you have tens or hundreds of thousands of genes, transcripts, proteins, small molecules, and structures interacting, you need to figure out what it is you’re really interested in (e.g., “How did this cancer bypass the G1 cell-cycle restriction checkpoint?”), how to specify that precisely enough to ask the computer for an answer, and not to insist on understanding all the details if the answer checks out.
  • There is a growing gap between research and practice.  We can make more and more detailed analyses of diseases, especially in cancer, where each patient has a unique disease at the genetic level.  Meanwhile, the FDA approval process is so long and expensive that even in diseases (for example, Alzheimer’s and FTLD) for which there are millions of patients and a handful of known causes, pharmaceutical companies don’t try to develop three to four separate therapies for those three to four causes.  And the gap is growing wider:  Even as we are coming up with ways to combine weak information from across an entire genome, the FDA is considering proposals to regulate genomic sequencing that would forbid doctors from acquiring a full sequence.