Posts tagged data analysis

H3Africa Update

The National Institutes of Health (NIH) and the UK-based Wellcome Trust, in partnership with the African Society of Human Genetics, developed a program to foster genomic and epidemiological research in African scientific institutions. The laboratory and computational infrastructure available to most scientists on the African continent is currently insufficient to keep up with the rapid developments in DNA sequencing technologies and the need to use advanced computationally intensive methods to analyze this data.

Through the H3Africa Consortium, a partnership between NIH and Wellcome Trust, funding has become available to support knowledge development and implementation of genomics-centered research in several African academic institutions. The first scientific paper to come from this effort, Enabling the Genomic Revolution in Africa, was published in the journal Science in June 2014.

H3Africa Efforts at J. Craig Venter Institute (JCVI)

One of the main initiatives of H3Africa is to foster scientific exchange between US-based partners and their African-based consortium members. JCVI is involved in a number of such partnerships through training and research collaborations.

Tuberculosis Research with Addis Ababa University

Addis Ababa University is the only Ethiopian institution to receive a primary award from NIH under H3Africa. The award is based on a collaboration with JCVI. Professor Gobena Ameni of Addis Ababa University and Dr. Rembert Pieper of JCVI developed a proposal on Systems Biology for Molecular Analysis of Tuberculosis in Ethiopia, which was initiated earlier this year. The research focuses on genomic variability in M. tuberculosis strains in Ethiopian pastoralist societies and also has an oral microbiome and proteomic biomarker discovery component.

Bioinformatics Training for African Scientists

As part of H3Africa, JCVI is leveraging its recent GCID award, where appropriate, for training of African scientists. As part of this effort, Dr. Andrey Tovchigrechko taught microbiome analysis to graduate students in Ibadan, Nigeria. The workshop was organized by the local H3Africa Bioinformatics Network node. It took place in July 2014 and included students from Nigeria and other West and Central African countries.

Symposium presenters.

Workshop student participants.

The workshop was held at IITA.

During the three-day workshop, Dr. Tovchigrechko taught the students how to launch and control computing instances on the Amazon cloud, the basics of Python and R programming, the MG-RAST web interface, the MG-RAST R package matR, and the JCVI-developed R code MGSAT. MG-RAST tutorials were provided by one of its developers, Andreas Wilke (ANL).

Dr. Tovchigrechko also gave a talk, along with a dozen other speakers, at a one-day symposium at the University of Ibadan that preceded the workshop and included approximately 200 participants. Special thanks go to Nash Oyekanmi, the organizer and manager of the whole event, for his relentless efforts.

Collaborations with University of Cape Town

Also as part of the H3Africa Consortium, Dr. William Nierman from JCVI and Dr. Mark Nicol from the University of Cape Town, South Africa, are collaborating to study the nasopharyngeal microbiome and respiratory disease in African children. Dr. Nierman’s group has conducted a month-long in-house microbiome training workshop with students from Dr. Nicol’s group.

The focus of the training was to teach students JCVI’s complete microbiome pipeline (including sample preparation, sequence generation, and final association analysis). The aim of the training collaboration is to ensure that this complete pipeline can be performed at the University of Cape Town, to help build independent and sustainable capacity in this field within South Africa.


Virtual Comparative Metagenomics

We have created an Open Virtualization Format (OVF) package of JCVI’s Metagenomics Reports (METAREP), a high-performance comparative metagenomics analysis tool. The software runs on a web server, retrieves data from two different database systems, and uses R for statistical analysis. The new OVF package bundles all these third-party tools and is configured to run out of the box in a virtual machine.

Screenshot of the VirtualBox appliance import wizard. The wizard allows you to specify the CPU and memory usage of the virtual machine on which METAREP will run.

To run a virtual version of METAREP on your machine, follow these steps:

  1. Download the METAREP OVF package from our FTP site [download].
  2. Unzip the OVF package.
  3. Download and install Oracle’s VirtualBox, an OVF-compatible virtualization software [download].
  4. Start VirtualBox.
  5. Click File/Import Appliance and select the OVF file.
  6. Adjust RAM/CPU usage using the Appliance Import Wizard (see image).
  7. Start the VM.
  8. Double-click the METAREP Firefox link on the VM desktop.
  9. Log into METAREP with username=admin and password=admin.
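
The import, resource adjustment, and VM start steps can also be scripted with VirtualBox’s command-line tool instead of the wizard. The following is a minimal Python sketch, not part of the METAREP distribution: it only builds the `VBoxManage import` and `VBoxManage startvm` command lines. The file name `metarep.ovf`, the VM name `METAREP`, and the CPU and memory values are placeholders assumed for illustration; check the flags against the VBoxManage documentation for your installed version before running them.

```python
import shlex


def build_import_command(ovf_path, cpus=2, memory_mb=4096):
    """Return the VBoxManage argument list that imports an OVF appliance.

    cpus and memory_mb play the role of the RAM/CPU settings that the
    Appliance Import Wizard exposes; the defaults here are arbitrary.
    """
    if cpus < 1 or memory_mb < 1:
        raise ValueError("cpus and memory_mb must be positive")
    return [
        "VBoxManage", "import", ovf_path,
        "--vsys", "0",               # first virtual system in the package
        "--cpus", str(cpus),
        "--memory", str(memory_mb),  # memory size in MB
    ]


def build_start_command(vm_name):
    """Return the VBoxManage argument list that starts an imported VM."""
    return ["VBoxManage", "startvm", vm_name]


if __name__ == "__main__":
    # Hypothetical file and VM names; print the commands for review
    # rather than executing them.
    print(shlex.join(build_import_command("metarep.ovf")))
    print(shlex.join(build_start_command("METAREP")))
```

Passing either list to `subprocess.run(cmd, check=True)` would execute it; printing first makes it easy to review what will be run.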

This virtual machine appliance is the first step in developing a fully cloud-enabled analysis platform where users can easily launch the application wherever is most convenient: on their personal desktop or in the cloud, where they can scale out the appliance to suit their needs.

Future virtual machine images will be certified to run on other virtualization software platforms. Stay tuned.

If you would like to learn more about METAREP and talk to the developers, join us at the Lucene Revolution Conference in Boston (October 7-8, 2010). We will present a lightning talk about METAREP on the first day of the conference at 5 pm (see agenda).





METAREP Source Code

Advance Access JCVI Metagenomics Reports Application Note

A significant JCVI informatics development is JCVI Metagenomics Reports, an open source Web 2.0 application designed to help scientists analyze and compare annotated metagenomics data sets. Users can download the application to upload and analyze their own metagenomics datasets.

METAREP has just been published in Bioinformatics (08/26/2010) as an open access article. The publication is currently accessible under the Bioinformatics Advance Access model. The PDF version can be downloaded at

Supplementary information includes the METAREP data model and an overview about its search performance accessible at

A key feature that distinguishes METAREP from other metagenomics tools is its high-performance, scalable search engine, which allows users to analyze and compare extremely large metagenomics datasets, e.g. datasets produced by the Human Microbiome Project.

If you would like to learn more about METAREP and talk to the developers, join us at the Human Microbiome Research Conference in St. Louis, Missouri (August 31 – September 2, 2010). We will present METAREP on the first day of the conference at 10:35 am (see agenda).

Contact Us:

We would like to hear from you. If you have questions or feedback or if you wish to contribute to the METAREP open source project please send an email to





High-performance comparative metagenomics

Are you carrying out large-scale metagenomics analyses to identify differences among multiple sample sites? Are you looking for suitable analysis tools?

If you have not yet found the right analysis tool, you may be interested in the latest beta version of JCVI Metagenomics Reports (METAREP) [Test It].

METAREP is a new open source tool developed for high-performance comparative metagenomics.

It provides a suite of web based tools to help scientists view, query, browse, and compare metagenomics annotation data derived from ORFs called on metagenomics reads or assemblies.

Users can either specify fields, or logical combinations of fields, to filter and refine datasets. Users can compare multiple datasets at various functional and taxonomic levels, applying statistical tests as well as hierarchical clustering, multidimensional scaling, and heatmaps (see image gallery).

For each of these features, tab delimited files can be exported for downstream analysis. The web site is optimized to be user friendly and fast.

Feature Summary [download Flyer]:

  • Handle extremely large datasets. Uses scalable high-performance Solr/Lucene search engine (we have indexed 300 million annotation entries, but much larger volumes can be handled as shown by Hathi Trust).
  • Compare 20+ datasets at the same time. Use various compare
    options including statistical tests and plot options to visualize
    dataset difference at various taxonomic and functional levels.
  • Apply statistical tests such as METASTATS (White et al.), a modified
    non-parametric t-test to compare two sample populations (e.g.
    metagenomics samples from healthy and diseased individuals).
  • Export publication-ready graphics. Export heatmaps, hierarchical clustering, and multi-dimensional scaling plots in PDF format.
  • Analyze KEGG metabolic pathways. Summaries include enzyme
    highlights on KEGG maps, pathway enzyme distributions, and
    statistics about pathway coverage at various pathway levels.
  • Search using a SQL-like query syntax. Build your query using 14
    different fields that can be combined logically.
  • Drill down into data using METAREP’s NCBI Taxonomy, Gene
    Ontology, Enzyme Classification or KEGG Pathway browser.
  • Install your own METAREP version. Flexible central configuration; the METAREP and 3rd party code base is completely open source.
  • Cross-link function with phylogeny. Slice your data at various
    taxonomic and/or functional levels. For example, search for all
    bacteria or exclude eukaryotes or search for a certain (GO/EC
    ID)/taxonomic combination.
  • Generic data format. Data types that can be populated include a
    free text functional description, best BLAST hit information, as well
    as GO ID, EC ID, and HMMs.
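
The idea behind a non-parametric t-test of the kind listed above can be illustrated with a generic permutation test. The sketch below is not the METASTATS implementation and omits its refinements; it simply computes a Welch t statistic for two groups and estimates a p-value by shuffling the group labels. The function names and the example abundances are made up for illustration.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample t statistic that does not assume equal variances."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both groups are constant: no difference, or an infinite statistic.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value by randomly reassigning group labels."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(welch_t(perm_a, perm_b)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Made-up relative abundances of one feature in two sample populations.
    healthy = [0.10, 0.12, 0.11, 0.13, 0.09, 0.12]
    diseased = [0.30, 0.28, 0.33, 0.31, 0.29, 0.32]
    print(permutation_p_value(healthy, diseased))
```

Because the p-value comes from reshuffling the observed values, no normality assumption is needed, which is the main appeal of this family of tests for abundance data.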

How to analyze your own data: You can install your own METAREP version to analyze your metagenomics annotation data [download source]. We have written a comprehensive manual that describes the installation process step by step [download manual]. Since METAREP only operates on annotated data, raw sequences need to be annotated first. Supported data types that can be loaded for each sequence include functional descriptions, best BLAST hit fields (E-Value, Percent Identity, NCBI Taxon, Percent Sequence Coverage), and GO, EC, and HMM assignments. The installation also contains a set of example annotations that can be imported.

Contact Us:

We would like to hear from you. If you have questions or feedback or if you wish to contribute to the METAREP open source project please send an email to





New ways to analyze metagenomics data

Are you looking for new tools to analyze your metagenomics data? Are you using MG-RAST, IMG/M or MEGAN for your daily metagenomics work?

JCVI is working on a user-friendly alternative that you might be looking for: a new tool kit for metagenomics data visualization and analysis, built using the latest web 2.0 technologies.

JCVI’s Metagenomics Reports (METAREP) is a user-friendly web interface designed to help scientists browse, compare, view, and query annotation data derived from ORFs called on metagenomics reads. It supports browsing of both functional (Gene Ontology, Enzyme Commission classification) and taxonomic assignments. When performing a search, users can either specify fields or logical combinations of fields to flexibly filter datasets on the fly. METAREP provides lists and pie charts of top functional and taxonomic categories for browse and search results. Tools are being developed that focus on the comparative analysis of multiple datasets. The system is optimized to be user friendly and fast.
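
Combining fields logically can be pictured as composing a query string. The sketch below assumes a Lucene-style syntax (`field:value` clauses joined with AND, OR, and NOT); the helper functions and the field names `taxon` and `go_id` are hypothetical and are not taken from METAREP’s actual schema.

```python
def term(field, value):
    """Format one field:value clause, quoting values with spaces or colons."""
    text = str(value)
    if any(ch in text for ch in " :"):
        text = '"' + text.replace('"', '\\"') + '"'
    return f"{field}:{text}"


def all_of(*clauses):
    """Require every clause to match."""
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses):
    """Require at least one clause to match."""
    return "(" + " OR ".join(clauses) + ")"


def exclude(clause):
    """Negate a clause."""
    return f"NOT {clause}"


if __name__ == "__main__":
    # Hypothetical filter: entries with a given GO term, found in bacteria
    # or archaea.
    query = all_of(
        term("go_id", "GO:0016301"),
        any_of(term("taxon", "Bacteria"), term("taxon", "Archaea")),
    )
    print(query)
```

Keeping each clause as a small string-returning function lets filters be nested to any depth before the final query is sent to the search engine.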

Currently, an alpha version of METAREP is used and tested internally at JCVI. In April 2010, we will release the beta version to a limited set of interested external users.

If you would like to see the tool in action, join us at the DOE Genomic Science Workshop (February 9-10, 2010) for our web and poster presentation (5:30 – 8:00 pm on each day) or sign up to become part of the beta testing process at .

Sampling in Helgoland — A warm German welcome for the Sorcerer II

After a little more than two weeks in Plymouth, UK the Sorcerer II set sail on June 3rd. We were sad to say goodbye to our new friends at PLM, but we were grateful for their hospitality, friendship and scientific collaboration. We’re looking forward to coming back through Plymouth in the fall.

We motor sailed in calm weather, but with all the other boat traffic in the English Channel we were on constant watch. On June 6th we arrived on Helgoland, an island about 70 kilometers from the mainland. While Germany has many islands on its coast, this is the only high seas island, and it is a beautiful land with red sandstone rock. Scientists from the Alfred-Wegener-Institute for Polar and Marine Research, the Biologische Anstalt Helgoland, the Jacobs University Bremen and the Max Planck Institute for Marine Microbiology joined the Sorcerer II crew to sample another long term research site called the Helgoland Roads Long Term Ecological Research Site or ‘Kabeltonne.’ The Kabeltonne sampling site lies just outside the main harbor of Helgoland, in the southeastern corner of the North Sea.

Sorcerer II in Helgoland.

Dr. Frank Oliver Glöckner, Head of the Microbial Genomics Group at the Max Planck Institute for Marine Microbiology, was quoted in a press release regarding the collaboration with the Sorcerer II crew: “Sequence data from the Sorcerer II will complement and improve our own MIMAS data.” I asked Frank to contribute to our blog, so what follows below is his and his team’s account of having the Sorcerer II come to Helgoland.

Sunday afternoon the Sorcerer II crew toured the Alfred-Wegener-Institute for Polar and Marine Research facilities and aquarium. The tour was led by Dr. Karen Wiltshire, director of Biologische Anstalt Helgoland & Wadden Sea Station Sylt, and was followed by a tasty BBQ hosted by researchers Sonja Oberbeckmann and Katherina Schoo. It was a wonderful day and our hosts in Helgoland were superb. A heartfelt “thank you” from the entire Sorcerer II crew.

From our collaborators in Germany:

Helgoland Ahoy.  The Sorcerer II has reached the first German harbor.

Some weeks ago, rumors began to circulate that the Sorcerer II might make a sampling stopover in Helgoland, a rocky island in the middle of the North Sea. E-mails flew back and forth to get the permits and organize the stay. On June 2nd we got the message that the crew had left Plymouth and was heading for Helgoland. We were excited: the Sorcerer II would really come to Helgoland and even stay two nights in the harbor! Those not living on Helgoland quickly organized their travel.

From the other side of the island you could see the large mast of the Sorcerer II. We could not wait to get in contact with the crew, and what a warm welcome! Like a swarm of grasshoppers we boarded the ship for a visit. All of us wanted to see what this beautiful yacht looks like from the inside. What do the sampling devices look like? How do you store and ship the filters? Many questions arose and were answered patiently by the crew. The action started when we were allowed to join the sampling of the Helgoland Roads Long Term Ecological Research Site named ‘Kabeltonne’. In former times a buoy was anchored at this site to hold a cable connecting the main island with the Dune, a smaller sandy island populated mainly by seals and tourists in the summer months. The ‘Kabeltonne’ itself was removed many years ago and now stands as a reminder of former times in front of the Biological Station Helgoland (BAH), part of the Alfred-Wegener-Institute for Polar and Marine Research (AWI). Nevertheless, samples have been taken at this station for nearly 50 years to monitor food web interactions, the influence of climate change, and the diversity of microbial communities in the North Sea.

Sampling with the Sorcerer II at the station ‘Kabeltonne’ goes quickly; the real work, with all the filtrations and finally the sequencing and data analysis, comes later, explains Jeff, the lead scientist on board. The sequence data from the Sorcerer II will complement and improve our data from the recently started MIMAS (Microbial Interactions in MArine Systems) project. The MIMAS project generates and integrates diversity, metagenomic, metatranscriptomic and metaproteomic data with contextual data like temperature and nutrient concentrations. It’s exciting: with the Sorcerer II data we are now able to compare the North Sea with marine habitats on a global scale. The day finishes with a barbecue at the BAH including traditional Helgoland food and some duty-free drinks. At the end it feels like saying goodbye to good friends. The Sorcerer II has to leave for the Baltic and finally pick up J. Craig Venter in Stockholm. Nevertheless, we hope that we do not need magic to convince the crew to make another stopover on their way back in August, when they finally go to the Mediterranean for the winter.

We all wish you a good trip and happy sampling:

Sonja Oberbeckmann, Uwe Nettelmann, Alexandra Kraberg, Katherina Schoo, Gunnar Gerdts, Karen Wiltshire (all AWI), Manfred Schlösser (Max Planck Institute for Marine Microbiology Bremen), Christine Klockow, Renzo Kottmann and Frank Oliver Glöckner (all MPI-Bremen and Jacobs University Bremen)