The day started early Tuesday with first period. Thirty eager students arrived on the bus to determine the results of the amplification of the DNA they extracted the day before. The PCR ran overnight, copying part of a conserved gene in plants, RuBisCo, that can be used to identify the species of land plants.
Loading Gels at the Hill School
Using gel electrophoresis, we were able to load gels and run them quickly to see the results. Most students successfully had amplicons – this was great, since they had never done DNA extraction or electrophoresis before. The samples have been brought back to Rockville for sequencing and will be available for the students to analyze in about two weeks.
Loading Gels like a Professional at the Hill School
We had a great visit with the students and are curious to see what plants they brought from around campus.
We look forward to working with them again in the future!
DiscoverGenomics! Mobile Laboratory at the Hill School
The day started early with reagent and lab preparation before we even left for school OR had coffee. We expected to do over 100 DNA extractions as the first step in the DNA Barcoding. We arrived on campus as the first period was starting – we didn’t have class until after 9:00.
Grinding samples at the Hill School
It was a full house (bus) most of the day and busy getting through the DNA extraction. Various specimens were brought in from around campus to determine their species. It will be interesting to see the diversity of plants on campus.
Moving through the protocol at The Hill School
The Hill School
After a hiatus this summer, the Mobile Laboratory hit the road again today for a trip to Pottstown, Pennsylvania. Driving through the rolling hills of northern Maryland into southeastern Pennsylvania, it passed small towns and beautiful foliage. Tomorrow and Tuesday, we will be working with students from the Hill School.
The students will be exploring their campus by determining the species of plants they collect. This process is often called “DNA Barcoding.” DNA Barcoding is a standardized procedure using PCR, sequencing and bioinformatic analysis to determine the various species of plants, bacteria, etc. based on conserved genes.
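The final, bioinformatic step of that procedure can be illustrated with a minimal sketch. It assumes the barcode sequences are already aligned to the same length and simply picks the reference with the highest percent identity. The reference names and sequences below are made up for illustration, and a real analysis would typically search a curated database with an alignment tool such as BLAST rather than compare strings position by position.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


def best_match(query: str, references: dict) -> tuple:
    """Return (reference name, percent identity) for the closest reference sequence."""
    scored = ((name, percent_identity(query, seq)) for name, seq in references.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical reference barcodes; real ones would come from a sequence database.
REFERENCES = {
    "Species A (hypothetical)": "ATGTCACCACAAACAGAGAC",
    "Species B (hypothetical)": "ATGTCGCCACAAACTGAAAC",
}

if __name__ == "__main__":
    sample = "ATGTCACCACAAACAGAGAC"  # made-up student sequence
    print(best_match(sample, REFERENCES))
```

The design choice here is deliberately simple: a single best hit with its score, which is roughly what a student sees when reading a database search result.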
The last time I wrote a Sorcerer II blog was in November when we set sail from Spain to cross the Atlantic Ocean. For all of you that have been worried that we have been at sea for 8 months, relax we made it!! Over the next few days I will update everyone on what has happened and the upcoming plans for Sorcerer II.
First off, the Atlantic Crossing……….On November 13th we left Gibraltar and on November 17th arrived in the Canary Islands.
Canary Islands
Lanzarote Island
After a day on Lanzarote Island we sailed overnight to Las Palmas on Gran Canary, collecting two samples on the way. We stayed in Las Palmas waiting for a good weather window for the big crossing. During that time we traded out crew members, fully stocked the boat with food, supplies and fuel. We also had time to meet our collaborators from the University of Las Palmas. I gave a group from the University a tour of the boat and showed them the sampling equipment.
Giving a demo of the sample gear
Folks from University of Las Palmas
On November 22nd we took off for the USA USA USA! There was one problem, a huge storm in the Northern Atlantic. To avoid the storm we had to go much farther south than originally planned. The storm also sucked all the wind up north, so we had very little wind to sail with. With no wind and a much longer sail than planned, we couldn’t make it directly to Florida…………well we could have but we would have run out of fuel and food! So on December 8th we arrived in St. Thomas USVI. For two and a half weeks we motored our way from the Canary Islands to the USVI, sampling and fishing. Total fish count was 8 Mahi Mahi, 3 Wahoo and 2 Yellow Fin Tuna.
Route, Canary Islands to USVI
Les and Me with the catch of the day
Atlantic Ocean Sunset
John BBQ'ing dinner at sea
Crew Sampling
One more thing, during this crossing the following things all happened, the strange part is they all happened within 24 hours of each other!
1. Little generator broke (not a big deal we have 2 generators), both were up and running in a few hours
2. Auto-pilot went out……….could have been a real pain because we would have had to hand steer 24 hours a day for the rest of the trip, this was on day 8, but it got fixed in a few hours as well.
3. Water maker went down………no water for showers, dishes and oh yeah no drinking water……once again got fixed in a few hours
4. Mainsail ripped, this was fixed a few days later, but for those days we were pretty rolly out there with no mainsail to keep us steady.
5. Busted pipe in my head (bathroom)………funny thing is I woke up around 4 am dreaming about waves then I woke up and still heard the waves; it was the water going back and forth in my bathroom floor!
6. And the big daddy of them all…….. a full out engine room fire!! Not just smoke or steam…….we are talking flames and extinguishers! It was taken care of and the engine was up and running about 4 hours later.
On December 20th 2010 Sorcerer II arrived in Florida. This wrapped up the 2009/2010 Europe Expedition. During this time we collected 213 samples and filtered over 51,000 liters of water from 13 countries. DNA from all 3 size fractions of the 213 samples has been extracted. Not all of it will be sequenced right away, but a majority has been sequenced or is scheduled to be sequenced this year. I will write a future blog on the sequencing status of these 213 samples and how we are working with many collaborators from Europe to work up this data set. I will also write about what has been going on with Sorcerer II since December 2010 and future plans for her.
On May 12th and 13th, the J. Craig Venter Institute in San Diego will be hosting a NASA Astrobiology Institute-funded symposium titled “Paleobiology in the genomics era.” Paleobiology is the study of the origins and evolution of life and, by nature, is interdisciplinary. The goal is to bring together scientists united by this common interest but differentiated by expertise. A major intellectual challenge to paleobiology is the close interaction between environment and life. As life evolved, it changed the environment and suffered the consequences. One of the most extreme examples is the invention of oxygenic photosynthesis by cyanobacteria (blue-green algae); the sunlight-fueled production of oxygen dramatically changed the availability of crucial elements of life, like nitrogen, sulfur, iron, zinc, copper, and other trace metals. Genome-based analyses showed that these environmental changes modulated the emergence of metal-requiring proteins. For example, proteins that bind Fe evolved when the earth was Fe rich. Essentially, one biological event changed the environment, which in turn induced a subsequent biological change: a feedback cycle between biota and planet.
In order to study these interactions in a robust fashion, numerous lines of evidence must be integrated, despite originating from disparate fields like organic and inorganic geochemistry (oils and metals in rocks), micropaleontology (tiny fossils), and evolutionary biology. Recent years have seen the emergence and maturation of synthetic biology and computational biology, two fields with tremendous potential for the formulation and testing of hypotheses about the evolution of life. To facilitate a dialog between these fields, I, along with Ariel Anbar from Arizona State University, and John Peters and Eric Boyd from Montana State University, have invited experts to present their work as it pertains to paleobiology. The topic list almost appears schizophrenic, with numerous hard-core geochemical talks being followed by presentations on molecular genetics, synthetic biology, metagenomics, and comparative genomics. This was intentional. I hope to feel intellectually challenged in the fashion of a first-year graduate student and further hope that I’m not the only one. A major wild card at the moment is the identity of over two-thirds of the attendees. With travel grants available for graduate students, post-doctoral researchers, and faculty, we hope to incorporate novel perspectives not covered by the confirmed speakers.
While the content of the meeting is exciting, the format is pretty sweet too. As part of NASA’s Workshop Without Walls series, the meeting will be webcast live with an accompanying live stream chat. Thus, people will be able to see the presentations and pose questions and comments during the attendant discussions. Previous workshops have often had hundreds of live viewers throughout the meeting, despite only dozens of in situ attendees. The actual energy savings for a single meeting are modest in isolation; imagine 250 people not flying 500 miles and you basically have a single 737 flight that remains grounded. However, the future of environmentally-friendly science requires important preliminary steps to change dominant trends. Similarly, the talks will be streamed live without charge and deposited in the open access scientific podcast site, Scivee.tv; economic barriers to information exchange are removed.
Needless to say, I’m looking forward to this meeting. Organizing something like this is an absolute undertaking. The number of details that need attention is astounding. And if you think I actually could do that, you don’t know me. Numerous people at JCVI have provided invaluable assistance, including Matt LaPointe and Jasmine Pollard, Robert Friedman, Dave Negrotto, and Jody Wilson. It would also have no chance of happening if it were not for Pat Goley, who has handled the numerous (read: uncountable) details I’ve lapsed on.
Check out the NASA page for the meeting and webcast registration.
Members of the Human Microbiome Project (HMP) Consortium (see http://commonfund.nih.gov/hmp and http://www.hmpdacc.org for more information on the project and partners), including human microbiome body site experts, gathered for a virtual Jamboree on January 19th. The fully online Jamboree was set up to communicate initial data products and the tools best suited for analysis, primarily to make the data amenable/consumable in a user-friendly way for body site experts. 61 participants followed the Jamboree agenda, with presenters given access to a common desktop that was shared via the internet using an online collaboration tool. Results from the Data Analysis Working Group (DAWG) were presented in the areas of 16S rRNA gene sequence (16S DAWG) and metagenomic whole-genome shotgun analysis (WGS DAWG). The efforts of the 16S DAWG focus on marker-gene based approaches to estimate biological diversity and how marker variability is associated with patient meta-data. The WGS DAWG complements results from the 16S marker based analysis with comprehensive sequencing of random pieces of genomic DNA from the collection of microorganisms which inhabit a particular site on, or in, the human body (microbiome). These analyses allow researchers to investigate, among other questions, what microorganisms are present, and the nature and extent of their collective metabolism, at a particular body site. Ultimately researchers want to relate this information to healthy versus disease states in humans.
METAREP tutorial presented as part of the HMP Virtual Jamboree
The current survey comprises more than 700 samples from hundreds of individuals taken from up to 16 distinct body sites. Illumina sequencing has yielded more than 20 billion reads, and the annotation data produced from the sequences exceeds 10 terabytes. In anticipation of such data volumes, we developed JCVI Metagenomics Reports (METAREP), an open source tool for high-performance comparative analysis, in 2010. The tool enables users to slice and dice data using a combination of taxonomic and functional/pathway signatures. To demonstrate how the tool can be used by body site experts, we picked and loaded sample data from 17 oral samples and presented a quick tutorial on how users can view, search, and browse individual samples and compare multiple samples (see video). The functionality was very well received and body site experts asked JCVI to make all the 700+ samples available. As a result of the Jamboree, JCVI, in agreement/collaboration with the HMP Data Analysis and Coordination Center and the rest of the HMP consortium, will soon set up a dedicated HMP METAREP instance that will allow body-site experts and eventually other users to analyze the DAWG data in a user-friendly way via the web.
As the J. Craig Venter Institute (JCVI) soars into its 19th year, we reflect on the past year of highlights and accomplishments to mark the close of 2010 and look forward to more significant scientific advances in 2011.
JCVI Top 10 of 2010 …
1. First Synthetic Cell: Fifteen years in the making and hugely anticipated, the first self-replicating, synthetic bacterial cell was successfully constructed in 2010. The work was published in Science in May. The synthetic cell, called Mycoplasma mycoides JCVI-syn1.0, is the proof of principle that genomes can be designed in the computer, chemically made in the laboratory and transplanted into a recipient cell to produce a new self-replicating cell controlled only by an artificial genome. Although the first synthetic cell was not designed to produce a specific bioproduct, the team has shown that this can be done and the potential benefits are numerous. The research team, led by JCVI President Craig Venter, Hamilton Smith, Clyde Hutchison, and Daniel Gibson, envisions a future where the rapid design and production of biological products using synthetic biology techniques will be used to produce clean fuels, medicines, and other bioproducts. Throughout the course of this work, the JCVI Policy group has extensively engaged in outside review of the ethical and societal implications of this work, including advising the new Presidential Commission on Bioethics on their recommendations for oversight.
M. mycoides JCVI-syn1
2. Synthetic Vaccines: Following on the heels of the announcement of the first synthetic cell, the company Synthetic Genomics Inc. and JCVI announced in October the formation of a new company, Synthetic Genomics Vaccines Inc. (SGVI). The privately held company is focused on developing next generation vaccines that can be rapidly produced and tested, which is especially important for outbreaks of new infectious diseases. SGVI also announced a three-year collaboration with Novartis to apply synthetic genomics technologies to accelerate the production of the influenza (flu) seed strains required for vaccine manufacturing. The seed strain is the starter culture of a virus, and is the base from which larger quantities of the vaccine virus can be grown. Under this collaboration, Novartis and SGVI will work to develop a “bank” of synthetically constructed seed viruses ready to go into production as soon as WHO makes recommendations on the flu strains. The technology could reduce vaccine production time by up to two months, which is particularly critical in the event of a pandemic.
3. Hydra Genome – one of the animal kingdom’s earliest common ancestors: JCVI scientists, along with more than 70 other researchers from around the world, have sequenced and analyzed the genome of Hydra magnipapillata, a fresh water member of the cnidaria, stinging animals that include jellyfish, sea anemones and corals. The research, published in the March 14 edition of Nature, was co-led by Ewen F. Kirkness, JCVI, Jarrod A. Chapman, Department of Energy Joint Genome Institute, and Oleg Simakov, University of California, Berkeley. This is the second sequenced cnidarian genome, following that of a sea anemone, Nematostella vectensis, in 2007. The ancestors of these two species diverged more than 500 million years ago, and comparison of their genomes has revealed common features of the earliest animals that gave rise to the diversity of animals on Earth today. The team found clear evidence for conserved genome structure between the Hydra and other animals, like humans. Unexpectedly, the sequencing also revealed a novel bacterium that lives in close association with the Hydra.
4. Uncovering the Human Microbiome: Microbes are living within and on the human body and this collective community is called the human microbiome. JCVI Scientists, as one component of the large scale NIH Roadmap Human Microbiome Project, and along with colleagues at three other genome centers sequenced the genomes of ~180 microbes from the human body, published in the May 21 edition of Science. At the JCVI we anticipate sequencing an additional 400 species over the next few months. Colleagues at the JCVI are also using single cell approaches to isolate new strains that have not been cultured – isolates whose genomes will also be completely sequenced. The role these microbes play in human health and disease is still relatively unknown and these approaches are allowing us to gain a greater understanding of these enigmatic species.
5. Body Louse Genome: A global research team led by Ewen Kirkness and colleagues from JCVI published a study in the Proceedings of the National Academy of Sciences in June describing the sequencing and analysis of the human body louse, Pediculus humanus humanus, a human parasite responsible for the transmission of bacteria that cause epidemic typhus, relapsing fever and trench fever. Detailed analysis of the genome was then conducted by a large international group of 71 scientists, coordinated by Barry Pittendrigh, University of Illinois, and Professor Evgeny Zdobnov, University of Geneva Medical School. Comparative studies of the body louse genome with other species revealed features that will enhance our understanding of the relationships between disease-vector insects, the pathogens they transmit, and the human hosts. In addition to the targeted louse genome, the project unexpectedly yielded the complete genome sequence of a bacterial species, Riesia, that lives in close association with lice, and which is essential for survival of the insects. The researchers believe that the genome will be a valuable reference for evolutionary studies of insect species, especially in the areas related to insect growth and development.
6. Castor Bean Genome Sequencing: A research team co-led by Agnes P. Chan and colleagues from JCVI and Jonathan Crabtree and others at the Institute for Genome Sciences, University of Maryland School of Medicine, published the sequence and analysis of the castor bean (Ricinus communis) genome in Nature Biotechnology in August. Because of the potential use of castor bean as a biofuel and its production of the potent toxin ricin, the team focused efforts on analysis of genes related to oil and ricin production. The analyses could be important for comparative studies with other oilseed crops, and could also allow for genetic engineering of castor bean to produce oil without ricin. Identifying and understanding the ricin–producing gene family in castor bean will be important in preventing and dealing with potential bioterrorism events. Genomics enables enhanced diagnostic and forensic methods for the detection of ricin and precise identification of strains and geographical origins. As a next step, the group suggests further comparative genomic studies with the close relative cassava, a major crop in the developing world, to further elucidate their disease resistance aspects.
7. Science Education: JCVI was an Official Partner of the inaugural USA Science and Engineering Festival held on the National Mall in Washington, DC in October. The Festival, which was the country’s first national science festival, included over 500 of the country’s leading science and engineering organizations with the aim to reignite the interest of our nation’s youth in the sciences. The JCVI ‘Discover Genomes’ Bus was showcased during a two-day expo and some of the research being done at JCVI was presented to around 1700 visitors by our scientists and staff.
There were lines all day!
8. Viral Genomics: In 2010 the JCVI published over 1600 influenza genomes, and over 75% of all published flu genomes to date have been sequenced by the JCVI, totaling over 6000 genomes. This year the diversity of viral genomes we have sequenced has significantly expanded under the NIH Genomic Sequencing Center for Infectious Diseases contract. Some of the projects include viruses causing diseases such as measles, mumps, rubella, encephalitis, SARS, and the common cold, just to name a few. The viral group has annotated and published 79 Rotavirus (stomach flu) and 33 Coronavirus genomes (includes SARS and common cold) this year and many more will be published in 2011. The pace of sequencing and finishing genomes has also increased this year as a result of the adoption of nextgen platforms (e.g. Roche/454 and Illumina/Solexa) and the development of more efficient methodologies to increase productivity while reducing costs.
9. Marine Microbial Genome Sequencing Project: JCVI scientists have continued their quest to isolate and sequence microbes living in global ocean waters to discover new genes and enzymes, and to help understand the role microbes play in the ocean ecosystem. Shibu Yooseph, Kenneth Nealson and colleagues at JCVI published an analysis of the genomes of 137 known marine microbes living in global ocean surface waters in Nature in November. These genomes were compared with 10.97 million sequences from JCVI’s Sorcerer II Global Ocean Sampling (GOS) metagenomic data and thousands of 16S rRNA sequences. The marine genomes were collected as part of the Gordon and Betty Moore Foundation-funded Marine Microbial Genome Sequencing Project, a project coordinated by JCVI that has a primary goal of obtaining whole genome sequences of ecologically important microbes from a variety of diverse, global marine environments. The work provides a good example of combining metagenomic data with sequenced genome data to study microbial communities and to generate testable hypotheses in microbial ecology.
10. Sorcerer II Global Ocean Sampling Expedition: On December 17th 2010 Sorcerer II arrived in Florida after spending the last two years with her crew collecting samples in the Baltic, Mediterranean and Black Seas. Funded generously by the Beyster Family Foundation Fund, The San Diego Foundation, and Life Technologies Foundation, Sorcerer II has sailed ~28,000 nautical miles since departing San Diego in March 2009. During this time 212 samples were collected and over 5,100 liters of sea water were filtered and sent to JCVI for analysis of the microbial life contained within these samples. The JCVI established strong collaborations with scientists in all 16 countries in which samples were collected, which will lead to joint publications and future collaborative studies in the new year. Read more.
Sunrise in the Ligurian Sea
Looking Forward to 2011…
Ten-year anniversary of the Human Genome Project: To commemorate the anniversary of the publications of the first human genome sequences in 2001, JCVI and Nature are hosting a conference and celebration in February 2011 titled – Human Genomics: The Next 10 Years. The conference will look forward to the promises of human genomics for the next 10 years, with sessions on medical advances related to genomics; the technological and ethical challenges of human genomics; personalized and familial genomics; the human microbiome project; variation in the human genome; and making sense of the genetic code. This conference will be a great way to jump into the new year and inspire the grandiose ideas and achievements that genomic scientists will accomplish over the years to come.
On November 10th Sorcerer II set sail from Valencia, Spain to start the sail back to America. The first leg was a 3-day sail down the Spanish coast to Gibraltar.
Coastline to Gibraltar
Valencia Coastline
John showing the delivery crew around Sorcerer II
We spent one night in Gibraltar to get fuel and supplies. The next day we took a very important sample on the Mediterranean Sea side of the Straits of Gibraltar. We collected a surface sample, which should be the lower salinity Atlantic water coming into the Mediterranean Sea. At the same location we collected a deeper sample, which should be the saltier Mediterranean water flowing along the bottom into the Atlantic Ocean.
CTD cast from Med. Sea side of the Straits of Gibraltar. Salinity increased from 36 to 38 PSU
After we collected our last Mediterranean Sea sample, we sailed through the Straits of Gibraltar into the Atlantic Ocean and started our way to the Canary Islands.
Gibraltar
Sailing through the Straits of Gibraltar at sunset
I arrived late in Boston after my plane from Washington DC was delayed. On the agenda for the next four days: the Lucene Revolution conference and a Solr application development workshop organized by Lucid Imagination. The conference promised a unique venue (the first of its kind in the US) to meet developers that all share the same challenge: to enable users to find relevant information in growing bodies of data quickly and intuitively. I was looking forward to hearing many interesting talks given by experts in the field, to learning how to build intuitive search interfaces, and to getting an idea of where things are heading in the coming years. As the developer of JCVI’s Metagenomics Reports (METAREP), I was especially looking forward to the Solr workshop to learn some of the tricks from the experts to tweak the search engine behind this open-source metagenomics analysis tool.
The Early Revolution
But before the revolution could happen and I could enjoy some splendid time at the Washington Dulles airport, Doug Cutting had to start developing a Java based full-text search engine called Lucene in 1997. Lucene became an open-source project in 2000 and an Apache Software Foundation project one year later. In 2004, Solr emerged as an internal CNET project created by Yonik Seeley to serve Lucene powered search results to the company’s website. It was donated by CNET to the Apache Software Foundation in 2006.
Google Trend for Solr
Early this year, both projects merged and development since then has been carried out jointly under the umbrella of the Apache Software Foundation. Meanwhile many companies use Solr/Lucene, among them IBM, LinkedIn, Twitter, and Netflix. How did this happen?
The Lucid Imagination Solr Application Development Workshop
In search of an answer, I made my way from my hotel to the conference venue, the Hyatt Hotel located along the beautiful Boston Harbor. The 2-day workshop was a tour de force of Solr features, configuration, and optimization. It also touched on the mathematical theory behind Lucene’s search result scoring and on evaluating result relevance. The workshop covered enough material to warrant a third day. Given this optimistic agenda, there was not much time for the labs (exercises) and the trainer had to focus more on breadth than on depth. As a one-year Solr user, I was familiar with many of the general concepts, so I was more interested in details. A comprehensive handbook and an excellent exercise compilation came to the rescue and provided me with the needed detail to follow up on subjects that were touched on. There were two parallel Solr classes. In my class, 25 participants followed the training. The mix included developers working for media, defense, and other corporations. Academia was represented by several libraries and universities.
Solr Application Development Workshop
A powerful feature I had not heard of before is the DISMAX RequestHandler. The handler hides complex query syntax from the user. Users can enter a simple query without complex syntax or specifying a search field, and behind the scenes the handler will do its magic. It searches across a set of specified fields which (among other things) can be weighted by importance. Additional information about this handler and other snippets I collected during the class can be found in my Solr workshop notes.
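To give a feel for what such a request looks like, here is a minimal sketch that builds a DisMax query URL. It relies on Solr's `defType=dismax` and `qf` (query fields, with `field^boost` weights) request parameters; the host, the `/solr/select` path, and the field names are assumptions chosen for illustration and are not taken from METAREP's actual schema.

```python
from urllib.parse import urlencode


def build_dismax_url(base_url, user_query, field_weights, rows=10):
    """Build a Solr select URL that sends a plain user query through DisMax.

    field_weights maps field names to boosts, e.g. {"com_name": 2, "go_term": 1},
    and is rendered into the qf parameter as "com_name^2 go_term^1".
    """
    qf = " ".join(f"{field}^{weight}" for field, weight in field_weights.items())
    params = {
        "defType": "dismax",  # select the DisMax query parser
        "q": user_query,      # the user's text, no field prefixes or operators needed
        "qf": qf,             # fields to search, weighted by importance
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local Solr instance and field names.
    print(build_dismax_url(
        "http://localhost:8983/solr/select",
        "iron transport",
        {"com_name": 2, "go_term": 1},
    ))
```

The user only ever types "iron transport"; which fields are searched and how much each one counts stays in the configuration (or, as here, in the request parameters).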
The Lucene Revolution Conference
After a mediocre coffee brewed in my hotel room, I headed to the conference venue on the second floor of the Hyatt Hotel. The first day of the conference started with a panel discussion about the Cutting Edge of Search that included Michael Busch (Twitter), John Wang (LinkedIn), Joshua Tuberville (eHarmony), and Bill Press (Salesforce.com). The discussion went back and forth, showcasing each search platform and the experience in developing it. When asked what he would do differently in retrospect, John Wang from LinkedIn jokingly mentioned that he would “ban recruiters” – if I remember correctly he mentioned that they “spam-up” the system.
Lightning Talk “Using Solr/Lucene for High-Performance Comparative Metagenomics”
Joshua Tuberville from eHarmony provided valuable advice to developers: “Avoid pet queries for benchmarking a system – use a random set of queries instead.” He also suggested tracking queries that web site users enter for optimization, adding “it surprises me every day that the world is not made up from engineers, but it is a fact.” Avoid unnecessary complexity and duplicating efforts. Use open-source if available. For example, instead of implementing their own Lucene wrapper, eHarmony made use of the open-source project Solr. Bill Press added “Do not be afraid to tear things down, rebuild it many times if needed.”
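Tuberville's benchmarking advice can be sketched in a few lines. The example below assumes a query log is available as a plain list of strings and that `search` is any callable that runs one query; both are stand-ins for illustration, not part of any system described at the conference.

```python
import random
import time


def sample_queries(query_log, n, seed=None):
    """Draw up to n queries at random from a log of real user queries.

    Sampling from the raw log (duplicates included) keeps frequent queries
    more likely to be picked, unlike a hand-picked list of pet queries.
    """
    rng = random.Random(seed)
    return rng.sample(list(query_log), min(n, len(query_log)))


def mean_latency(search, queries):
    """Run each query once and return the mean wall-clock time in seconds."""
    timings = []
    for query in queries:
        start = time.perf_counter()
        search(query)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings) if timings else 0.0


if __name__ == "__main__":
    # Made-up log and a dummy search function for demonstration.
    log = ["hiking", "cats", "hiking", "jazz", "cooking", "hiking", "dogs"]
    queries = sample_queries(log, 4, seed=1)
    print(queries, mean_latency(lambda q: q.upper(), queries))
```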
“Companies do not have time to debug code.” Eric Gries (CEO Lucid Imagination)
Eric Gries, CEO of Lucid Imagination, presented ‘The Search Revolution: How Lucene & Solr Are Changing the World’. In the introduction, he pointed out that Solr/Lucene is the 10th largest community project and the 5th largest Apache Software Foundation project. “Open-source projects need a commercial entity behind them to help them grow”. “Companies need no errors, they do not have time to debug.” The main part of his talk focused on his company’s LucidWorks Enterprise software, which is based on the open-source project Solr/Lucene. Features that separate it from the open-source version include smart defaults, additional data sources, a REST API that allows programmatic access via Perl/Python/PHP code, standardized error messages, and click based relevance boosting. Later, Brian Pinkerton, also from Lucid Imagination, presented additional details. He revealed that their software is based on elements of the upcoming Solr 4.0 version and is fully cloud enabled (added SolrCloud patch). It uses ZooKeeper to manage node configuration and failover. All website communication is done in JSON. The enterprise version supports field collapsing for distributed search.
“A picture communicates a thousand words but a video communicates a thousand pictures.” Satish Gannu (Cisco)
Satish Gannu from Cisco stressed the increasing prevalence of video data and how such data is changing the world. More and more video enabled devices are pushed on the market. Collaboration is increasingly done across the world. Meetings are recorded and shared globally. Videos are replacing manuals. Corporate communication/PR via video is increasing. He related the popularity of video to the fact that “A picture communicates a thousand words, but a video communicates a thousand pictures” and that “60% of human communication is non-verbal.” Satish went on to highlight Cisco’s video solutions that make use of automatic voice and face recognition software to store metadata about speakers to enrich the user experience. For example, users can filter out certain speakers when watching recorded meetings. More can be found here.
View of Boston
“Mobile application development will be the driver of open-source innovation.” Bill McQuaide (Black Duck Software)
One of the highlights that morning was Bill McQuaide’s talk on open source trends. Based on diverse sources, including his company Black Duck Software, he showed that software IT spending is down, that 22% of software is open source, and that 40% of software projects use open source. There is an enormous number of new open source projects targeting the cloud, with a lot of competition. Among top open-source licenses are the GNU General Public Licenses, GPL 3.0, and BSD licenses. The three predominant programming languages used by open-source developers are C, C++ and Java. Mobile development will be the driver of innovation in the open-source community, especially developments around Google’s Android operating system. Managing licenses for projects that integrate dozens of open-source projects, such as Android, and shipping the bundled software to customers can become very complex. For this and other reasons, McQuaide recommends that companies and institutions have policies for implementing open source, integrating third party tools, and identifying and cataloging all open source software used.
Distributed Solr/Lucene using Hadoop
An excellent talk was presented by Rod Cope from Open Logic: Real-Time Searching of Big Data with Solr and Hadoop. The search infrastructure is centered around Hadoop’s distributed file system, on top of which they cleverly arranged several other technologies. For example, Hadoop’s HBase database provides fast database lookups but does not provide the power of Lucene text searches. Solr/Lucene, however, is not as optimized for returning stored document information. Their solution is to use Solr/Lucene to search indexed text fields, storing and returning only the document ID. The returned document ID is then used to fetch additional information from the HBase database. Open Logic uses the open-source software katta to integrate Lucene indices with Hadoop and increased fault tolerance by replicating Solr cores across different machines. In addition, the corresponding master and slave servers were set up to run on different machines for indexing and searching, respectively. The setup he described runs completely on commodity hardware, and new machines can be added on the fly to scale out horizontally.
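The two-step lookup described in the talk can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Open Logic’s implementation: `TextIndex` is a toy in-memory stand-in for a Solr/Lucene index that stores nothing but document IDs, `RecordStore` is a plain dictionary standing in for an HBase table, and all class names, function names, and sample records are hypothetical.

```python
from collections import defaultdict


class TextIndex:
    """Toy stand-in for a Solr/Lucene index (assumption: not the real API).

    It maps each token to the set of document IDs containing it and
    stores no other document data.
    """

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        # Naive whitespace tokenization, lowercased.
        for token in text.lower().split():
            self._postings[token].add(doc_id)

    def search(self, query):
        """Return the sorted IDs of documents that contain every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(hits)


class RecordStore:
    """Toy stand-in for an HBase table keyed by document ID."""

    def __init__(self):
        self._rows = {}

    def put(self, doc_id, record):
        self._rows[doc_id] = record

    def get(self, doc_id):
        return self._rows[doc_id]


def search_records(index, store, query):
    """Step 1: text search returns IDs only. Step 2: fetch full records by ID."""
    return [store.get(doc_id) for doc_id in index.search(query)]


def build_demo():
    """Populate both stores with a few hypothetical records."""
    index, store = TextIndex(), RecordStore()
    records = {
        "doc-1": {"title": "Searching big data with Solr and Hadoop"},
        "doc-2": {"title": "Scaling HBase on commodity hardware"},
        "doc-3": {"title": "Replicating Solr cores across machines"},
    }
    for doc_id, record in records.items():
        store.put(doc_id, record)              # full record goes to the key-value store
        index.add(doc_id, record["title"])     # only the ID is kept in the text index
    return index, store


if __name__ == "__main__":
    index, store = build_demo()
    for record in search_records(index, store, "solr"):
        print(record["title"])
```

The point of the split is that the text index stays small because it holds no stored fields, while the key-value store handles the fast retrieval by ID that it is good at.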
“It surprises me every day that the world is not made up from engineers but it is a fact.” Joshua Tuberville (eHarmony)
Next on the agenda were seven-minute lightning talks. I opened the lightning talk session by describing our Solr/Lucene-based open-source web project METAREP for high-performance comparative genomics (watch). Next was Stefan Olafsson from TwigKit presenting ‘The 7-minute Search UI’, a presentation which I thought was another gem of this conference. In contrast to other talks, it focused on user experience and intuitive user interfaces. TwigKit has developed a framework that provides well-designed search widgets that can be integrated with several search engines.
“If nobody is against you in open source then you are not right.” Marten Mickos (CEO Eucalyptus)
The keynote on the second day was presented by Marten Mickos, the CEO of Eucalyptus and former CEO of MySQL. He opened by advocating his philosophy of making money out of open-source projects. “Innovation is a change that creates a new dimension in performance,” he said, and mentioned the open-source Apache web server, which allows anyone to run a powerful web server. He added, “Market disruption is a change that creates a new level of efficiency,” and referred to MySQL, which was originally designed to scale horizontally. While in 1995 such a design was a drawback compared to other market solutions, scale-out has become the dominant design today. Now, within the cloud, horizontal scaling is the key, a fact that has made MySQL the most used database in the cloud.
He observed that “while most successful open-source projects are related to building infrastructure software, servers and algorithms, there are only a few open-source projects centered around human behavior, user experience and user interfaces. The latter projects are mainly developed in closed-source environments.” Then he went on to praise open source as a driver for innovation: “Open source is so effective because you are not protected. Code can be scrutinized by everybody. In a closed-source company, your only competition is within the company, while in open source you compete with everybody.” Open source is a way to innovate and it is more productive. It usually takes a stubborn individual to drive things. Innovation mostly stems from single individuals who are supported by the community.
When asked how to maintain property rights as a company when running an open-source model, he responded, “Keep things that keep the business going proprietary but open up others. The key is to be very transparent with your model.”
What’s next?
In a podium session, the core Solr/Lucene committer team discussed future features. The team is working on rapid front-end prototyping using the Apache Velocity template engine and Ajax. The prototyping code can be found in the current trunk of the Solr/Lucene code repository under the /browse directory. A cloud-enabled version of Solr/Lucene is being developed. Twitter’s real-time search functionality will be integrated. Other open-source projects that are being integrated are Nutch, a web-search software, and Mahout for machine learning (http://mahout.apache.org). New features will include pivot tables (table matrices), a hierarchical data type, spatial searching, and flexible indexing.
The above represents a subset of the talks that took place. There were many other interesting talks, some of which took place in parallel sessions. Individual presentations can be downloaded from the Lucid Imagination conference page. A selection of videos is available here. The next Lucene Revolution conference will take place in San Francisco in May 2011.
After four days of Solr/Lucene, many coffees, talks, and discussions, I left inspired by the conference. It dawned on me that the real revolution is not the search technology but the strong community spirit that has emerged and drives developers to work jointly towards a common goal.
With one last sample to collect and the weather still rough in the Mediterranean, we decided to make the Banyuls sample a road sampling trip. So Jeremy and I loaded up a rental car with carboys and headed out at 5 am to drive the 125 miles (200 km) from Barcelona, Spain, to Banyuls, France.
Driving to Banyuls
After being on the boat for a few months straight, the 2-hour drive was a welcome adventure, even with the 5 am departure! We were greeted at the Observatory of Banyuls by Dr. Ian Salter. Ian showed us around the laboratory and down to the dock to their research vessel.
View of Harbor from Lab
Old facilities
Ian and I on the research boat
They have a station less than ¼ of a mile offshore, which they have been monitoring for many years. We motored out to the site, where they did a water column profile with a CTD that was very similar to the CTD we have on Sorcerer II, and then we collected our water from a few meters deep with a Niskin bottle. Once we got back to the dock, we loaded the carboys into the car and drove back to Sorcerer II to process the sample. It is always good to collect samples with collaborators who have long-term monitoring sites and are interested in working with the Venter Institute to analyze the data!