

Second-generation sequencing technologies are precipitating major shifts with regard to what kinds of genomes are being sequenced and how they are annotated. While the first generation of genome projects focused on well-studied model organisms, many of today's projects involve exotic organisms whose genomes are largely terra incognita. This complicates their annotation because, unlike first-generation projects, there are no pre-existing 'gold-standard' gene models with which to train gene finders. Improvements in genome assembly and the wide availability of mRNA-seq data are also creating opportunities to update and re-annotate previously published genome annotations. Today's genome projects are thus in need of new genome annotation tools that can meet the challenges and opportunities presented by second-generation sequencing technologies.

We present MAKER2, a genome annotation and data management tool designed for second-generation genome projects. MAKER2 is a multi-threaded, parallelized application that can process second-generation datasets of virtually any size. We show that MAKER2 can produce accurate annotations for novel genomes where training data are limited, of low quality, or even non-existent. MAKER2 also provides an easy means to use mRNA-seq data to improve annotation quality, and it can use these data to update legacy annotations, significantly improving their quality. We also show that MAKER2 can evaluate the quality of genome annotations, and identify and prioritize problematic annotations for manual review.

MAKER2 is the first annotation engine specifically designed for second-generation genome projects. MAKER2 scales to datasets of any size, requires little in the way of training data, and can use mRNA-seq data to improve annotation quality. It can also update and manage legacy genome annotation datasets.

Second-generation sequencing technologies are creating new opportunities as well as new challenges for genomics research. While first-generation genome projects focused primarily on established model organisms such as Drosophila melanogaster, Caenorhabditis elegans, and Mus musculus, falling sequencing costs are allowing second-generation genome projects to focus on more exotic and phylogenetically isolated organisms. The large volumes of data produced by second-generation sequencing technologies also create difficulties for data management not encountered by first-generation projects. Together, these factors can spell disaster for second-generation genome projects. The first generation of genome projects benefited greatly from large bodies of pre-existing knowledge regarding their organisms' genomes.
