Genetic data

Work-Package 4 description


This working group has three main objectives:


  • Define the species content of Tara Oceans samples by sequencing phylogenetic markers.


  • Define the gene content of the samples, and the expression of those genes, by metagenomic and metatranscriptomic sequencing.


  • Significantly increase the collection of reference genomes and transcriptomes for planktonic organisms, particularly unicellular eukaryotes (protists). This will help annotate data sets and exploit the information they contain.


Patrick Wincker is in charge of the Genetic Data working group.

This working group is coordinated by Patrick Wincker at the Evry Genoscope, and involves UMR 7144 and FR 2424 at the Roscoff Biological Station, and the Structural Genomics Institute of Marseille. Participants in this working group use technology provided by Illumina, a partner in OCEANOMICS.


Environmental community sequencing - Task 4.1

This task aims to characterize the genetic and taxonomic complexity of samples from different size fractions using high-throughput sequencing techniques. Three biological levels are explored: organisms (metabarcoding), their genomes (metagenomics), and the expressed genes (metatranscriptomics).

A high-throughput sequencing approach is used to massively sequence key genetic markers (metabarcoding). This gives a rapid, semi-quantitative estimate of the biodiversity present in the Tara Oceans samples. The markers enable us to distinguish different communities: eukaryotes, prokaryotes, and photosynthetic protists.

Metabarcoding was performed on all of the Tara Oceans samples and will guide the selection of 25 stations, characteristic of different ocean conditions, for metagenomic and metatranscriptomic studies. Each station is represented by several samples: 7 size fractions (from viruses to metazoans) at 1 to 3 depths (surface, deep chlorophyll maximum (DCM) and mesopelagic). In total, nearly 350 samples are involved.
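As a rough consistency check on that total, the figures above can be multiplied out. This is only a sketch: it assumes every station is sampled in all 7 size fractions and that every station is sampled at the same number of depths, neither of which the text states, and the function name is illustrative.

```python
# Figures taken from the text: 25 stations, 7 size fractions, 1 to 3 depths.
STATIONS = 25
SIZE_FRACTIONS = 7


def sample_count(depths_per_station):
    """Number of samples if every station were sampled at this many depths.

    Assumption (not from the text): all size fractions are collected at
    every station and every depth.
    """
    return STATIONS * SIZE_FRACTIONS * depths_per_station


lower = sample_count(1)     # every station sampled at a single depth
midpoint = sample_count(2)  # every station sampled at two depths
upper = sample_count(3)     # every station sampled at all three depths
print(lower, midpoint, upper)
```

Under these assumptions the total lies between 175 and 525 samples, and the quoted figure of nearly 350 corresponds to an average of about two depths per station.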


Reference oceanic genes, genomes & transcriptomes - Task 4.2

The genetic information contained in most planktonic organisms maintained in culture remains unknown. Moreover, the vast majority of marine planktonic organisms cannot be maintained in culture. The greatest limitation of the above approaches is therefore the lack of available reference sequences.

The OCEANOMICS team aims to generate new reference sequences that will facilitate the analysis of information generated by metagenetic approaches.


Reverse taxonomy

The first metabarcoding analyses showed that a significant number of markers could not be assigned to any organism identified in the databases. Surprisingly, the proportion of markers from unknown species is sometimes considerable. To reveal the nature of this large unknown biodiversity, OCEANOMICS identifies the most abundant sequences that are associated with identified taxa and searches for copies of these sequences in the sequenced genomes and transcriptomes, in order to improve the taxonomic characterization of poorly described biodiversity. In some cases, this information will be related to morphological knowledge generated by Working Group 3.
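The search step described above can be pictured with a small sketch: rank barcode sequences by abundance, then look for copies of the top ones in a set of reference sequences. Everything here is an assumption made for illustration, not the OCEANOMICS pipeline: the function names, the toy sequences, and above all the use of exact string matching on both strands, where a real analysis would use an alignment tool that tolerates mismatches.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def most_abundant(barcode_reads, top_n):
    """Return the top_n most frequent barcode sequences, most frequent first."""
    return [seq for seq, _count in Counter(barcode_reads).most_common(top_n)]


def find_copies(barcode, references):
    """Names of reference sequences that contain the barcode on either strand.

    Simplification: exact substring matching only.
    """
    rc = reverse_complement(barcode)
    return [name for name, seq in references.items()
            if barcode in seq or rc in seq]


# Toy data, invented for illustration only.
barcode_reads = ["ACGTTG", "ACGTTG", "GGCATC", "ACGTTG", "GGCATC", "TTTAAA"]
references = {
    "contig_A": "CCCACGTTGCCC",  # carries ACGTTG on the forward strand
    "contig_B": "AAAGATGCCAAA",  # carries GGCATC on the reverse strand
    "contig_C": "GGGGGGGGGGGG",  # carries neither barcode
}

hits = {bc: find_copies(bc, references)
        for bc in most_abundant(barcode_reads, top_n=2)}
print(hits)
```

A barcode that is found in a reference genome or transcriptome can then be linked to the other genes and annotations carried by that reference.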


Protists - Reference Transcriptomes

Protist genomes can be significantly larger than the human genome, which could be a major obstacle to the metagenetic study of ocean communities. A transcriptomic approach should help overcome this obstacle, since the number of genes is fairly stable from one eukaryotic species to another (about 10,000 genes per species). Furthermore, mRNA sequencing avoids contamination from prokaryotic nucleic acids, and the translated sequences can be used directly in similarity searches, which are more sensitive at the protein level.
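To illustrate the translation step mentioned above, here is a minimal sketch that converts a coding sequence into a protein sequence using the standard genetic code. It makes several assumptions: translation starts at the first base in a single reading frame, it stops at the first stop codon, and unrecognized codons become "X". Some protists use variant genetic codes, which this sketch does not handle, and the function name is illustrative.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence):
    """Translate a coding sequence (DNA or RNA) in frame 0.

    Stops at the first stop codon; codons containing ambiguous bases are
    reported as "X". Trailing bases that do not form a full codon are ignored.
    """
    sequence = sequence.upper().replace("U", "T")
    protein = []
    for i in range(0, len(sequence) - 2, 3):
        amino_acid = CODON_TABLE.get(sequence[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))
```

Because several codons encode the same amino acid, two sequences can diverge at the nucleotide level while remaining similar as proteins, which is why similarity searches on translated sequences are more sensitive.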


To improve the interpretation of metatranscriptomic datasets, OCEANOMICS aims to increase the number of reference transcriptomes for protists. This is done using cultured strains from different collections (including the Roscoff Culture Collection), or isolated cells identified in freshly collected plankton samples. In this context, OCEANOMICS expects to generate approximately 250 new reference transcriptomes of phylogenetic and/or environmental interest. The selection of organisms is guided by data obtained via metabarcoding and metagenomics.


Single-cell sequencing

For eukaryotes smaller than 20 µm, sequencing of single amplified genomes (SAGs) is used.

During the Tara Oceans expedition, organisms in the smaller size fractions were cryopreserved for this purpose. Cells are isolated by flow cytometry. The genome of each cell is then amplified, barcoded for phylogenetic assignment, and fully sequenced if necessary. A pipeline of bioinformatics tools has been developed for the annotation and analysis of these SAGs.


Genetic data archiving and databasing - Task 4.3

Computational pipelines for gene prediction and for the taxonomic and functional annotation of metagenetic data were developed during the Tara Oceans expedition and the BioMarks project. In OCEANOMICS, these tools are still in use and continue to evolve as new algorithms and ontology systems are developed and as new reference genomes and transcriptomes become available from the following working groups: Bioinformatics and modeling ecosystems; Organization and data archiving; and Genetic data.


OCEANOMICS is currently developing new database systems for organizing and distributing all of the genetic data. These systems will play a crucial role in linking genetic data with imaging and environmental data.