Tools portfolio

Integration and annotation of metagenomics data



iMetaworld is a web-based resource that integrates public physicochemical data (e.g. CO2 or salinity) with environmental sequencing (metagenomics) data measured in various projects. It allows users, for example, to correlate gene abundances in certain samples with environmental constraints. Selecting subsets from the exponentially growing number of samples enables detailed analyses under standardized conditions (e.g. a water depth of 2 m), because metabolic differences seen in distinct oceans might be due not only to geographic location but also to varying environmental conditions. The resource also supports animations of time series and spatial variations. It can distinguish metabolic adaptations that are mainly due to nutrient conditions from those that require other lifestyle adaptations (temperature, UV etc.).
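As an illustration of the kind of analysis described above, the following minimal sketch restricts a set of samples to one standardized condition (a fixed water depth) and then computes a Spearman rank correlation between an environmental parameter and a gene abundance. It is not iMetaworld code: the sample records, field names and the function correlate_at_depth are hypothetical, and all values are invented.

```python
from math import sqrt

# Hypothetical sample records (all values invented for illustration).
SAMPLES = [
    {"id": "s1", "depth_m": 2, "salinity": 33.0, "abundance": 0.10},
    {"id": "s2", "depth_m": 2, "salinity": 34.5, "abundance": 0.18},
    {"id": "s3", "depth_m": 2, "salinity": 36.0, "abundance": 0.25},
    {"id": "s4", "depth_m": 50, "salinity": 35.0, "abundance": 0.02},
    {"id": "s5", "depth_m": 2, "salinity": 37.2, "abundance": 0.31},
]


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def spearman(xs, ys):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def correlate_at_depth(samples, depth_m, parameter):
    """Correlate one environmental parameter with abundance at a fixed depth."""
    subset = [s for s in samples if s["depth_m"] == depth_m]
    return spearman([s[parameter] for s in subset],
                    [s["abundance"] for s in subset])


rho = correlate_at_depth(SAMPLES, depth_m=2, parameter="salinity")
```

Filtering before correlating keeps the 50 m sample from confounding the comparison, which is the point of analysing samples under standardized conditions.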

MLTreeMap: Phylogenetic analysis of metagenomics sequence data

MLTreeMap (mltreemap.org) analyzes DNA sequences and determines their most likely phylogenetic origin. Its main use is in metagenomics projects, where DNA is isolated directly from natural environments and sequenced (the organisms from which the DNA originates are often entirely undescribed). MLTreeMap searches such sequences for suitable marker genes and uses maximum likelihood analysis to place them in the 'Tree of Life'. This placement is more reliable than simply assessing the closest relative of a sequence using BLAST. More importantly, MLTreeMap determines not only which organism is the closest relative of your query sequence, but also how deep in the tree of life the sequence probably branched off.

Protein sequence analysis and annotation

SMART: Simple Modular Architecture Research Tool


SMART (smart.embl.de) allows the identification and annotation of genetically mobile domains and the analysis of domain architectures.
More than 750 domain families found in signaling, extracellular and chromatin-associated proteins are detectable. These domains are extensively annotated with respect to phyletic distributions, functional class, tertiary structures and functionally important residues.
Each domain found in a non-redundant protein database as well as search parameters and taxonomic information are stored in a relational database system. User interfaces to this database allow searches for proteins containing specific combinations of domains in defined taxa.
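The domain-combination search described above can be pictured with a small in-memory sketch. This is not the SMART database schema or its query interface: the protein records, the field names and the function find_proteins are hypothetical, and the real resource uses a relational database rather than Python lists.

```python
# Hypothetical protein records with invented identifiers.
PROTEINS = [
    {"id": "P1", "taxon": ["Eukaryota", "Metazoa"], "domains": ["SH2", "SH3", "TyrKc"]},
    {"id": "P2", "taxon": ["Eukaryota", "Fungi"], "domains": ["SH3"]},
    {"id": "P3", "taxon": ["Eukaryota", "Metazoa"], "domains": ["SH3", "SH2"]},
    {"id": "P4", "taxon": ["Bacteria"], "domains": ["SH3", "SH2"]},
]


def find_proteins(proteins, required_domains, taxon):
    """Return ids of proteins that contain all required domains within a taxon."""
    required = set(required_domains)
    return [
        p["id"]
        for p in proteins
        # Subset test: every required domain must occur in the protein.
        if required <= set(p["domains"]) and taxon in p["taxon"]
    ]


hits = find_proteins(PROTEINS, ["SH2", "SH3"], "Metazoa")
```

Here the taxon is stored as a lineage, so a query for a higher-level group also matches proteins from any species below it.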

Phospho.ELM: A database of S/T/Y phosphorylation sites

Phospho.ELM is a database of experimentally verified phosphorylation sites in eukaryotic proteins. There are 4026 protein entries covering 16,428 instances in the current release. Instances are fully linked to literature references.

eggNOG: evolutionary genealogy of genes - Non-supervised Orthologous Groups

eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups) (eggnog.embl.de) is a database of orthologous groups of genes. The orthologous groups are annotated with functional descriptions, which are derived by identifying a common denominator for the genes based on their individual textual descriptions, annotated functional categories, and predicted protein domains.

eggNOG's database currently includes proteins from 630 complete genomes.

Exploration of interaction networks

STRING: Search Tool for the Retrieval of Interacting Genes/Proteins

Information on protein-protein interactions is still mostly limited to a small number of model organisms, and originates from a wide variety of experimental and computational techniques. The database and online resource STRING generalizes access to protein interaction data by integrating known and predicted interactions from a variety of sources. The underlying infrastructure includes a consistent body of completely sequenced genomes and exhaustive orthology classifications, based on which interaction evidence is transferred between organisms. Although primarily developed for protein interaction analysis, the resource has also been successfully applied to comparative genomics, phylogenetics and network studies, which are all facilitated by programmatic access to the database back-end and the availability of compact download files. The public version of STRING is accessible via string.embl.de.
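The idea of transferring interaction evidence between organisms through orthology can be sketched as follows. This is a deliberately simplified illustration, not STRING's actual scoring scheme: the orthologous groups, protein identifiers, scores and the function transfer are all hypothetical, and the sketch copies scores unchanged where a real system would be expected to weight them.

```python
# Hypothetical orthologous groups: group id -> {organism: [protein ids]}.
ORTHOLOGS = {
    "OG1": {"yeast": ["yA"], "human": ["hA1", "hA2"]},
    "OG2": {"yeast": ["yB"], "human": ["hB"]},
    "OG3": {"yeast": ["yC"], "human": []},
}

# Hypothetical interactions known in the source organism: (protein, protein, score).
KNOWN = [("yA", "yB", 0.9), ("yA", "yC", 0.7)]


def group_of(protein, organism, orthologs):
    """Return the id of the orthologous group containing the protein, or None."""
    for group_id, members in orthologs.items():
        if protein in members.get(organism, []):
            return group_id
    return None


def transfer(known, source, target, orthologs):
    """Map each known interaction onto the orthologs found in the target organism."""
    transferred = {}
    for a, b, score in known:
        group_a = group_of(a, source, orthologs)
        group_b = group_of(b, source, orthologs)
        if group_a is None or group_b is None:
            continue
        for target_a in orthologs[group_a].get(target, []):
            for target_b in orthologs[group_b].get(target, []):
                pair = tuple(sorted((target_a, target_b)))
                # Keep the strongest evidence if a pair is reached more than once.
                transferred[pair] = max(score, transferred.get(pair, 0.0))
    return transferred


predicted = transfer(KNOWN, "yeast", "human", ORTHOLOGS)
```

The yA-yC interaction yields no prediction because OG3 has no human member, which shows why the coverage of the orthology classification limits what can be transferred.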

STITCH: Search Tool for Interactions of Chemicals

STITCH (stitch.embl.de) is a resource to explore known and predicted interactions of chemicals and proteins. Chemicals are linked to other chemicals and proteins by evidence derived from experiments, databases and the literature.

STITCH contains interactions for over 68,000 chemicals and over 1.5 million proteins in 373 species.

Annotation and exploration of metabolic pathways

iPath: Interactive Pathways Explorer


iPath is a web-based tool (pathways.embl.de) for the visualization and analysis of metabolic pathways. The underlying global pathways map is constructed using approximately 120 KEGG pathways, and gives an overview of the complete metabolism in biological systems. Nodes in the map correspond to various chemical compounds and edges represent series of enzymatic reactions.
Various types of data can be mapped onto the default global map, changing the colors, opacity and width of any node or edge. In addition, iPath provides a set of pre-computed metabolic pathway maps for various species and taxonomic classes. All maps in iPath can be easily converted to various graphical formats.
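The map model described above (compounds as nodes, reactions as edges, user data changing color, opacity and width) can be sketched as a small data structure. This is an illustrative assumption, not iPath's internal representation or input format: the classes Style and PathwayMap, their fields and the default values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Style:
    """Display attributes that mapped data may override."""
    color: str = "#cccccc"
    opacity: float = 1.0
    width: float = 1.0


@dataclass
class PathwayMap:
    nodes: dict = field(default_factory=dict)  # compound id -> Style
    edges: dict = field(default_factory=dict)  # (substrate, product) -> Style

    def add_reaction(self, substrate, product):
        """Add an edge and create its compound nodes if they are missing."""
        self.nodes.setdefault(substrate, Style())
        self.nodes.setdefault(product, Style())
        self.edges.setdefault((substrate, product), Style())

    def apply(self, element, **changes):
        """Restyle one node (compound id) or one edge (tuple of compound ids)."""
        target = self.edges if isinstance(element, tuple) else self.nodes
        style = target[element]
        for name, value in changes.items():
            if not hasattr(style, name):
                raise AttributeError(f"unknown style attribute: {name}")
            setattr(style, name, value)


pathway = PathwayMap()
pathway.add_reaction("glucose", "glucose-6-phosphate")
pathway.add_reaction("glucose-6-phosphate", "fructose-6-phosphate")

# Highlight one reaction, for example because its enzyme is abundant in a sample.
pathway.apply(("glucose", "glucose-6-phosphate"), color="#ff0000", width=3.0)
pathway.apply("glucose", opacity=0.5)
```

Keeping the style separate from the graph topology means the same global map can be reused with different datasets mapped onto it.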

Text mining tools

SIDER: Side Effect Resource


The SIDER Side Effect Resource (sideeffects.embl.de) represents an effort to aggregate dispersed public information on side effects.

SIDER contains information on marketed medicines and their recorded adverse drug reactions. The information is extracted from public documents and package inserts. The available information includes side effect frequency, drug and side effect classifications, as well as links to further information, for example drug-target relations.

Phylogenetic tree annotation

iTOL: Interactive Tree Of Life


Interactive Tree Of Life is a web-based tool (itol.embl.de) for the display and manipulation of phylogenetic trees. It provides most of the features available in other tree viewers, and offers a novel circular tree layout, which makes it easy to visualize mid-sized trees (up to several thousand leaves). Trees can be exported to several graphical formats, both bitmap and vector based.
iTOL is one of the first viewers that can annotate trees with various types of additional data. Many dataset types are supported, from simple and stacked bar charts to pie charts, animated time series and protein domains. Additional dataset types and display options can be developed according to your needs.