Tools portfolio
- Integration of metagenomics data
- Sequence analysis and annotation
- Exploration of interaction networks
- Annotation and exploration of metabolic pathways
- Text mining
- Phylogenetic tree annotation
All our existing tools can be fully customized and integrated into your existing resource framework.
Integration and annotation of metagenomics data
iMetaWorld
iMetaWorld is a web-based resource that integrates public physicochemical data (e.g. CO2 or salinity) with environmental sequencing (metagenomics) data measured in various projects. It makes it possible, for example, to correlate gene abundances in certain samples with environmental constraints. Selecting subsets of the exponentially growing number of samples enables detailed analyses under standardized conditions (e.g. a water depth of 2 m), as metabolic differences seen in distinct oceans might be due not only to geographic location but also to varying environmental conditions. The resource can animate time series and spatial variations. It can highlight metabolic adaptations that are due mainly to nutritional conditions and those that require other lifestyle adaptations (temperature, UV etc.).
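As an illustration of the kind of analysis described above, the following minimal Python sketch restricts a set of samples to one standardized condition and then correlates gene abundance with an environmental parameter. The sample values, the choice of salinity, and the function name are made up for illustration; this is not iMetaWorld's own implementation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical samples: (water depth in m, salinity, relative gene abundance)
samples = [
    (2, 33.1, 0.012),
    (2, 34.0, 0.015),
    (2, 35.2, 0.021),
    (50, 34.8, 0.004),
    (2, 36.5, 0.026),
]

# Restrict to a standardized condition (water depth of 2 m) before correlating.
subset = [s for s in samples if s[0] == 2]
salinity = [s[1] for s in subset]
abundance = [s[2] for s in subset]
r = pearson(salinity, abundance)
print(f"r = {r:.3f} over {len(subset)} samples")
```

Filtering first and correlating second keeps a confounding variable (here, depth) from driving the result.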
MLTreeMap: Phylogenetic analysis of metagenomics sequence data
MLTreeMap (mltreemap.org) analyzes DNA sequences and determines their most likely phylogenetic origin. Its main use is in metagenomics projects, where DNA is isolated directly from natural environments and sequenced (the organisms from which the DNA originates are often entirely undescribed). MLTreeMap searches such sequences for suitable marker genes and uses maximum likelihood analysis to place them in the 'Tree of Life'. This placement is more reliable than simply assessing the closest relative of a sequence using BLAST. More importantly, MLTreeMap determines not only the closest relative of your query sequence, but also how deep in the 'Tree of Life' it probably branched off.
Protein sequence analysis and annotation
SMART: Simple Modular Architecture Research Tool
SMART (smart.embl.de) allows the identification and annotation of genetically mobile domains and the analysis of domain architectures.
More than 750 domain families found in signaling, extracellular and chromatin-associated proteins are detectable. These domains are extensively annotated with respect to phyletic distributions, functional class, tertiary structures and functionally important residues.
Each domain found in a non-redundant protein database as well as search parameters and taxonomic information are stored in a relational database system. User interfaces to this database allow searches for proteins containing specific combinations of domains in defined taxa.
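A search for proteins containing a specific combination of domains in a defined taxon could look roughly like the sketch below. The table layout, column names, and sample rows are hypothetical and chosen only to show the query pattern; they do not describe SMART's actual schema.

```python
import sqlite3

def proteins_with_domains(conn, domains, taxon):
    """Return IDs of proteins in `taxon` that contain every domain in `domains`."""
    wanted = sorted(set(domains))
    if not wanted:
        raise ValueError("at least one domain is required")
    placeholders = ",".join("?" for _ in wanted)
    query = f"""
        SELECT p.protein_id
        FROM protein AS p
        JOIN domain_hit AS d ON d.protein_id = p.protein_id
        WHERE p.taxon = ? AND d.domain IN ({placeholders})
        GROUP BY p.protein_id
        HAVING COUNT(DISTINCT d.domain) = ?
        ORDER BY p.protein_id
    """
    rows = conn.execute(query, [taxon, *wanted, len(wanted)]).fetchall()
    return [row[0] for row in rows]

# Hypothetical schema and data, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE protein (protein_id TEXT PRIMARY KEY, taxon TEXT);
    CREATE TABLE domain_hit (protein_id TEXT, domain TEXT, start INTEGER, stop INTEGER);
""")
conn.executemany("INSERT INTO protein VALUES (?, ?)", [
    ("P1", "Metazoa"), ("P2", "Metazoa"), ("P3", "Fungi"),
])
conn.executemany("INSERT INTO domain_hit VALUES (?, ?, ?, ?)", [
    ("P1", "SH2", 10, 100), ("P1", "SH3", 120, 180),
    ("P2", "SH2", 5, 95),
    ("P3", "SH2", 8, 98), ("P3", "SH3", 110, 170),
])

result = proteins_with_domains(conn, ["SH2", "SH3"], "Metazoa")
print(result)
```

Grouping by protein and counting distinct matching domains is one common way to express "contains all of these" in SQL; a real schema would likely also need to handle nested taxa.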
Phospho.ELM: A database of S/T/Y phosphorylation sites
Phospho.ELM is a database of experimentally verified phosphorylation sites in eukaryotic proteins. There are 4,026 protein entries covering 16,428 instances in the current release. Instances are fully linked to literature references.
eggNOG: evolutionary genealogy of genes - Non-supervised Orthologous Groups
eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups) (eggnog.embl.de) is a database of orthologous groups of genes. The orthologous groups are annotated with functional descriptions, which are derived by identifying a common denominator for the genes based on their individual textual descriptions, annotated functional categories, and predicted protein domains.
eggNOG's database currently includes proteins from 630 complete genomes.
Exploration of interaction networks
STRING: Search Tool for the Retrieval of Interacting Genes/Proteins
Information on protein-protein interactions is still mostly limited to a small number of model organisms, and originates from a wide variety of experimental and computational techniques. The database and online resource STRING generalizes access to protein interaction data by integrating known and predicted interactions from a variety of sources. The underlying infrastructure includes a consistent body of completely sequenced genomes and exhaustive orthology classifications, based on which interaction evidence is transferred between organisms. Although primarily developed for protein interaction analysis, the resource has also been successfully applied to comparative genomics, phylogenetics and network studies, which are all facilitated by programmatic access to the database back-end and the availability of compact download files. The public version of STRING is accessible at string.embl.de.
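Working with a compact download file in a network study might look like the following sketch. It assumes a whitespace-delimited file with three columns (two protein identifiers and an integer score) and an optional header line; that layout, the identifiers, and the score threshold are assumptions for illustration, so check the actual file description before relying on it.

```python
import io
from collections import defaultdict

def load_network(handle, min_score=0):
    """Read 'protein_a protein_b score' lines into an undirected adjacency map."""
    neighbors = defaultdict(dict)
    for line in handle:
        fields = line.split()
        if len(fields) != 3 or not fields[2].isdigit():
            continue  # header, blank, or malformed line
        a, b, score = fields[0], fields[1], int(fields[2])
        if score >= min_score:
            # Store each interaction in both directions.
            neighbors[a][b] = score
            neighbors[b][a] = score
    return dict(neighbors)

# Hypothetical file content; real identifiers and scores will differ.
example = io.StringIO(
    "protein1 protein2 combined_score\n"
    "protA protB 900\n"
    "protA protC 450\n"
    "protB protD 700\n"
)

network = load_network(example, min_score=600)
partners = sorted(network.get("protA", {}))
print(partners)
```

Filtering by score while loading keeps memory use low for large files, at the cost of having to reload if the threshold changes.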
STITCH: Search Tool for Interactions of Chemicals
STITCH (stitch.embl.de) is a resource to explore known and predicted interactions of chemicals and proteins. Chemicals are linked to other chemicals and proteins by evidence derived from experiments, databases and the literature.
STITCH contains interactions for over 68,000 chemicals and over 1.5 million proteins in 373 species.
Annotation and exploration of metabolic pathways
iPath: Interactive Pathways Explorer
iPath is a web-based tool (pathways.embl.de) for the visualization and analysis of metabolic pathways. The underlying global pathways map is constructed using approximately 120 KEGG pathways, and gives an overview of the complete metabolism in biological systems. Nodes in the map correspond to various chemical compounds and edges represent series of enzymatic reactions.
Various types of data can be mapped onto the default global map, changing the colors, opacity and width of any node or edge. In addition, iPath provides a set of pre-computed metabolic pathway maps for various species and taxonomic classes. All maps in iPath can be easily converted to various graphical formats.
Text mining tools
MATADOR: Manually Annotated Targets and Drugs Online Resource
MATADOR is a resource for protein-chemical interactions. It differs from other resources such as DrugBank in its inclusion of as many direct and indirect interactions as we could find. In contrast, DrugBank usually contains only the main mode of interaction. The manually annotated list of direct (binding) and indirect interactions between proteins and chemicals was assembled by automated text-mining followed by manual curation. Each interaction contains links to PubMed abstracts or OMIM entries that were used to deduce the interaction. (These articles are not necessarily useful review articles.) Indirect interactions are caused by many different mechanisms. For example, binding a metabolite of a drug as well as changes in gene expression fall under that category. In order to capture as many interactions as possible, all the different mechanisms are grouped together. As the user, you can decide whether to trust only the direct interactions (with a known mechanism) or the indirect interactions as well.
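The choice between direct-only and direct-plus-indirect interactions amounts to a simple filter over annotated records. The sketch below uses a hypothetical record type with placeholder names and references; it does not reflect MATADOR's actual data format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interaction:
    chemical: str
    protein: str
    kind: str          # "direct" (binding) or "indirect"
    references: tuple  # supporting PubMed or OMIM entries (placeholders here)

def select(interactions, include_indirect=False):
    """Keep direct interactions, and indirect ones only if requested."""
    allowed = {"direct", "indirect"} if include_indirect else {"direct"}
    return [i for i in interactions if i.kind in allowed]

# Hypothetical records for illustration only.
records = [
    Interaction("drugX", "proteinA", "direct", ("PMID:placeholder-1",)),
    Interaction("drugX", "proteinB", "indirect", ("PMID:placeholder-2",)),
    Interaction("drugY", "proteinA", "indirect", ("OMIM:placeholder-3",)),
]

direct_only = select(records)
everything = select(records, include_indirect=True)
print(len(direct_only), len(everything))
```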
SIDER: Side Effect Resource
SIDER, the Side Effect Resource (sideeffects.embl.de), represents an effort to aggregate dispersed public information on side effects.
SIDER contains information on marketed medicines and their recorded adverse drug reactions. The information is extracted from public documents and package inserts. The available information includes side effect frequency, drug and side effect classifications as well as links to further information, for example drug-target relations.
Phylogenetic tree annotation
iTOL: Interactive Tree Of Life
Interactive Tree Of Life is a web-based tool (itol.embl.de) for the display and manipulation of phylogenetic trees. It provides most of the features available in other tree viewers, and offers a novel circular tree layout, which makes it easy to visualize mid-sized trees (up to several thousand leaves). Trees can be exported to several graphical formats, both bitmap and vector based.
iTOL is one of the first viewers that can annotate trees with various types of additional data. Many dataset types are supported, from simple and stacked bar charts to pie charts, animated time series and protein domains. Additional dataset types and display options can be developed according to your needs.
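A simple bar chart dataset can be prepared as a small text file that pairs each leaf with a value. The sketch below generates such a file; the keywords are modeled on the annotation templates of recent iTOL versions and should be treated as an assumption to verify against the iTOL help pages for the version you use. The leaf names and values are hypothetical.

```python
def simple_bar_dataset(label, color, values):
    """Build the text of a simple bar chart dataset, one 'leaf,value' line per leaf."""
    lines = [
        "DATASET_SIMPLEBAR",        # dataset type (assumed keyword)
        "SEPARATOR COMMA",          # fields below are comma-separated
        f"DATASET_LABEL,{label}",
        f"COLOR,{color}",
        "DATA",
    ]
    for leaf, value in values.items():
        if "," in leaf:
            raise ValueError(f"leaf name contains the separator: {leaf!r}")
        lines.append(f"{leaf},{value}")
    return "\n".join(lines) + "\n"

# Hypothetical leaves and values for illustration only.
text = simple_bar_dataset(
    "Genome size (Mb)",
    "#1f77b4",
    {"leaf_A": 4.6, "leaf_B": 12.1},
)
print(text)
```

The leaf identifiers must match the leaf names in the uploaded tree for the values to be attached to the right branches.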