Databases

CREDO

A protein-ligand interaction database for drug discovery

Author
Adrian Schreyer
Abstract

[Figure: Structural Interaction Fingerprint (SIFt) for Gleevec]

Harnessing data from the growing number of protein-ligand complexes in the Protein Data Bank (PDB) is an important task in drug discovery. To benefit from the abundance of three-dimensional structures, structural data must be integrated with sequence and chemical data, and the protein-small molecule interactions must be characterised structurally at the inter-atomic level. Here, we present CREDO, a new publicly available database of protein-ligand interactions, which represents contacts as structural interaction fingerprints, implements novel features and is completely scriptable through its application programming interface (API). Features of CREDO include implementation of molecular shape descriptors with Ultrafast Shape Recognition (USR), fragmentation of ligands in the PDB, sequence-to-structure mapping and the identification of approved drugs. Selected analyses of these key features are presented to highlight a range of potential applications of CREDO. We believe that the free availability and numerous features of the CREDO database will be useful not only for commercial but also for academia-driven drug discovery programmes.
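
As an illustration of the shape descriptors mentioned above, the following is a minimal sketch of the published USR idea, not CREDO's own implementation. It assumes the usual formulation: four reference points (the centroid, the atom closest to it, the atom farthest from it, and the atom farthest from that one) and three moments of the atomic distance distribution from each point, giving 12 numbers per molecule. The function names and the choice of standard deviation and cube-rooted third moment are assumptions; implementations differ in how they scale the moments.

```python
import math


def _moments(values):
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    # The signed cube root keeps the asymmetry term in distance units.
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return [mean, math.sqrt(var), skew]


def usr_descriptor(coords):
    """Return a 12-element shape descriptor for a list of (x, y, z) atoms."""
    n = len(coords)
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    d_ctd = [math.dist(c, centroid) for c in coords]
    closest = coords[d_ctd.index(min(d_ctd))]    # atom closest to centroid
    farthest = coords[d_ctd.index(max(d_ctd))]   # atom farthest from centroid
    d_fct = [math.dist(c, farthest) for c in coords]
    far_from_far = coords[d_fct.index(max(d_fct))]  # atom farthest from that atom

    descriptor = []
    for ref in (centroid, closest, farthest, far_from_far):
        descriptor.extend(_moments([math.dist(c, ref) for c in coords]))
    return descriptor


def usr_similarity(a, b):
    """Similarity in (0, 1]: inverse of one plus the mean absolute difference."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))
```

Because the descriptor is built only from interatomic distances, it does not change when the molecule is translated or rotated, so two conformers can be compared without superposition.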

TIMBAL

TIMBAL holds small molecules disrupting protein-protein interactions

Author
Alicia Higueruelo
Abstract

To date, drug discovery has focused mainly on a handful of targets that meet the "classical" druggable criteria: being linked to disease and having a beautiful pocket able to bind a small drug-like molecule (Hopkins and Groom, 2002). However, the success of monoclonal antibodies as therapeutic agents is changing the perspective on what makes a drug target. These biologicals disrupt multi-protein complexes, which are key to almost all processes in living organisms. Still, antibodies are expensive and difficult to administer, so they are not ideal drugs, and they target only extracellular molecules. In recent years, growing evidence that protein-protein interactions can be modulated by small molecules has been opening the door to new approaches and concepts in drug discovery (Whitty and Kumaravel, 2006). TIMBAL aims to be a resource that gives insight into the types of molecules favoured by protein interfaces and the types of interactions these systems present.

TIMBAL is a database of small molecules that modulate protein-protein interactions. It was first created in 2008 by manually curating information extracted from relevant scientific publications, and an analysis of the data was published in 2009 (Higueruelo et al., 2009). The growth of data in recent years has made hand curation extremely time-consuming, so TIMBAL is now maintained through automated searches of the ChEMBL database. The list of known PPI targets and their orthologs has been translated into UniProt codes. These codes are then used to search ChEMBL for small-molecule data from binding assays on these proteins, keeping only assays that are assigned with confidence directly to a single protein or its homolog.
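
The filtering step described above can be sketched as follows. This is an illustrative sketch, not TIMBAL's actual pipeline: it works on in-memory records rather than querying ChEMBL, and the `Activity` record, the function name and the accession strings are hypothetical. It assumes ChEMBL's conventions that assay type "B" denotes a binding assay and that target confidence scores 9 and 8 denote a directly assigned single protein and a homologous single protein, respectively.

```python
from dataclasses import dataclass

# Assumed ChEMBL convention: 9 = direct single protein target assigned,
# 8 = homologous single protein target assigned.
SINGLE_PROTEIN_SCORES = {8, 9}


@dataclass(frozen=True)
class Activity:
    """Hypothetical flattened record: one measurement of one molecule."""
    molecule_id: str
    target_accession: str   # UniProt code of the assay target
    assay_type: str         # "B" is assumed to mean a binding assay
    confidence_score: int   # target assignment confidence


def select_ppi_activities(activities, ppi_accessions):
    """Keep binding data on known PPI targets with a single-protein assignment."""
    wanted = set(ppi_accessions)
    return [
        a for a in activities
        if a.target_accession in wanted
        and a.assay_type == "B"
        and a.confidence_score in SINGLE_PROTEIN_SCORES
    ]


# Placeholder identifiers, for illustration only.
records = [
    Activity("MOL_1", "ACC_A", "B", 9),   # kept
    Activity("MOL_2", "ACC_A", "F", 9),   # dropped: not a binding assay
    Activity("MOL_3", "ACC_B", "B", 7),   # dropped: not a single-protein assignment
    Activity("MOL_4", "ACC_Z", "B", 9),   # dropped: not a listed PPI target
    Activity("MOL_5", "ACC_B", "B", 8),   # kept: homologous single protein
]
selected = select_ppi_activities(records, ["ACC_A", "ACC_B"])
```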

Web service
An interface to the database can be found here.

BIPA

Biological database for Interaction between Protein and nucleic Acid

Author:
Semin Lee and Tom L. Blundell
Abstract:
BIPA is a database for protein-nucleic acid interactions in three-dimensional structures. The database provides various physicochemical features of protein-nucleic acid interfaces such as size, shape, residue propensity, secondary structure composition, and intermolecular atomic interactions. The database also contains multiple structural alignments of nucleic-acid binding protein families, annotated with local environments, to allow definition of the features that influence the acceptability of mutations at a particular position in a protein family. A web interface has been designed to present the results of these analyses and facilitate navigation of protein-nucleic acid interfaces.
Web service
An interface to the database can be found here.

PICCOLO

Structurally-characterized protein-protein interactions described at atomic level

Author
Richard Bickerton
Abstract

The sequencing of the human genome provides the parts list for understanding cellular processes. However, as 70% of eukaryotic genes work through multi-protein systems, it is only through detailed study of the interactions of these components that a more complete, systems-level understanding can be gained. PICCOLO is a comprehensive database of structurally characterized protein interactions that enables a variety of analyses of interface properties, including residue propensity, hydropathy, polarity, interface size, sequence entropy and residue contact preference.
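
Two of the interface properties named above can be sketched in a few lines. The definitions used here are common ones and are assumptions, not necessarily those used by PICCOLO: residue propensity is taken as the frequency of a residue type in the interface divided by its frequency on the protein surface, and sequence entropy as the Shannon entropy of one alignment column. The function names are hypothetical.

```python
import math
from collections import Counter


def residue_propensity(interface_residues, surface_residues):
    """Ratio of interface frequency to surface frequency per residue type.

    Values above 1 mean the residue type is enriched in the interface.
    Residue types absent from the surface sample are skipped.
    """
    iface = Counter(interface_residues)
    surf = Counter(surface_residues)
    n_iface = sum(iface.values())
    n_surf = sum(surf.values())
    propensity = {}
    for aa, n in iface.items():
        if surf[aa] == 0:
            continue  # no reference frequency to compare against
        propensity[aa] = (n / n_iface) / (surf[aa] / n_surf)
    return propensity


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy data: one-letter residue codes, for illustration only.
prop = residue_propensity(list("LLWA"), list("LLWAKKEE"))
conserved = column_entropy("AAAA")
variable = column_entropy("AALL")
```

In this toy example leucine makes up half of the interface but only a quarter of the surface, so its propensity is 2. A fully conserved column has zero entropy, and a column split evenly between two residue types has an entropy of one bit.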

Web service
An interface to the database can be found here.

Homstrad

Homologous Structure Alignment Database

Author:
Kenji Mizuguchi, Paul de Bakker, Charlotte Deane, Jiye Shi, Hiroki Shirai, Ricardo Nunez, Tom Blundell and John Overington
Abstract:
HOMSTRAD (HOMologous STRucture Alignment Database) is a curated database of structure-based alignments for homologous protein families. All known protein structures are clustered into homologous families (i.e., families sharing common ancestry), and the sequences of representative members of each family are aligned on the basis of their 3D structures using the programs MNYFIT, STAMP and COMPARER. These structure-based alignments are annotated with JOY and examined individually.

The database provides annotated structural alignments in various formats, superimposed structures, homologous SWISS-PROT/TrEMBL sequences, PROSITE annotation, links to other databases (Pfam, SMART) and an alignment and search interface to the program FUGUE.

ESST

Environment Specific Substitution Table

Author:
Overington J, Donnelly D, Johnson MS, Sali A, Blundell TL
Abstract:
The local environment of an amino acid in a folded protein determines the acceptability of mutations at that position. In order to characterize and quantify these structural constraints, we have made a comparative analysis of families of homologous proteins. Residues in each structure are classified according to amino acid type, secondary structure, accessibility of the side chain, and existence of hydrogen bonds from the side chains. Analysis of the pattern of observed substitutions as a function of local environment shows that there are distinct patterns, especially for buried polar residues. The substitution data tables are available on diskette with Protein Science. Given the fold of a protein, one is able to predict sequences compatible with the fold (profiles or templates) and potentially to discriminate between a correctly folded and misfolded protein. Conversely, analysis of residue variation across a family of aligned sequences in terms of substitution profiles can allow prediction of secondary structure or tertiary environment.
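
The tabulation described above can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes each residue already carries a single environment label (here a hypothetical string such as "helix-buried"), counts every ordered pair of aligned residues under the environment of the residue being replaced, and converts the counts to probabilities. It omits sequence weighting, clustering and smoothing of sparse counts.

```python
from collections import defaultdict
from itertools import permutations


def count_substitutions(alignment, environments):
    """Tally aligned residue pairs, keyed by the first residue's environment.

    alignment:    list of equal-length aligned sequences ("-" marks a gap)
    environments: per-sequence lists of environment labels, one per column
    Returns counts[environment][from_residue][to_residue].
    """
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for col in range(len(alignment[0])):
        for i, j in permutations(range(len(alignment)), 2):
            a, b = alignment[i][col], alignment[j][col]
            if a == "-" or b == "-":
                continue  # gaps carry no substitution information
            counts[environments[i][col]][a][b] += 1
    return counts


def to_probabilities(counts):
    """Normalise each row so that P(to | from, environment) sums to 1."""
    table = {}
    for env, rows in counts.items():
        table[env] = {}
        for a, row in rows.items():
            total = sum(row.values())
            table[env][a] = {b: n / total for b, n in row.items()}
    return table


# Toy family of three aligned sequences; environment labels are hypothetical.
alignment = ["AD", "AE", "SD"]
environments = [["helix-buried", "coil-exposed"]] * 3
table = to_probabilities(count_substitutions(alignment, environments))
```

Each environment gets its own table, so the same residue type can show different substitution patterns depending on, for example, whether it is buried or exposed.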