CREDO: A Protein–Ligand Interaction Database for Drug Discovery
Harnessing data from the growing number of protein–ligand complexes in the Protein Data Bank is an important task in drug discovery. To benefit from the abundance of three-dimensional structures, structural data must be integrated with sequence and chemical data, and protein–small molecule interactions must be characterized at the inter-atomic level. In this study, we present CREDO, a new publicly available database of protein–ligand interactions that represents contacts as structural interaction fingerprints, implements novel features and is completely scriptable through its application programming interface. Features of CREDO include molecular shape descriptors based on ultrafast shape recognition, fragmentation of ligands in the Protein Data Bank, sequence-to-structure mapping and the identification of approved drugs. Selected analyses of these key features are presented to highlight a range of potential applications of CREDO. The CREDO dataset has been released into the public domain, together with the application programming interface, under a Creative Commons license at http://www-cryst.bioc.cam.ac.uk/credo. We believe that the free availability and numerous features of the CREDO database will make it useful for both commercial and academia-driven drug discovery programmes.
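To illustrate what an ultrafast shape recognition (USR) descriptor is, the sketch below follows the method as it is commonly described: four reference points are chosen (the molecular centroid, the atom closest to it, the atom farthest from it, and the atom farthest from that one). For each point, the distribution of atom distances is summarized by its first three moments, giving 12 numbers. Two molecules are then compared with the inverse of one plus the mean absolute difference of their descriptors. This is not CREDO's own code. The function names `usr_descriptor` and `usr_similarity` are hypothetical, and the moment convention used here (mean, variance, standardized skewness) is an assumption, since implementations differ in how they normalize the second and third moments.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _moments(dists: Sequence[float]) -> List[float]:
    """Mean, variance and standardized skewness of a distance distribution.

    Assumed convention; other USR implementations normalize differently.
    """
    n = len(dists)
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    if var == 0.0:
        skew = 0.0  # all distances equal: skewness is undefined, use 0
    else:
        m3 = sum((d - mean) ** 3 for d in dists) / n
        skew = m3 / var ** 1.5
    return [mean, var, skew]


def usr_descriptor(coords: Sequence[Point]) -> List[float]:
    """Hypothetical helper: 12-element USR-style shape descriptor of 3D atom coordinates."""
    n = len(coords)
    if n < 2:
        raise ValueError("need at least two atoms")
    # Reference point 1: molecular centroid (ctd)
    ctd = tuple(sum(c[i] for c in coords) / n for i in range(3))
    d_ctd = [math.dist(c, ctd) for c in coords]
    # Reference point 2: atom closest to the centroid (cst)
    cst = coords[min(range(n), key=d_ctd.__getitem__)]
    # Reference point 3: atom farthest from the centroid (fct)
    fct = coords[max(range(n), key=d_ctd.__getitem__)]
    d_cst = [math.dist(c, cst) for c in coords]
    d_fct = [math.dist(c, fct) for c in coords]
    # Reference point 4: atom farthest from fct (ftf)
    ftf = coords[max(range(n), key=d_fct.__getitem__)]
    d_ftf = [math.dist(c, ftf) for c in coords]
    descriptor: List[float] = []
    for dists in (d_ctd, d_cst, d_fct, d_ftf):
        descriptor.extend(_moments(dists))
    return descriptor


def usr_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in (0, 1]; equals 1.0 only for identical descriptors."""
    manhattan = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 / (1.0 + manhattan / 12.0)


if __name__ == "__main__":
    ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0),
              (0.3, 2.0, 0.7), (-1.1, 0.4, 1.9)]
    # Rotating by 90 degrees about z leaves all interatomic distances unchanged
    rotated = [(-y, x, z) for x, y, z in ligand]
    print(usr_similarity(usr_descriptor(ligand), usr_descriptor(rotated)))
```

Because every descriptor value is derived from interatomic distances, the score does not depend on the orientation or position of the input coordinates. This is why USR-style comparisons need no structural alignment step and can be run quickly over large collections of ligands.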