We apply techniques from computer science, mathematics and statistics (statistical inference, information theory, data structures and algorithms, combinatorial optimisation, etc.) to address key computational challenges that arise in biological data, predominantly those involving protein 3D structures and 1D sequences.
Some open-source programs and web utilities developed at LCB
Proçodic is an interactive web server capturing the dictionary of topologically conserved secondary structures (a.k.a. concepts) that forms the architectural 'basis set' of the observed universe of protein structures.
MMLigner is a command-line program and web server to infer pairwise protein 3D structure alignments and to identify closely competing alignments. It uses a statistical inference framework based on Minimum Message Length (MML), supported by probability distributions on 3D spheres.
seqMMLigner is a command-line program to infer alignments between amino acid sequences under the MML framework.
Dinithi won the 2019 Ian Lawson Van Toch Memorial Award (Outstanding Student Paper) for this work.
SST is a web server to assign secondary structure to protein coordinate data using the Bayesian method of Minimum Message Length (MML) inference. It identifies helices, turns of various types, and strands of a sheet.
MUSTANG is a command-line program to produce multiple structural alignments given the three-dimensional coordinates of proteins.
Super is a web server and a program to rapidly screen the entire (up-to-date) PDB and identify similar oligopeptide fragments. The method is mathematically guaranteed to find every fragment that superposes on a given query within a user-prescribed threshold of root-mean-square deviation (RMSD).
Superpose3D is a C++ library that supports least-squares superposition of 3D vector sets. This library implements sufficient statistics for this superposition problem, and allows updating existing superpositions (under vector set addition and symmetric difference) in constant time.
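To illustrate the sufficient-statistics idea behind Superpose3D, here is a minimal Python sketch. It is not the library's C++ API: the names `SuperpositionStats`, `horn_matrix` and `largest_eigenvalue` are hypothetical, and the sketch assumes the two vector sets are supplied as matched pairs. Only running sums are stored, so adding or removing a pair takes constant time. The least-squares RMSD is recovered from those sums using Horn's quaternion formulation with a small Jacobi eigenvalue routine, which may differ from the method Superpose3D actually uses.

```python
import math


def horn_matrix(c):
    """Symmetric 4x4 matrix (Horn's quaternion method) from a 3x3 cross-covariance."""
    (xx, xy, xz), (yx, yy, yz), (zx, zy, zz) = c
    return [
        [xx + yy + zz, yz - zy, zx - xz, xy - yx],
        [yz - zy, xx - yy - zz, xy + yx, zx + xz],
        [zx - xz, xy + yx, -xx + yy - zz, yz + zy],
        [xy - yx, zx + xz, yz + zy, -xx - yy + zz],
    ]


def largest_eigenvalue(a, max_sweeps=50):
    """Largest eigenvalue of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [list(row) for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        total = sum(v * v for row in a for v in row)
        if off <= 1e-30 * total:  # off-diagonal part is negligible
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                cs = 1.0 / math.hypot(t, 1.0)
                sn = t * cs
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = cs * akp - sn * akq, sn * akp + cs * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = cs * apk - sn * aqk, sn * apk + cs * aqk
    return max(a[i][i] for i in range(n))


class SuperpositionStats:
    """Running sums that suffice to compute the least-squares RMSD (hypothetical API)."""

    def __init__(self):
        self.n = 0
        self.sx = [0.0] * 3                        # sum of x vectors
        self.sy = [0.0] * 3                        # sum of y vectors
        self.sxy = [[0.0] * 3 for _ in range(3)]   # sum of outer products x y^T
        self.sq = 0.0                              # sum of |x|^2 + |y|^2

    def _update(self, x, y, sign):
        # Constant-time update: touches a fixed number of sums, never the other points.
        self.n += sign
        for i in range(3):
            self.sx[i] += sign * x[i]
            self.sy[i] += sign * y[i]
            for j in range(3):
                self.sxy[i][j] += sign * x[i] * y[j]
        self.sq += sign * (sum(v * v for v in x) + sum(v * v for v in y))

    def add(self, x, y):
        self._update(x, y, +1)

    def remove(self, x, y):
        self._update(x, y, -1)

    def rmsd(self):
        """RMSD after optimal rigid-body superposition of the stored pairs."""
        if self.n < 1:
            raise ValueError("no point pairs stored")
        n = self.n
        # Centre the sums on the two centroids.
        c = [[self.sxy[i][j] - self.sx[i] * self.sy[j] / n for j in range(3)]
             for i in range(3)]
        e = self.sq - (sum(v * v for v in self.sx) + sum(v * v for v in self.sy)) / n
        lam = largest_eigenvalue(horn_matrix(c))
        return math.sqrt(max(0.0, (e - 2.0 * lam) / n))


# Usage: five points and a rotated, translated copy of them.
pts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 2)]
moved = [(-y + 1, x + 2, z + 3) for x, y, z in pts]  # 90 degrees about z, then shift
stats = SuperpositionStats()
for p, q in zip(pts, moved):
    stats.add(p, q)
print(stats.rmsd())                    # close to zero: the sets are congruent
stats.add((5, 5, 5), (0, 0, 0))        # a badly matched pair raises the RMSD
print(stats.rmsd())
stats.remove((5, 5, 5), (0, 0, 0))     # removing it restores the earlier sums
print(stats.rmsd())
```

Because only the sums are kept, the RMSD after each `add` or `remove` is obtained without revisiting the points already included.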
Members and collaborators