My new blogpost in Kolabtree blog: https://blog.kolabtree.com/applications-of-machine-learning-in-biology/
This paper was part of my journal club recently. In my previous post I touched upon LPMOs, short for Lytic Polysaccharide Monooxygenases, which are basically oxidative enzymes.
This interesting group of enzymes has three basic types, classified by the site of attack: LPMO1 (Type I) when oxidation occurs at the C1 carbon, LPMO2 (Type II) when oxidation occurs at the C4 carbon, and LPMO3 (Type III) if either the C1 or the C4 carbon is attacked. These subtypes are part of the four CAZy families into which LPMOs are categorized (AA9, AA10, AA11, and AA13).
Having said this, identifying the LPMO type is not a trivial task. This specificity for cleaving a particular bond, or regiospecificity, is characterized by time-consuming chromatography experiments (HPAEC-PAD), since they are time-course studies that involve incubating the enzyme with the substrate for long periods. If aldonic acids are detected in the experiments, the enzyme is C1-cleaving, or Type I; if 4-gemdiol-aldoses are detected, it is C4-cleaving, or Type II.
Given this complex identification protocol, any shortcut to identify the regiospecificity is welcome, and that’s what Danneels et al. have attempted in their paper published in PLOS ONE. Specifically, they propose an indicator diagram as a way to identify regiospecificity.
To test this, they used Hypocrea jecorina’s LPMO9A (which has both C1 and C4 cleavage activity) and performed site-directed mutagenesis on key aromatic residues involved in substrate binding, creating mutants that are selective for either C1 or C4 cleavage. To compare the activity of the wildtype with that of the mutants, the authors plot the rate of release of aldonic acids against that of 4-gemdiol-aldoses as an indicator diagram. Basically, if one calculates the slope of the line and it lies closer to the x-axis (release of aldonic acids), then the enzyme’s regiospecificity is for C1 oxidation, i.e. Type I LPMO activity. If it lies closer to the y-axis, the enzyme has Type II LPMO activity.
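The slope-based reading of the indicator diagram can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: the function name, the example rates, and the 30/60 degree cut-offs are assumptions made up for demonstration.

```python
import math


def classify_regioselectivity(c1_rate, c4_rate, low_deg=30.0, high_deg=60.0):
    """Classify an LPMO from the release rates of its oxidized products.

    c1_rate: release rate of aldonic acids (C1 products), the x-axis.
    c4_rate: release rate of 4-gemdiol-aldoses (C4 products), the y-axis.
    low_deg / high_deg: hypothetical angle cut-offs, NOT taken from the paper.
    """
    if c1_rate < 0 or c4_rate < 0 or (c1_rate == 0 and c4_rate == 0):
        raise ValueError("rates must be non-negative and not both zero")
    # Angle of the line through the origin and the point (c1_rate, c4_rate):
    # 0 degrees lies on the x-axis, 90 degrees on the y-axis.
    angle = math.degrees(math.atan2(c4_rate, c1_rate))
    if angle < low_deg:
        label = "Type I (C1-oxidizing)"
    elif angle > high_deg:
        label = "Type II (C4-oxidizing)"
    else:
        label = "Type III (C1/C4-oxidizing)"
    return angle, label


# Made-up rates in arbitrary units, for illustration only.
for name, c1, c4 in [("wildtype", 1.0, 1.0), ("mutant A", 1.0, 0.1), ("mutant B", 0.1, 1.0)]:
    angle, label = classify_regioselectivity(c1, c4)
    print(f"{name}: {angle:.1f} degrees, {label}")
```

Working with the angle rather than the raw slope keeps the comparison symmetric between the two axes and avoids a division by zero for a purely C4-oxidizing enzyme.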
It would be interesting to see this type of indicator diagram applied to identifying the activity of new LPMO enzymes, and also to enzyme engineering studies on LPMOs.
Reference: B. Danneels, M. Tanghe, H. Joosten, T. Gundinger, O. Spadiut, I. Stals and T. Desmet, “A quantitative indicator diagram for lytic polysaccharide monooxygenases reveals the role of aromatic surface residues in HjLPMO9A regioselectivity”, 2017.
Enzyme discovery is always a hot topic for industry and biochemists, since it brings huge commercial benefit and advances the current knowledge base. Unlike the early days, when enzyme discovery relied upon assays, bioinformatics approaches are now used to speed up the process.
Lytic polysaccharide monooxygenases (LPMOs) are the latest family of enzymes to have affected the biofuel industry, specifically the cellulosic bioethanol industry. These enzymes, previously thought to bind to polymers of carbohydrates, are now understood to boost the enzymatic degradation of recalcitrant crystalline polysaccharides. LPMOs are probably among the enzymes that bind to the crystalline surfaces of polymers, which will hopefully reduce the cost and time of the pre-processing step in biomass-to-bioethanol conversion.
As of now, CAZy recognizes 4 families that contain LPMOs (AA9, AA10, AA11, and AA13), where they are now called Auxiliary Activity enzymes. However, newer families of LPMOs may exist, and finding them depends on how and when we look. Voshol et al., in their recent paper, discuss a bioinformatics-based approach, validated using expression data, for the discovery of novel LPMO families, and report the existence of an “LPMO14” family whose members are active on sugars such as beta-1,3-glucans.
In short, they took 14 known LPMOs from the four families and generated an HMM profile, with which they scanned six genomes and identified 7 LPMO14 genes. They were also able to find LPMO genes in other organisms (such as Drosophila, bivalves, corals, etc.) where the function of LPMOs is not known.
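To give a feel for the "build a profile from known sequences, then scan for hits" idea, here is a deliberately simplified Python sketch. It is not a profile HMM and not the authors' pipeline: it builds an ungapped position-specific log-odds profile (no insert or delete states, uniform background), and the seed and candidate sequences are invented toy strings rather than real LPMO sequences. In practice, dedicated tools such as HMMER are used for this kind of search.

```python
import math
from collections import Counter

RESIDUES = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_seqs, pseudocount=1.0):
    """Return one log-odds dict per column of an ungapped alignment."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    background = 1.0 / len(RESIDUES)  # assumption: uniform background
    total = len(aligned_seqs) + pseudocount * len(RESIDUES)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned_seqs)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in RESIDUES
        })
    return profile


def best_hit(profile, sequence):
    """Slide the profile along a sequence; return (best score, start index)."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Unknown characters contribute a neutral score of 0.
        score = sum(profile[i].get(aa, 0.0) for i, aa in enumerate(window))
        best = max(best, (score, start))
    return best


# Hypothetical toy data, for illustration only.
seeds = ["HGYV", "HGFV", "HGYI"]
profile = build_profile(seeds)
score, start = best_hit(profile, "MKTHGYVAA")
print(f"best window starts at index {start} with score {score:.2f}")
```

A positive score means the window looks more like the seed alignment than like background; a real search would also need a significance threshold, which this sketch leaves out.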
While these data sound promising, further identification and characterization using standard assays would confirm whether this is indeed a new family of LPMOs. Also, the authors do not mention why they stuck to 14 known LPMO structures to generate the HMM profile when nearly 50 structures are currently deposited in the PDB.
Reference: G. Voshol, E. Vijgenboom and P. Punt, “The discovery of novel LPMO families with a new Hidden Markov model”, BMC Research Notes, vol. 10, no. 1, 2017.
The area of protein-protein interactions (PPI) is always exciting, as proteins can be either monogamous or promiscuous with their interaction partners. Another hot topic these days in computational biology is multiscale modelling, which refers to analyzing a system at the atomistic scale, at a global scale, and at the scales in between. A few years ago, a new method for the co-evolutionary analysis of residues made news, and it has since been used to gain insights into other systems. Read the blog post by Bosco Ho about this groundbreaking work on Direct Coupling Analysis (DCA).
So, when two orthogonal approaches are used to understand a PPI system, the result is bound to be interesting. A recent paper in eLife reports exactly that. Malinverni et al. report the use of coarse-grained simulations coupled with atomistic molecular dynamics simulations and data from DCA to identify the evolutionarily conserved residues responsible for the specific interaction between Hsp70 and Hsp40.
As these two proteins are part of the chaperone machinery, any insight into Hsp70’s ability to bind to proteins along with Hsp40 is crucial for understanding this short-lived interaction. As mentioned in the article, a vast amount of data (from NMR, mutagenesis, etc.) is available that can be incorporated into this multiscale modelling.
The predicted interaction model is not only statistically significant but also correlates well with the known experimental data.
Malinverni D, Jost Lopez A, De Los Rios P, Hummer G, Barducci A: Modeling Hsp70/Hsp40 interaction by multi-scale molecular simulations and coevolutionary sequence analysis. eLife 2017, 6. DOI: https://doi.org/10.7554/eLife.23471
Intrinsically disordered proteins are thought to be fully functional, yet they do not conform to a single conformation, so determining their structure via crystallography becomes problematic. Many intrinsically disordered proteins have been studied and analyzed using NMR methods; however, the question of why proteins are intrinsically disordered is still debated.
When viewing X-ray diffraction data, some residues have no corresponding electron density, and they are therefore marked as missing residues. These regions are highly mobile and are considered intrinsically disordered. For some proteins, the entire sequence is considered intrinsically disordered.
It is widely accepted that sequence dictates structure, and structure in turn dictates function. So, is the “disordered-ness” encoded in the genome, and if so, to what extent? This and related questions led Basile et al. at Stockholm University, Sweden, to delve deeper, and they have narrowed it down to GC content. Their work has been published in the latest issue of PLoS Computational Biology.
Using computational methods, they analyzed 400 eukaryotic genomes and looked specifically into the so-called orphan genes. They categorized the age of the proteins using the ProteinHistorian tool and compared the old and young proteins. They found that the
…selective pressure to change amino acids in a protein is stronger than the one to change the GC content. At low GC ancient proteins are more disordered than expected for random sequence while at high GC they are less.
The three disorder-promoting amino acids (Ala, Pro, and Gly) are encoded by GC-rich codons. However,
At high GC the youngest proteins become more disordered and contain less secondary structure elements, while at low GC the reverse is observed. We show that these properties can be explained by changes in amino acid frequencies caused by the different amount of GC in different codons.
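The link between amino acids and GC content can be checked directly against the standard genetic code. The short Python sketch below computes the mean GC fraction of the codons for a given amino acid; taking an unweighted average over synonymous codons is a simplifying assumption of mine (real genomes have codon usage bias), and the function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
CODE_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), CODE_STRING)
}


def mean_codon_gc(amino_acid):
    """Unweighted mean GC fraction over all codons of a one-letter amino acid."""
    codons = [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]
    if not codons:
        raise ValueError(f"unknown amino acid: {amino_acid!r}")
    gc_bases = sum(codon.count("G") + codon.count("C") for codon in codons)
    return gc_bases / (3 * len(codons))


for name, letter in [("Ala", "A"), ("Pro", "P"), ("Gly", "G"), ("Lys", "K")]:
    print(f"{name}: {mean_codon_gc(letter):.3f}")
# Ala, Pro and Gly each give 0.833; Lys gives 0.167
```

Ala (GCN), Pro (CCN), and Gly (GGN) all have G or C fixed at the first two codon positions, which is why they come out at the GC-rich end, whereas an amino acid such as Lys (AAA, AAG) sits at the AT-rich end.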
With the increasing computational power (aka GPUs) that can be accessed these days, it is no wonder that performing all-atom molecular dynamics simulations for longer times, in duplicate and/or triplicate, has become easier.
These data should pave the way for more insights.
CRISPR-Cas9 all-atom simulation (total of 400-600 ns data)
Zuo Z, & Liu J (2016). Cas9-catalyzed DNA Cleavage Generates Staggered Ends: Evidence from Molecular Dynamics Simulations. Scientific reports, 5 PMID: 27874072
Entire Dengue viral envelope complex simulation (1 microsecond data)
Marzinek JK, Holdbrook DA, Huber RG, Verma C, & Bond PJ (2016). Pushing the Envelope: Dengue Viral Membrane Coaxed into Shape by Molecular Simulations. Structure (London, England : 1993), 24 (8), 1410-20 PMID: 27396828
Wanted to share this exciting news with you! Biophysical Journal has created a collection of papers that describe tools and software that can be routinely used in biological research. Editor Prof. Leslie Loew mentions that the full-text of articles in this collection will be freely available until February 25, 2016.
…a year ago Biophysical Journal called for papers in a new class of articles called Computational Tools (CTs). These papers are limited to five pages in length and describe software for analysis of experimental data, modeling and/or simulation, or database services. All are required to be freely accessible and open to the research community. In addition to following the usual review criteria of novelty and importance, reviewers of CTs are asked to test drive the software and judge its usability.
Of the thirteen papers, some are directly related to Structural Biology and Bioinformatics. So, here’s my “curated” list.
CHARMM-GUI HMMM Builder for Membrane Simulations with the Highly Mobile Membrane-Mimetic Model
Slow diffusion of the lipids in conventional all-atom simulations of membrane systems makes it difficult to sample large rearrangements of lipids and protein-lipid interactions. Recently, Tajkhorshid and co-workers developed the highly mobile membrane-mimetic (HMMM) model with accelerated lipid motion by replacing the lipid tails with small organic molecules. The HMMM model provides accelerated lipid diffusion by one to two orders of magnitude, and is particularly useful in studying membrane-protein associations. However, building an HMMM simulation system is not easy, as it requires sophisticated treatment of the lipid tails. In this study, we have developed CHARMM-GUI HMMM Builder (http://www.charmm-gui.org/input/hmmm) to provide users with ready-to-go input files for simulating HMMM membrane systems with/without proteins. Various lipid-only and protein-lipid systems are simulated to validate the qualities of the systems generated by HMMM Builder with focus on the basic properties and advantages of the HMMM model. HMMM Builder supports all lipid types available in CHARMM-GUI and also provides a module to convert back and forth between an HMMM membrane and a full-length membrane. We expect HMMM Builder to be a useful tool in studying membrane systems with enhanced lipid diffusion.
As molecular dynamics (MD) simulations continue to evolve into powerful computational tools for studying complex biomolecular systems, the necessity of flexible and easy-to-use software tools for the analysis of these simulations is growing. We have developed MDTraj, a modern, lightweight, and fast software package for analyzing MD simulations. MDTraj reads and writes trajectory data in a wide variety of commonly used formats. It provides a large number of trajectory analysis capabilities including minimal root-mean-square-deviation calculations, secondary structure assignment, and the extraction of common order parameters. The package has a strong focus on interoperability with the wider scientific Python ecosystem, bridging the gap between MD data and the rapidly growing collection of industry-standard statistical analysis and visualization tools in Python. MDTraj is a powerful and user-friendly software package that simplifies the analysis of MD data and connects these datasets with the modern interactive data science software ecosystem in Python.
We introduce a web portal that employs network theory for the analysis of trajectories from molecular dynamics simulations. Users can create protein energy networks following methodology previously introduced by our group, and can identify residues that are important for signal propagation, as well as measure the efficiency of signal propagation by calculating the network coupling. This tool, called MDN, was used to characterize signal propagation in Escherichia coli heat-shock protein 70-kDa. Two variants of this protein experimentally shown to be allosterically active exhibit higher network coupling relative to that of two inactive variants. In addition, calculations of partial coupling suggest that this quantity could be used as part of the criteria to determine pocket druggability in drug discovery studies.
Homology modeling predicts protein structures using known structures of related proteins as templates. We developed MULTIDOMAIN ASSEMBLER (MDA) to address the special problems that arise when modeling proteins with large numbers of domains, such as fibronectin with 30 domains, as well as cases with hundreds of templates. These problems include how to spatially arrange nonoverlapping template structures, and how to get the best template coverage when some sequence regions have hundreds of available structures while other regions have a few distant homologs. MDA automates the tasks of template searching, visualization, and selection followed by multidomain model generation, and is part of the widely used molecular graphics package UCSF CHIMERA (University of California, San Francisco). We demonstrate applications and discuss MDA’s benefits and limitations.
Coarse-grained (CG) models in molecular dynamics (MD) are powerful tools to simulate the dynamics of large biomolecular systems on micro- to millisecond timescales. However, the CG model, potential energy terms, and parameters are typically not transferable between different molecules and problems. So parameterizing CG force fields, which is both tedious and time-consuming, is often necessary. We present RedMDStream, a software for developing, testing, and simulating biomolecules with CG MD models. Development includes an automatic procedure for the optimization of potential energy parameters based on metaheuristic methods. As an example we describe the parameterization of a simple CG MD model of an RNA hairpin.
Protein-protein docking programs can give valuable insights into the structure of protein complexes in the absence of an experimental complex structure. Web interfaces can facilitate the use of docking programs by structural biologists. Here, we present an easy web interface for protein-protein docking with the ATTRACT program. While aimed at nonexpert users, the web interface still covers a considerable range of docking applications. The web interface supports systematic rigid-body protein docking with the ATTRACT coarse-grained force field, as well as various kinds of protein flexibility. The execution of a docking protocol takes up to a few hours on a standard desktop computer.
ReaDDy is a modular particle simulation package combining off-lattice reaction kinetics with arbitrary particle interaction forces. Here we present a graphical processing unit implementation of ReaDDy that employs the fast multiplatform molecular dynamics package OpenMM. A speedup of up to two orders of magnitude is demonstrated, giving us access to timescales of multiple seconds on single graphical processing units. This opens up the possibility of simulating cellular signal transduction events while resolving all protein copies.
Diffusion and interaction of molecular regulators in cells is often modeled using reaction-diffusion partial differential equations. Analysis of such models and exploration of their parameter space is challenging, particularly for systems of high dimensionality. Here, we present a relatively simple and straightforward analysis, the local perturbation analysis, that reveals how parameter variations affect model behavior. This computational tool, which greatly aids exploration of the behavior of a model, exploits a structural feature common to many cellular regulatory systems: regulators are typically either bound to a membrane or freely diffusing in the interior of the cell. Using well-documented, readily available bifurcation software, the local perturbation analysis tracks the approximate early evolution of an arbitrarily large perturbation of a homogeneous steady state. In doing so, it provides a bifurcation diagram that concisely describes various regimes of the model’s behavior, reducing the need for exhaustive simulations to explore parameter space. We explain the method and provide detailed step-by-step guides to its use and application.