Research

Our current research spans three areas:

  1. Sequence recognition of transcription factors - Transcription factors (TFs) are proteins that bind specific sites in the genome in order to regulate the expression of nearby genes. This binding, which is a stochastic kinetic process, depends on many variables, including the DNA sequence of the target site and cooperative interactions with ‘co-factors’ binding at nearby sites. Within this research area, we develop high-throughput binding assays and matched machine learning methods to quantify how proteins (such as TFs) read genetic sequences. Specifically, by using a multi-layered model (analogous to ‘deep learning’) that is tailored to describe the sequencing assay, we quantify sequence recognition in terms of the underlying biophysical kinetic and equilibrium parameters.

    References:
    1. "Prediction of protein–ligand binding affinity from sequencing data with interpretable machine learning", H. T. Rube, C. Rastogi, S. Feng, J. F. Kribelbauer, A. Li, B. Becerra, L. A. N. Melo, B. V. Do, X. Li, H. H. A., N. H. Shah, R. S. Mann & H. J. Bussemaker, Nature Biotechnology, Link
    2. "Accurate and sensitive quantification of protein-DNA binding affinity", C. Rastogi, H. T. Rube, J. F. Kribelbauer, J. Crocker, R. E. Loker, G. D. Martini, O. Laptenko, W. A. Freed-Pastor, C. Prives, D. L. Stern, R. S. Mann & H. J. Bussemaker, PNAS, Link
    3. "The transcription factor GABP selectively binds and activates the mutant TERT promoter in cancer", R. J. A. Bell, H. T. Rube, A. Kreig, A. Mancini, S. D. Fouse, R. P. Nagarajan, S. Choi, C. Hong, D. He, M. Pekmezci, J. K. Wiencke, M. R. Wrensch, S. M. Chang, K. M. Walsh, S. Myong, J. S. Song & J. F. Costello, Science, Link
    4. "A unified approach for quantifying and interpreting DNA shape readout by transcription factors", H. T. Rube, C. Rastogi, J. F. Kribelbauer & H. J. Bussemaker, Molecular Systems Biology, Link
       
  2. Chromatin organization - DNA is bound by a wide range of proteins, forming a complex called chromatin. The structure of chromatin varies dramatically across the genome, ranging from an open state (where the DNA is accessible) to a highly repressed state (where the DNA is tightly packed). The most basic unit of DNA packing is the nucleosome, which consists of 146 base pairs of DNA wrapped around a ball of proteins (called histones). In the last decade, high-throughput DNA sequencing has allowed researchers to create a wide range of ‘functional genomics’ assays that generate genome-wide maps of nucleosomes, transcription factors, etc. In this research area, we develop algorithms for analyzing functional genomics data in order to quantify chromatin organization and understand how it is formed.

    References:
    1. "Quantifying the role of steric constraints in nucleosome positioning", H. T. Rube & J. S. Song, Nucleic Acids Research, Link
    2. "Categorical spectral analysis of periodicity in nucleosomal DNA", H. Jin, H. T. Rube & J. S. Song, Nucleic Acids Research, Link
       
  3. CRISPR systems - Although the CRISPR-Cas9 system revolutionized research, limited efficiency and off-target activity are hurdles to widespread therapeutic use. In an ongoing collaboration, we develop assays and mathematical models to quantify how CRISPR binding and activity depend on the DNA (target) and gRNA (programmable) sequences.

    References:
    1. "Systematic in vitro profiling of off-target affinity, cleavage and efficiency for CRISPR enzymes", L. Zhang, H. T. Rube, C. A. Vakulskas, M. A. Behlke, H. J. Bussemaker & M. A. Pufall, Nucleic Acids Research, Link
    2. "AsCas12a ultra nuclease facilitates the rapid generation of therapeutic cell medicines", L. Zhang, J. A. Zuris, R. Viswanathan, J. N. Edelstein, R. Turk, B. Thommandru, H. T. Rube, S. E. Glenn, M. A. Collingwood, N. M. Bode, S. F. Beaudoin, S. Lele, S. N. Scott, K. M. Wasko, S. Sexton, C. M. Borges, M. S. Schubert, G. L. Kurgan, M. S. McNeill, C. A. Fernandez, V. E. Myer, R. A. Morgan, M. A. Behlke & C. A. Vakulskas, Nature Communications, Link