Abstract:
Gene transcription mediated by RNA polymerase II (pol-II) is a key step in gene expression. The
dynamics of pol-II moving along the transcribed region influence the rate and timing of gene expression.
In this work we present a probabilistic model of transcription dynamics which is fitted to pol-II occupancy
time course data measured using ChIP-seq. The model can be used to estimate transcription speed and
to infer the temporal pol-II activity profile at the gene promoter. Model parameters are estimated
either by maximum likelihood estimation or by Bayesian inference using Markov chain Monte Carlo
sampling. The Bayesian approach provides confidence intervals for parameter estimates and allows the
use of priors that capture domain knowledge, e.g. the expected range of transcription speeds, based on
previous experiments. The model describes the movement of pol-II down the gene body and can be used
to identify the time of induction for transcriptionally engaged genes. By clustering the inferred promoter
activity time profiles, we are able to determine which genes respond quickly to stimuli and group genes
that share activity profiles and may therefore be co-regulated. We apply our methodology to biological
data obtained using ChIP-seq to measure pol-II occupancy genome-wide when MCF-7 human breast
cancer cells are treated with estradiol (E2). The transcription speeds we obtain agree with those obtained
previously for smaller numbers of genes, with the advantage that our approach can be applied genome-wide.
We validate the biological significance of the pol-II promoter activity clusters by investigating
cluster-specific transcription factor binding patterns and determining canonical pathway enrichment. We
find that rapidly induced genes are enriched for both estrogen receptor alpha (ERα) and FOXA1 binding
in their proximal promoter regions.