Comparative analysis of core promoter region: Information content from mono and dinucleotide substitution matrices

Published in Computational Biology & Chemistry, 2006

Recommended citation: Ashok Reddy D, Prasad B V L S and Mitra C K. (2006). "Comparative analysis of core promoter region: Information content from mono and dinucleotide substitution matrices." Comput Biol Chem. 30, 58-62. http://adinasarapu.github.io/files/comp2006.pdf

We studied the core promoter region in five sets of promoter sequences by calculating the average mutual information content H (relative entropy). To do so, we constructed nucleotide substitution matrices that score mono- and dinucleotide replacements within a block of aligned sequences. The matrices hold log-odds scores, expressed in bits of information. We applied them to the core promoter region to compute the information content of the Transcription Start Site (TSS), the TATA-box and the downstream regions. As expected, the information content decreases with increasing block size, which suggests that the TSS region is likely to be 5-10 bases long. We also find that, in both mouse and human, the TATA-box and the TSS region are likely to play important roles in proper transcriptional initiation.
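
For readers unfamiliar with log-odds matrices, the following is a minimal sketch of how a substitution matrix and its relative entropy H (in bits) can be derived from one block of aligned sequences. It assumes a BLOSUM-style pair-counting scheme, which may differ from the exact procedure in the paper; the function names `pair_frequencies` and `log_odds_and_entropy` and the toy block are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def pair_frequencies(block):
    """Observed frequencies q[(a, b)] of unordered symbol pairs,
    counted column by column over every pair of aligned sequences."""
    counts = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            counts[tuple(sorted((a, b)))] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def log_odds_and_entropy(block):
    """Return (scores, H): log-odds scores in bits and the relative
    entropy H = sum over pairs of q * log2(q / expected).
    Assumption: BLOSUM-style estimate, not necessarily the paper's method."""
    q = pair_frequencies(block)

    # Background frequency of each symbol, derived from the pair counts.
    p = Counter()
    for (a, b), f in q.items():
        if a == b:
            p[a] += f
        else:
            p[a] += f / 2
            p[b] += f / 2

    scores = {}
    entropy = 0.0
    for (a, b), f in q.items():
        # Chance expectation for an unordered pair.
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = log2(f / expected)
        entropy += f * scores[(a, b)]
    return scores, entropy


# Toy aligned block (hypothetical data, not taken from the paper).
block = ["ACGT", "AGGT", "TCGA"]
scores, h = log_odds_and_entropy(block)
print(f"H = {h:.3f} bits")
```

Because the function only compares symbols column by column, the same sketch should also accept sequences given as lists of dinucleotide tokens (for example `["AC", "GT"]`), which is one possible way to build a dinucleotide matrix. A perfectly conserved block with uniform base composition gives H = 2 bits in the mononucleotide case, the maximum for a four-letter alphabet.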

Download paper here