Computational Analysis of Core Promoters in the Drosophila Genome
Uwe Ohler, Guo-chun Liao, Heinrich Niemann, Gerald M. Rubin
Department of Molecular and Cell Biology and Howard Hughes Medical Institute, University of California at Berkeley, Berkeley, CA 94720-3200
Chair for Pattern Recognition (Computer Science 5), University of Erlangen-Nuremberg, Martensstrasse 3, D-91058 Erlangen
Present address: Department of Biology, Massachusetts Institute of Technology, 77 Massachusetts Ave 68-223, Cambridge, MA 02139
Corresponding author: eMail: ohler@mit.edu FAX: 617-452-2936
Running title: Drosophila Core Promoter Analysis
Key words: computational biology, DNA sequence analysis, eukaryotic promoter recognition, gene regulation, transcription […]

[…] the basal transcription apparatus. Drosophila TSSs have generally been mapped by individual experiments; the low number of accurately mapped TSSs has limited analysis of promoter sequence motifs and the training of computational prediction tools.

Results
We identified TSS candidates for about 2,000 Drosophila genes by aligning 5' ESTs from cap-trapped cDNA libraries to the genome, while applying stringent criteria concerning coverage and 5'-end distribution. Examination of the sequences flanking these TSSs revealed the presence of well-known core promoter motifs such as the TATA box, the initiator […] what appears to be a variant DPE motif. Among the prevalent motifs is the DNA replication-related element (DRE), recently shown to be part of the recognition site for the TBP-replacing factor TRF2. Our TSS set was then used to retrain the computational promoter predictor McPromoter, allowing us to improve the recognition performance to over 50% sensitivity and 40% specificity. We compare these computational results to promoter prediction in vertebrates.

Conclusions
There are relatively few recognizable binding sites for previously known general transcription factors in Drosophila core promoters. […] prediction in Drosophila.

INTRODUCTION
Transcription initiation is one of the most important control points in regulating gene expression [1, 2]. Recent observations have emphasized the importance of the core promoter, a region of about 100 bp flanking the transcription start site […]
RNA Polymerase II and several auxiliary factors. Core promoters show specificity both in their interactions with enhancers and with sets of general transcription […] number of motifs have been identified that are present in a substantial fraction. The most familiar of these motifs is the TATA box, which has been reported to be present in 30-40% of core promoters [5].

Prediction and analysis of core promoters have been active areas of research in computational biology [6], with several recent publications on the prediction of human promoters [7-10]. In contrast, prediction of invertebrate promoters has received much less attention and has focused almost exclusively on Drosophila. Reese [11] described the application of time-delay neural networks, and in our previous […] of DNA. Structural features were also examined by Levitsky and Katokhin [13], but they did not present results for promoter prediction in genomic sequences.

As with computational methods for predicting the intron-exon structure of genes [14], the computational prediction of promoters has been greatly aided by cDNA sequence information. However, promoter prediction is complicated by the fact that most cDNA clones do not extend to the TSS. Recent advances in cDNA library construction methods that utilize the 5'-cap structure of mRNAs have allowed the generation of so-called "cap-trapped" libraries […] sequences of individual cDNAs to genomic DNA [17, 18]. However, it is estimated that even in the best libraries only 50-80% of cDNAs extend to the TSS [16, 19], making it unreliable to base conclusions on individual cDNA alignments.

We describe here a more cautious approach for identifying TSSs that requires the 5' ends of the alignments of multiple, independent cap-selected cDNAs to lie in close proximity. We then examined the regions flanking these putative TSSs, the putative core promoter regions, for conserved DNA sequence motifs. We also used this new set of putative TSSs to retrain and […] D. melanogaster chromosomes, and discuss the different challenges of computational promoter recognition in invertebrate and vertebrate genomes.

RESULTS AND DISCUSSION
Selection of EST clusters to determine transcription start sites
Stapleton et al. [20] report the results of aligning 237,471 5' EST sequences, including 115,169 obtained from cap-trapped libraries, to the annotated Release 2 sequence of the D. melanogaster genome. They examined these
alignments for alternative splice forms and grouped them into 16,744 clusters with consistent splice sites, overlapping 9,644 known protein-encoding genes […] overlap a known protein-encoding gene or have evidence of splicing.
(2) One of the three most 5' ESTs in the cluster had to be derived from a cap-trapped library.
(3) In some cases, disjoint clusters overlap the annotation of a single gene; here, we only considered the most 5' cluster.
(4) We required the distance to the next upstream cluster to be greater than 1 kb. This requirement, together with the selection of only the most 5' cluster, leads to the
selection of only one start site per gene. By doing so, we minimize the erroneous inclusion of ESTs that are not full-length, but also exclude alternative […] the 5' ends of at least 3 ESTs fall within an 11 bp window of genomic sequence, and that the number of ESTs whose 5' ends fall within this window comprises at least 10% of the ESTs in the cluster. With a single EST we cannot be sure to have reached the true start site, even if it was generated by a method selecting for the cap site of the mRNA [17, 19]; with a cluster of ESTs within a small range, we can be more confident that we have defined the actual TSS. By requiring selected clusters to have at least 3 ESTs, we are, however, introducing a bias against genes with low expression levels. […] numerical requirement is insufficiently stringent.
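The cluster selection rules described in this section can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. The `EST`/`Cluster` data model, the cluster `end` field, plus-strand coordinates (upstream = smaller coordinate), measuring the upstream gap from a cluster's most 5' EST to the end of the nearest upstream cluster, and reporting the 11 bp window rather than a single base as the TSS are all hypothetical choices. The first criterion (overlap with a known protein-encoding gene or evidence of splicing) is assumed to have been applied beforehand. The thresholds (11 bp, 3 ESTs, 10%, 1 kb, three most 5' ESTs) are taken from the text.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Thresholds stated in the text; the data model below is a hypothetical simplification.
WINDOW_BP = 11           # 5' ends must fall within an 11 bp window
MIN_ESTS_IN_WINDOW = 3   # at least 3 ESTs in that window
MIN_FRACTION = 0.10      # ...comprising at least 10% of the cluster's ESTs
MIN_UPSTREAM_GAP = 1000  # next upstream cluster must be more than 1 kb away


@dataclass
class EST:
    five_prime: int    # aligned 5' end; plus-strand coordinates assumed (upstream = smaller)
    cap_trapped: bool  # True if the EST came from a cap-trapped library


@dataclass
class Cluster:
    gene_id: str
    ests: List[EST]    # assumed non-empty
    end: int           # 3'-most aligned coordinate of the cluster (hypothetical field)

    @property
    def start(self) -> int:
        return min(e.five_prime for e in self.ests)


def best_window(positions: List[int]) -> Tuple[int, int]:
    """Return (start, count) of the WINDOW_BP window containing the most 5' ends."""
    ends = sorted(positions)
    best_start, best_count = ends[0], 0
    for i, left in enumerate(ends):
        # An optimal window can always be shifted to start at an observed 5' end,
        # so only those starts are tried; binary search counts ends in the window.
        count = bisect_right(ends, left + WINDOW_BP - 1) - i
        if count > best_count:
            best_start, best_count = left, count
    return best_start, best_count


def passes_cap_check(cluster: Cluster) -> bool:
    """One of the three most 5' ESTs must come from a cap-trapped library."""
    most_5p = sorted(cluster.ests, key=lambda e: e.five_prime)[:3]
    return any(e.cap_trapped for e in most_5p)


def passes_window_check(cluster: Cluster) -> bool:
    """At least 3 ESTs, and at least 10% of the cluster, must share an 11 bp window."""
    _, count = best_window([e.five_prime for e in cluster.ests])
    return count >= MIN_ESTS_IN_WINDOW and count >= MIN_FRACTION * len(cluster.ests)


def select_tss_windows(clusters: List[Cluster]) -> Dict[str, Tuple[int, int]]:
    """Return {gene_id: (window_start, window_end)} for genes whose most 5' cluster passes."""
    ordered = sorted(clusters, key=lambda c: c.start)
    seen, selected = set(), {}
    for i, c in enumerate(ordered):
        if c.gene_id in seen:
            continue  # only the most 5' cluster of each gene is considered
        seen.add(c.gene_id)
        upstream_ends = [u.end for u in ordered[:i]]
        if upstream_ends and c.start - max(upstream_ends) <= MIN_UPSTREAM_GAP:
            continue  # another cluster lies within 1 kb upstream (or overlaps)
        if not (passes_cap_check(c) and passes_window_check(c)):
            continue
        start, _ = best_window([e.five_prime for e in c.ests])
        selected[c.gene_id] = (start, start + WINDOW_BP - 1)
    return selected
```

For example, a cluster whose 5' ends lie at 5000, 5002, 5005, 5010 and 5200 has its densest 11 bp window at 5000-5010 with four ESTs, which satisfies both the count and the 10% thresholds. Note that in this sketch a gene whose most 5' cluster fails any filter is dropped rather than falling back to a downstream cluster, mirroring the one-start-site-per-gene rule.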