Tuesday, February 16, 2010

Widespread RNA Editing of Embedded Alu Elements in the Human Transcriptome

More than one million copies of the ∼300-bp Alu element are interspersed throughout the human genome, with up to 75% of all known genes having Alu insertions within their introns and/or UTRs. Transcribed Alu sequences can alter splicing patterns by generating new exons, but other impacts of intragenic Alu elements on their host RNA are largely unexplored. Recently, repeat elements present in the introns or 3′-UTRs of 15 human brain RNAs have been shown to be targets for multiple adenosine to inosine (A-to-I) editing. Using a statistical approach, we find that editing of transcripts with embedded Alu sequences is a global phenomenon in the human transcriptome, observed in 2674 (∼2%) of all publicly available full-length human cDNAs (n = 128,406), from >250 libraries and >30 tissue sources. In the vast majority of edited RNAs, A-to-I substitutions are clustered within transcribed sense or antisense Alu sequences. Edited bases are primarily associated with retained introns, extended UTRs, or with transcripts that have no corresponding known gene. Therefore, Alu-associated RNA editing may be a mechanism for marking nonstandard transcripts, not destined for translation.
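Because inosine is read as guanosine during reverse transcription, A-to-I editing appears as A-to-G mismatches when a cDNA is compared with the genomic sequence. The snippet below is a minimal sketch of counting such mismatches in a pair of sequences that are assumed to be already aligned and of equal length. The function name and the toy sequences are hypothetical, and this is not the statistical approach used in the paper.

```python
def a_to_g_mismatches(genomic: str, cdna: str) -> list[int]:
    """Return 0-based positions where the genome has A and the cDNA has G.

    Assumes the two strings are already aligned, gap-free, and equal in length.
    """
    if len(genomic) != len(cdna):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i
        for i, (g, c) in enumerate(zip(genomic.upper(), cdna.upper()))
        if g == "A" and c == "G"
    ]


# Toy, made-up aligned sequences (not real data).
genomic = "GGCTAGATCAAGT"
cdna = "GGCTGGATCGGGT"
positions = a_to_g_mismatches(genomic, cdna)
print(positions)
```

A real analysis would also need to separate editing from sequencing errors and SNPs, for example by requiring several clustered A-to-G mismatches within one Alu element.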

Novel noncoding RNA

The human Y chromosome, because it is enriched in repetitive DNA, has been largely intractable to genetic and molecular analyses. There is no previous evidence for developmental stage- and testis-specific transcription from the male-specific region of the Y (MSY). Here, we present the first evidence for developmental stage- and testis-specific transcription from the MSY distal heterochromatic block. We isolated two novel RNAs, which localize to Yq12 in multiple copies, show testis-specific expression, and lack active X-homologs. Experimental evidence shows that one of these Yq12 noncoding RNAs (ncRNAs) trans-splices with CDC2L2 mRNA from the chromosome 1p36.3 locus to generate a testis-specific chimeric β sv13 isoform. The 67-nt 5′UTR provided by the Yq12 transcript contains a Y box protein-binding CCAAT motif, indicating translational regulation of the β sv13 isoform in testis. This is also the first report of trans-splicing between a Y chromosomal and an autosomal transcript.

Genomic localization of RNA binding proteins reveals links between pre-mRNA processing and transcription

Pre-mRNA processing often occurs in coordination with transcription thereby coupling these two key regulatory events. As such, many proteins involved in mRNA processing associate with the transcriptional machinery and are in proximity to DNA. This proximity allows for the mapping of the genomic associations of RNA binding proteins by chromatin immunoprecipitation (ChIP) as a way of determining their sites of action on the encoded mRNA. Here, we used ChIP combined with high-density microarrays to localize on the human genome three functionally distinct RNA binding proteins: the splicing factor polypyrimidine tract binding protein (PTBP1/hnRNP I), the mRNA export factor THO complex subunit 4 (ALY/THOC4), and the 3′ end cleavage stimulation factor 64 kDa (CSTF2). We observed interactions at promoters, internal exons, and 3′ ends of active genes. PTBP1 had biases toward promoters and often coincided with RNA polymerase II (RNA Pol II). The 3′ processing factor, CSTF2, had biases toward 3′ ends but was also observed at promoters. The mRNA processing and export factor, ALY, mapped to some exons but predominantly localized to introns and did not coincide with RNA Pol II. Because the RNA binding proteins did not consistently coincide with RNA Pol II, the data support a processing mechanism driven by reorganization of transcription complexes as opposed to a scanning mechanism. In sum, we present the mapping in mammalian cells of RNA binding proteins across a portion of the genome that provides insight into the transcriptional assembly of RNA–protein complexes.

Monday, February 15, 2010

A systematic analysis of intronic sequences downstream

To identify human intronic sequences associated with 5′ splice site recognition, we performed a systematic search for motifs enriched in introns downstream of both constitutive and alternative cassette exons. Significant enrichment was observed for U-rich motifs within 100 nucleotides downstream of 5′ splice sites of both classes of exons, with the highest enrichment between positions +6 and +30. Exons adjacent to U-rich intronic motifs contain lower frequencies of exonic splicing enhancers and higher frequencies of exonic splicing silencers, compared with exons not followed by U-rich intronic motifs. These findings motivated us to explore the possibility of a widespread role for U-rich motifs in promoting exon inclusion. Since cytotoxic granule-associated RNA binding protein (TIA1) and TIA1-like 1 (TIAL1; also known as TIAR) were previously shown in vitro to bind to U-rich motifs downstream of 5′ splice sites, and to facilitate 5′ splice site recognition in vitro and in vivo, we investigated whether these factors function more generally in the regulation of splicing of exons followed by U-rich intronic motifs. Simultaneous knockdown of TIA1 and TIAL1 resulted in increased skipping of 36/41 (88%) of alternatively spliced exons associated with U-rich motifs, but did not affect 32/33 (97%) alternatively spliced exons that are not associated with U-rich motifs. The increase in exon skipping correlated with the proximity of the first U-rich motif and the overall “U-richness” of the adjacent intronic region. The majority of the alternative splicing events regulated by TIA1/TIAL1 are conserved in mouse, and the corresponding genes are associated with diverse cellular functions. Based on our results, we estimate that ∼15% of alternative cassette exons are regulated by TIA1/TIAL1 via U-rich intronic elements.
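As a rough illustration of the "U-richness" measure mentioned above, the following sketch computes the fraction of U residues in the intronic window from +6 to +30, where +1 is taken to be the first intronic nucleotide after the 5′ splice site. The function name, the default window, and the treatment of T as U are assumptions made for this example. The paper's actual motif-enrichment procedure is not described in the abstract.

```python
def u_fraction(intron: str, start: int = 6, end: int = 30) -> float:
    """Fraction of U (or T) among intronic positions start..end.

    Positions are 1-based and inclusive; position +1 is assumed to be the
    first nucleotide of the intron. DNA-style T is counted as U.
    """
    window = intron.upper().replace("T", "U")[start - 1:end]
    if not window:
        raise ValueError("intron is shorter than the window start")
    return window.count("U") / len(window)


# Toy, made-up intron starts (not real data).
u_rich = "GUAAG" + "U" * 25
u_poor = "GUAAG" + "A" * 25
print(u_fraction(u_rich), u_fraction(u_poor))
```

If the intron is shorter than the window end, the fraction is computed over the available positions only.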

Alternative splicing of anciently exonized 5S rRNA

Identifying conserved alternative splicing (AS) events among evolutionarily distant species can prioritize AS events for functional characterization and help uncover relevant cis- and trans-regulatory factors. A genome-wide search for conserved cassette exon AS events in higher plants revealed the exonization of 5S ribosomal RNA (5S rRNA) within the gene of its own transcription regulator, TFIIIA (transcription factor for polymerase III A). The 5S rRNA-derived exon in the TFIIIA gene exists in all representative land plant species but not in green algae and nonplant species, suggesting it is specific to land plants. TFIIIA is essential for RNA polymerase III-based transcription of 5S rRNA in eukaryotes. Integrating comparative genomics and molecular biology revealed that the conserved cassette exon derived from 5S rRNA is coupled with nonsense-mediated mRNA decay. Using multiple independent Arabidopsis transgenic lines overexpressing TFIIIA under osmotic and salt stress, we found strong agreement between phenotypic and molecular evidence, revealing the biological relevance of AS of the exonized 5S rRNA in quantitative autoregulation of TFIIIA homeostasis. Most significantly, this study provides the first evidence of ancient exaptation of 5S rRNA in plants, suggesting a novel gene regulation model mediated by the AS of an anciently exonized noncoding element.

Genome-wide mapping of alternative splicing in Arabidopsis thaliana

Alternative splicing can enhance transcriptome plasticity and proteome diversity. In plants, alternative splicing can be manifested at different developmental stages, and is frequently associated with specific tissue types or environmental conditions such as abiotic stress. We mapped the Arabidopsis transcriptome at single-base resolution using the Illumina platform for ultrahigh-throughput RNA sequencing (RNA-seq). Deep transcriptome sequencing confirmed a majority of annotated introns and identified thousands of novel alternatively spliced mRNA isoforms. Our analysis suggests that at least ∼42% of intron-containing genes in Arabidopsis are alternatively spliced; this is significantly higher than previous estimates based on cDNA/expressed sequence tag sequencing. Random validation confirmed that novel splice isoforms empirically predicted by RNA-seq can be detected in vivo. Novel introns detected by RNA-seq were substantially enriched in nonconsensus terminal dinucleotide splice signals. Alternative isoforms with premature termination codons (PTCs) comprised the majority of alternatively spliced transcripts. Using an example of an essential circadian clock gene, we show that intron retention can generate relatively abundant PTC+ isoforms and that this specific event is highly conserved among diverse plant species. Alternatively spliced PTC+ isoforms can potentially be targeted for degradation by the nonsense-mediated mRNA decay (NMD) surveillance machinery or regulate the level of functional transcripts by the mechanism of regulated unproductive splicing and translation (RUST). We demonstrate that the relative ratios of the PTC+ and reference isoforms for several key regulatory genes can be considerably shifted under abiotic stress treatments. Taken together, our results suggest that, as in animals, NMD and RUST may be widespread in plants and may play important roles in regulating gene expression.

specific alternative splicing in primates

Comparative studies of gene regulation suggest an important role for natural selection in shaping gene expression patterns within and between species. Most of these studies, however, estimated gene expression levels using microarray probes designed to hybridize to only a small proportion of each gene. Here, we used recently developed RNA sequencing protocols, which sidestep this limitation, to assess intra- and interspecies variation in gene regulatory processes in considerably more detail than was previously possible. Specifically, we used RNA-seq to study transcript levels in humans, chimpanzees, and rhesus macaques, using liver RNA samples from three males and three females from each species. Our approach allowed us to identify a large number of genes whose expression levels likely evolve under natural selection in primates. These include a subset of genes with conserved sexually dimorphic expression patterns across the three species, which we found to be enriched for genes involved in lipid metabolism. Our data also suggest that while alternative splicing is tightly regulated within and between species, sex-specific and lineage-specific changes in the expression of different splice forms are also frequent. Intriguingly, among genes in which a change in exon usage occurred exclusively in the human lineage, we found an enrichment of genes involved in anatomical structure and morphogenesis, raising the possibility that differences in the regulation of alternative splicing have been an important force in human evolution.

Splicing factor SFRS1 recognizes a functionally diverse landscape of RNA transcripts

Metazoan genes are encrypted with at least two superimposed codes: the genetic code to specify the primary structure of proteins and the splicing code to expand their proteomic output via alternative splicing. Here, we define the specificity of a central regulator of pre-mRNA splicing, the conserved, essential splicing factor SFRS1. Cross-linking immunoprecipitation and high-throughput sequencing (CLIP-seq) identified 23,632 binding sites for SFRS1 in the transcriptome of cultured human embryonic kidney cells. SFRS1 was found to engage many different classes of functionally distinct transcripts including mRNAs, miRNAs, snoRNAs, ncRNAs, and conserved intergenic transcripts of unknown function. The majority of these diverse transcripts share a purine-rich consensus motif corresponding to the canonical SFRS1 binding site. The consensus site was not only enriched in exons cross-linked to SFRS1 in vivo, but was also enriched in close proximity to splice sites. mRNAs encoding RNA processing factors were significantly overrepresented, suggesting that SFRS1 may broadly influence the post-transcriptional control of gene expression in vivo. Finally, a search for the SFRS1 consensus motif within the Human Gene Mutation Database identified 181 mutations in 82 different genes that disrupt predicted SFRS1 binding sites. This comprehensive analysis substantially expands the known roles of human SR proteins in the regulation of a diverse array of RNA transcripts.