Introduction

             Alternative splicing represents an additional and under-appreciated layer of complexity underlying gene expression profiles. More recently, technological advances in library preparation methodologies enabled capturing and amplification of full-length cDNAs from single cells. Thus, paving the way for splicing analysis at single-cell resolution (Wen et al., 2020).

             High-throughput RNA-sequencing has enabled us to investigate thousands of splicing events in a single sequencing run. Consequently, bioinformatic pipelines may pick up hundreds or even thousands of candidate splicing events. Example of these candidate splicing events are splicing events that are differentially expressed between two groups of cell populations. Therefore, it is not feasible to bring forward the large volume of candidate splicing events for downstream validation and functional studies.

             One strategy to distinguish true positive from false positive splicing events is to view the sequencing read alignment on the Integrative Genome Browser (IGV; Thorvaldsdottir et al., 2013). Nevertheless, IGV is not optimized for large number of samples (cells) and large number of splicing events typical of single-cell analysis.

             Here, we introduce VALERIE (Visulazing ALternative splicing Events from RIbonucleic acid sequencing Experiments) - a visualisation platform to address the challenges in visualising alternative splicing events at single-cell resolution. Key features of VALERIE include:
(1) Displays PSI instead of conventional coverage/expression.
(2) Ability to scale to large datasets consisting of hundreds or even thousands of samples typical of single cell experiments.
(3) Summarizes PSI profile for user-defined groups of single cells.
(4) Assess statistical significance of PSI profiles between groups of single cells.
(5) Omits non-informative intronic regions (with the exception of retained-intron events).
(6) Standardizes genomic coordinates from 5’ to 3’ transcription direction.

Installation

Install package from Github
library(devtools)
install_github("wenweixiong/VALERIE")
Load MARVEL package and other packages required from this tutorial
library(VALERIE)

Dataset

             In the examples that follow, we will use published single induced pluripotent stem cells (iPSC) and endoderm cells. Single cells were prepared using Smart-seq2 and sequenced on HiSeq2000 on 125-bp paired-end (PE) mode (Linker et al., 2018). We have picked one representative differentially spliced event from each splicing category from our previous analysis here: https://wenweixiong.github.io/MARVEL

             These splicing event categories are namely skipped-exon (SE), mutually exclusive exons (MXE), retained-intron (RI), alternative 5’ splice site (A5SS), and alternative 5’ splice site (A3SS).

             The data used for this tutorial can be downloaded from here: http://userweb.molbiol.ox.ac.uk/public/wwen/VALERIE_Tutorial_Data.zip

Splicing nomenclature

             Practising accurate description of splicing nomenclature ensures reproducible analysis. Splicing nomenclature is simply the splicing event’s unique identifier (ID). In this tutorial, the splicing event ID should be fed to the tran_id argument. For detailed information of splicing nomenclature, please visit https://wenweixiong.github.io/Splicing_Nomenclature.

Plot PSI

             The function to plot PSI across the different cell populations is PlotPSI. Please launch the help page to get more information on each of the function’s argument.
?PlotPSI

Only two input files are required: | (1) Bam. This folder contains the BAM files and their corresponding index files. | (2) BamPheno. This is the sample metadata. Mandatory columns are bam.file.name and cell.type. Character strings under bam.file.nameshould match those in the Bam folder.

Skipped-exon (SE)

# Read sample metadata
path_to_file <- "Data/BAM_PhenoData.txt"
BamPheno <- read.table(path_to_file, sep="\t", header=TRUE, stringsAsFactors=FALSE)
head(BamPheno)
##                                  bam.file.name  sample.id cell.type sample.type
## 1     ERR1562083.Aligned.sortedByCoord.out.bam ERR1562083   Unknown Single Cell
## 2 ERR1562083.Aligned.sortedByCoord.out.bam.bai ERR1562083   Unknown Single Cell
## 3     ERR1562084.Aligned.sortedByCoord.out.bam ERR1562084      iPSC Single Cell
## 4 ERR1562084.Aligned.sortedByCoord.out.bam.bai ERR1562084      iPSC Single Cell
## 5     ERR1562085.Aligned.sortedByCoord.out.bam ERR1562085      iPSC Single Cell
## 6 ERR1562085.Aligned.sortedByCoord.out.bam.bai ERR1562085      iPSC Single Cell
##   qc.seq
## 1   pass
## 2   pass
## 3   pass
## 4   pass
## 5   pass
## 6   pass
# Plot
PlotPSI(
  tran_id="chr15:24962114:24962209:+@chr15:24967029:24967152:+@chr15:24967932:24968082",
  event.type="SE",
  strand="positive",
  Bam="Data/BAM/",
  BamPheno=BamPheno,
  cell.types=c("iPSC", "Endoderm"),
  min.coverage=10,
  cons.exon.cutoff=100,
  method="ks",
  method.adj="bonferroni",
  cell.types.colors="ggplot.default",
  plot.title="SNRPN",
  plot.width=5,
  plot.height=8,
  plot.out="Data/Plots/SNRPN.pdf"
  )
## [1] "Reading in BAM files..."
## [1] "Computing PSI..."
## [1] "Plotting..."
include_graphics("Data/Plots/SNRPN.pdf")

Mutually excl. exons (MXE)

# Read sample metadata
path_to_file <- "Data/BAM_PhenoData.txt"
BamPheno <- read.table(path_to_file, sep="\t", header=TRUE, stringsAsFactors=FALSE)

# Plot
PlotPSI(
  tran_id="chr9:35685269:35685339:-@chr9:35685064:35685139:-@chr9:35684732:35684807:-@chr9:35684488:35684550",
  event.type="MXE",
  strand="negative",
  Bam="Data/BAM/",
  BamPheno=BamPheno,
  cell.types=c("iPSC", "Endoderm"),
  min.coverage=10,
  cons.exon.cutoff=100,
  method="ks",
  method.adj="bonferroni",
  cell.types.colors="ggplot.default",
  plot.title="TPM2",
  plot.width=5,
  plot.height=8,
  plot.out="Data/Plots/TPM2.pdf"
  )
## [1] "Reading in BAM files..."
## [1] "Computing PSI..."
## [1] "Plotting..."
include_graphics("Data/Plots/TPM2.pdf")

Retained-intron (RI)

# Read sample metadata
path_to_file <- "Data/BAM_PhenoData.txt"
BamPheno <- read.table(path_to_file, sep="\t", header=TRUE, stringsAsFactors=FALSE)

# Plot
PlotPSI(
  tran_id="chr8:98045550:98045508:-@chr8:98045399:98045347",
  event.type="RI",
  strand="negative",
  Bam="Data/BAM/",
  BamPheno=BamPheno,
  cell.types=c("iPSC", "Endoderm"),
  min.coverage=10,
  cons.exon.cutoff=100,
  method="ks",
  method.adj="bonferroni",
  cell.types.colors="ggplot.default",
  plot.title="RPL30",
  plot.width=5,
  plot.height=8,
  plot.out="Data/Plots/RPL30.pdf"
  )
## [1] "Reading in BAM files..."
## [1] "Computing PSI..."
## [1] "Plotting..."
include_graphics("Data/Plots/RPL30.pdf")

Alt. 5’ splice site (A5SS)

# Read sample metadata
path_to_file <- "Data/BAM_PhenoData.txt"
BamPheno <- read.table(path_to_file, sep="\t", header=TRUE, stringsAsFactors=FALSE)

# Plot
PlotPSI(
  tran_id="chr8:144792587:144792245|144792366:-@chr8:144791992:144792140",
  event.type="A5SS",
  strand="negative",
  Bam="Data/BAM/",
  BamPheno=BamPheno,
  cell.types=c("iPSC", "Endoderm"),
  min.coverage=10,
  cons.exon.cutoff=100,
  method="ks",
  method.adj="bonferroni",
  cell.types.colors="ggplot.default",
  plot.title="RPL8",
  plot.width=5,
  plot.height=8,
  plot.out="Data/Plots/RPL8.pdf"
  )
## [1] "Reading in BAM files..."
## [1] "Computing PSI..."
## [1] "Plotting..."
include_graphics("Data/Plots/RPL8.pdf")

Alt. 3’ splice site (A3SS)

# Read sample metadata
path_to_file <- "Data/BAM_PhenoData.txt"
BamPheno <- read.table(path_to_file, sep="\t", header=TRUE, stringsAsFactors=FALSE)

# Plot
PlotPSI(
  tran_id="chr1:171512032:171512200:+@chr1:171512995|171513001:171513172",
  event.type="A3SS",
  strand="positive",
  Bam="Data/BAM/",
  BamPheno=BamPheno,
  cell.types=c("iPSC", "Endoderm"),
  min.coverage=10,
  cons.exon.cutoff=25,
  method="ks",
  method.adj="bonferroni",
  cell.types.colors="ggplot.default",
  plot.title="PRRC2C",
  plot.width=5,
  plot.height=8,
  plot.out="Data/Plots/PRRC2C.pdf"
  )
## [1] "Reading in BAM files..."
## [1] "Computing PSI..."
## [1] "Plotting..."
include_graphics("Data/Plots/PRRC2C.pdf")

Revealing hidden features

Based on the plots above, we can observe several interesting PSI patterns.

MXE inverse relationship

Mutually exclusive exons are two exons located next to one another, but only one exon is spliced-in while the other exon is spliced-out. In the MXE plot, notice that the 5’ alternative exon is down-regulated, but the 3’ alternative exon is up-regulated in endoderm cells relative to iPSCs. This inverse relationship between the 5’ and 3’ alternative exons is a hallmark of MXE splicing event. We have also previously shown this MXE splicing pattern for PKM gene when iPSCs differentiate into neural progenitor cells or motor neuron cells (Wen et al., 2020).

Cyptic splice site

Typically for exon-level analysis, only one splicing event type is studied at any one time. Complex splicing event is when there is two or more splicing event occurring on the same exon or on a set of consecutive exons. In the RI plot, notice that there is a hint of alternative 3’ splice site on the 3’ constitutive exon. This is support by the change in PSI profile and splicing pattern at that region.

Jagged ends

Coverage is typically uneven at the start and end of the cDNA molecules, i.e. the first and last exons. In the A5SS plot, notice that the start of the 5’ constitutive exon has uneven coverage across the single cells. The grey color indicates coverage lower than that specified by the user in the min.coverage argument, i.e. low or no coverage. This phenomena has also been observed by long-read sequencing technologies (Legnini et a;., 2019).

5’/3’-bias

5’- and 3’-sequencing biases occurs when there is decreasing coverage from one end of the molecule towards the other end of the molecule. This is more pronounced in platforms such as 10x Genomics, but to much lesser extent in samples prepared by full-length library preparation methods such as Smart-seq2. In the A5SS plot, notice that the PSI values of the alternative exon gradually increases from the 5’ to 3’ direction. Perhaps this pattern is prominent here because this splicing event is located far away from the 3’ end.

Publication

Our paper on VALERIE has been published in PLoS Computational Biology journal (Wen et al., 2020).

References

    Legnini, I., Alles, J., Karaiskos, N., Ayoub, S., & Rajewsky, N. (2019). FLAM-seq: full-length mRNA sequencing reveals principles of poly(A) tail length control. Nat Methods, 16(9), 879-886. doi:10.1038/s41592-019-0503-y
    Linker, S. M., Urban, L., Clark, S. J., Chhatriwala, M., Amatya, S., McCarthy, D. J., . . . Bonder, M. J. (2019). Combined single-cell profiling of expression and DNA methylation reveals splicing regulation and heterogeneity. Genome Biol, 20(1), 30. doi:10.1186/s13059-019-1644-0
    Thorvaldsdottir, H., Robinson, J. T., & Mesirov, J. P. (2013). Integrative Genomics Viewer (IGV): high-performance genomics data visualisation and exploration. Brief Bioinform, 14(2), 178-192. doi:10.1093/bib/bbs017
    Wen, W. X., Mead, A. J., & Thongjuea, S. (2020a). Technological advances and computational approaches for alternative splicing analysis in single cells. Comput Struct Biotechnol J, 18, 332-343. doi:10.1016/j.csbj.2020.01.009
    Wen, W. X., Mead, A. J., & Thongjuea, S. (2020b). VALERIE: Visual-based inspection of alternative splicing events at single-cell resolution. PLoS Comput Biol, 16(9), e1008195. doi:10.1371/journal.pcbi.1008195