Input files
In this section, we will read in
the required input files for MARVEL. The formatting requirement for each
input file will be explained.
Preprocessing
Alignment of sequencing reads was
performed using cellranger v2.1.1. Example code for one sample
(SRR9008752) below:
cellranger count --id=SRR9008752 \
--fastqs=SRR9008752 \
--sample=SRR9008752 \
--transcriptome=refdata-cellranger-GRCh38-3.0.0 \
--expect-cells 10000
The resulting BAM files for each
sample were used as inputs for STARsolo 2.7.8a (Kaminow
et al.,
2021) to generate the gene and splice junction count matrices required
for downstream processing. Example code for one sample (SRR9008752)
below. Please note the cell barcode list specified in
--soloCBwhitelist
should match the 10x Genomics chemistry
used to generated the scRNAseq dataset (
https://kb.10xgenomics.com/hc/en-us/articles/115004506263-What-is-a-barcode-whitelist-).
STAR --runThreadN 16 \
--genomeDir refdata-cellranger-GRCh38-3.0.0 \
--soloType CB_UMI_Simple \
--readFilesIn SRR9008752_possorted_genome_bam.bam \
--readFilesCommand samtools view -F 0x100 \
--readFilesType SAM SE \
--soloInputSAMattrBarcodeSeq CR UR \
--soloInputSAMattrBarcodeQual CY UY \
--soloCBwhitelist 737K-august-2016.txt \
--soloFeatures Gene SJ
The raw gene count outputs
(matrix.mtx, features.tsv, barcodes.tsv) may be found in the
Solo.out/Gene/filtered/ folder whereas the raw splice junction counts
may be found in theSolo.out/SJ/raw/ folder. The raw gene counts outputs
will be used for the Normalised gene expression
and
Gene counts
sections below whereas the raw splice junction
counts will be used for the Splice junction counts
section.
Normalised gene expression
Gene expression normalisation and
batch correction etc. should be performed prior to MARVEL.
The outputs of STARsolo (matrix.mtx.gz, features.tsv.gz,
barcodes.tsv.gz) were used as inputs for SingCellaR (Wang
et
al., 2022). For each sample, only cells passing QC were retained
and gene expression normalisation was also performed. All 4 samples
(iPSCs and iPSC-induced cardiomyocytes at day-2, -4, and -10) were
subsequently integrated and the gene expression (sparse) matrix, cell
metadata, and gene metadata were saved as separated files. These files
were used as inputs for MARVEL in R.
More details on data processing using SingCellaR on STAR
Protocol (
https://star-protocols.cell.com/protocols/1530).
The matrix rows represent genes, the columns represent
cells, and the values represent normalised gene expression values that
have not yet been log2-transformed.
The cell metadata rows represent the cell IDs while the
columns represent the cell information. Compulsory column is
cell.id
, all other columns are optional. The cell IDs here
should match the columns of the gene expression matrix.
The gene metadata rows represent the abbreviated gene
names. Compulsory column is
gene_short_name
, all other
columns are optional. The gene names here should match the rows of the
gene expression matrix.
# Matrix
path <- "Data/Gene_SingCellaR/"
file <- "matrix_normalised.mtx"
df.gene.norm <- readMM(paste(path, file, sep=""))
df.gene.norm[df.gene.norm[,1] !=0,][1:5,1:5]
## 5 x 5 sparse Matrix of class "dgTMatrix"
##
## [1,] 2.1168501 . 0.9731413 0.4139587 0.9874593
## [2,] 0.7056167 . . . .
## [3,] 2.1168501 3.453436 1.9462826 3.3116695 1.9749185
## [4,] 1.4112334 . 0.4865707 0.4139587 1.4811889
## [5,] 4.2337003 2.302291 4.3791359 3.3116695 1.9749185
# cell metadata
path <- "Data/Gene_SingCellaR/"
file <- "phenoData.txt"
df.gene.norm.pheno <- read.table(paste(path, file, sep=""), sep="\t", header=TRUE, stringsAsFactors=FALSE)
head(df.gene.norm.pheno)
## cell.id donor.id cell.type
## 1 AAACCTGAGCCGGTAA_SRR9008752 SRR9008752 Cardio day 4
## 2 AAACCTGAGTACCGGA_SRR9008752 SRR9008752 Cardio day 4
## 3 AAACCTGAGTGCTGCC_SRR9008752 SRR9008752 Cardio day 4
## 4 AAACCTGAGTGTACGG_SRR9008752 SRR9008752 Cardio day 4
## 5 AAACCTGAGTGTCCAT_SRR9008752 SRR9008752 Cardio day 4
## 6 AAACCTGCAGCCTGTG_SRR9008752 SRR9008752 Cardio day 4
# gene metadata
path <- "Data/Gene_SingCellaR/"
file <- "featureData.txt"
df.gene.norm.feature <- read.table(paste(path, file, sep=""), sep="\t", header=TRUE, stringsAsFactors=FALSE)
head(df.gene.norm.feature)
## gene_short_name
## 1 MIR1302-2HG
## 2 FAM138A
## 3 OR4F5
## 4 AL627309.1
## 5 AL627309.3
## 6 AL627309.2
Gene counts
Raw gene counts here will be used
alongside raw splice junction counts (below) for splicing
analysis.
Each sample’s raw gene count files generated by STARsolo
should be collated and read into R as follows.
The matrix rows represent genes, the columns represent
cells, and the values represent UMI-collapsed, non-normalised,
non-log2-transformed gene counts.
The cell metadata rows represent the cell IDs while the
columns represent the cell information. Compulsory column is
cell.id
, all other columns are optional. The cell IDs here
should match the columns of the gene expression matrix. The cell IDs
should also match the cell metadata and matrix of the normalised gene
expression data df.gene.norm.pheno
above.
The gene metadata rows represent the abbreviated gene
names. Compulsory column is gene_short_name
, all other
columns are optional. The gene names here should match the rows of the
gene expression matrix. The gene names should also match the gene
metadata and matrix of the normalised gene expression data
df.gene.norm.feature
above.
# Matrix
path <- "Data/Gene_STARsolo/"
file <- "matrix_counts.mtx"
df.gene.count <- readMM(paste(path, file, sep=""))
df.gene.count[df.gene.count[,1] !=0,][1:5,1:5]
## 5 x 5 sparse Matrix of class "dgTMatrix"
##
## [1,] 1 2 1 3 2
## [2,] 1 2 . . .
## [3,] 1 1 . . .
## [4,] 5 6 8 16 8
## [5,] 3 12 9 10 13
# cell metadata
path <- "Data/Gene_STARsolo/"
file <- "phenoData.txt"
df.gene.count.pheno <- read.table(paste(path, file, sep=""), sep="\t", header=TRUE, stringsAsFactors=FALSE)
head(df.gene.count.pheno)
## cell.id
## 1 AAACCTGAGAAGATTC_SRR9008754
## 2 AAACCTGAGAAGGGTA_SRR9008754
## 3 AAACCTGAGCACACAG_SRR9008754
## 4 AAACCTGAGGGCTTGA_SRR9008754
## 5 AAACCTGAGTTGTCGT_SRR9008754
## 6 AAACCTGCAAGTCTGT_SRR9008754
# feature metadata
path <- "Data/Gene_STARsolo/"
file <- "featureData.txt"
df.gene.count.feature <- read.table(paste(path, file, sep=""), sep="\t", header=TRUE, stringsAsFactors=FALSE)
head(df.gene.count.feature)
## gene_short_name
## 1 MIR1302-2HG
## 2 FAM138A
## 3 OR4F5
## 4 AL627309.1
## 5 AL627309.3
## 6 AL627309.2
Splice junction counts
Raw splice junctions counts here
will be used alongside raw gene counts (above) for splicing
analysis.
Each sample’s raw splice junction count files generated by
STARsolo should be collated and read into R as follows.
The matrix rows represent splice junction coordinates, the
columns represent cells, and the values represent UMI-collapsed,
non-normalised, non-log2-transformed splice junction counts.
The cell metadata rows represent the cell IDs while the
columns represent the cell information. Compulsory column is
cell.id
, all other columns are optional. The cell IDs here
should match the columns of the splice junction count matrix. The cell
IDs should also match the cell metadata and matrix of the normalised
gene expression data df.gene.norm.pheno
above.
The gene metadata rows represent the splice junction
coordinates. Compulsory column is coord.intron
, all other
columns are optional. The splice junction coordinates here should match
the rows of the splice junction count matrix.
# Matrix
path <- "Data/SJ_STARsolo/"
file <- "matrix_counts.mtx"
df.sj.count <- readMM(paste(path, file, sep=""))
df.sj.count[df.sj.count[,1] !=0,][1:5,1:5]
## 5 x 5 sparse Matrix of class "dgTMatrix"
##
## [1,] 5 4 7 14 6
## [2,] 1 1 2 1 1
## [3,] 1 . . . .
## [4,] 1 1 . . .
## [5,] 1 . . . .
# cell metadata
path <- "Data/SJ_STARsolo/"
file <- "phenoData.txt"
df.sj.count.pheno <- read.table(paste(path, file, sep=""), sep="\t", header=TRUE, stringsAsFactors=FALSE)
head(df.sj.count.pheno)
## cell.id
## 1 AAACCTGAGAAGATTC_SRR9008754
## 2 AAACCTGAGAAGGGTA_SRR9008754
## 3 AAACCTGAGCACACAG_SRR9008754
## 4 AAACCTGAGGGCTTGA_SRR9008754
## 5 AAACCTGAGTTGTCGT_SRR9008754
## 6 AAACCTGCAAGTCTGT_SRR9008754
# feature metadata
path <- "Data/SJ_STARsolo/"
file <- "featureData.txt"
df.sj.count.feature <- read.table(paste(path, file, sep=""), sep="\t", header=TRUE, stringsAsFactors=FALSE)
head(df.sj.count.feature)
## coord.intron
## 1 chr1:14830:14969
## 2 chr1:14830:185490
## 3 chr1:15039:186316
## 4 chr1:16766:16853
## 5 chr1:16766:16857
## 6 chr1:16766:16875
tSNE/UMAP coordinates
Dimension reduction embeddings
would have to be performed prior to MARVEL. These embeddings will be
used to project gene expression and splice junction expression values by
MARVEL. Compulsory columns are cell.id
, x
(x-axis coordinates) and y
(y-axis coordinates). All other
columns are optional.
Here, the tSNE embeddings were generated by SingCellaR
(Wang et al., 2022).
path <- "Data/Gene_SingCellaR/"
file <- "dim_red_coordinates_iPSC_CardioDay10.txt"
df.coord <- read.table(paste(path, file, sep=""), sep="\t", header=TRUE, stringsAsFactors=FALSE)
head(df.coord)
## cell.id x y
## 1 AAACCTGAGAAGATTC_SRR9008754 15.06473 -12.685700
## 2 AAACCTGAGAAGGGTA_SRR9008754 23.00697 10.624021
## 3 AAACCTGAGCACACAG_SRR9008754 12.97456 -18.985449
## 4 AAACCTGAGCTTTGGT_SRR9008753 -20.56841 -1.531026
## 5 AAACCTGAGGGCTTGA_SRR9008754 36.47875 -6.242368
## 6 AAACCTGAGTCGTTTG_SRR9008753 -24.82610 -18.792021
Gene transfer file
For consistency, this GTF should be
the same file previously used for cellranger and STARsolo above.
path <- "Data/GTF/"
file <- "refdata-cellranger-GRCh38-3.0.0.gtf"
gtf <- as.data.frame(fread(paste(path, file, sep=""), sep="\t", header=FALSE, stringsAsFactors=FALSE))
Pre-process data
This step annotates the gene and
splice junction metadata, and subsequently enables us to retain only
high-quality splice junctions and genes of particular biotype, e.g.,
protein-coding genes.
Validate junctions
Only splice junctions whose start
and end mapped to same gene are retained.
marvel <- ValidateSJ.10x(MarvelObject=marvel)
Subset CDS genes
Moving forward, we will only retain
genes with coding sequence (CDS), i.e., protein-coding genes.
marvel <- FilterGenes.10x(MarvelObject=marvel,
gene.type="protein_coding"
)
Pre-flight check
This step ensures that our data is
ready for further downstream analysis including differential expression
analysis, functional annotation, and candidate gene analysis.
To this end, we will have to make sure the columns of the
matrices align with the cell IDs of the sample metadata and the rows of
the matrices align with the feature metadata. Finally, the columns
across all matrices should align with one another.
marvel <- CheckAlignment.10x(MarvelObject=marvel)
Our data is ready for downstream
analysis when only MATCHED
flags are reported. If any
NOT MATCHED
flags are reported, please double-check the
input file requirements.
Save R object
Given the time taken for data
pre-processing, it’ll be a good idea to save the MARVEL object at this
step. The object may be loaded whenever the user begins any one of the
downstream analysis.
# Save object
path <- "Data/R object/"
file <- "MARVEL.RData"
save(marvel, file=paste(path, file, sep=""))
# Load object
# path <- "Data/R Object/"
# file <- "MARVEL.RData"
# load(paste(path, file, sep=""))
Explore expression
Due to the huge number of splice
junctions detected, it will not be time-effective to include all splice
junctions for differential splicing analysis. Therefore, we will only
include splice junctions expressed in at least a user-defined proportion
of cells for differential splicing analysis. To determine this
threshold, we will explore the number of cells expressing a given gene
and splice junction.
We will explore the gene and splice junction expression
rate in iPSCs and day-10 cardiomyocytes as we will be performing
differential splicing analysis on these two cell populations.
Gene expression rate
# Define cell groups
# Retrieve sample metadata
sample.metadata <- marvel$sample.metadata
# Group 1 (reference)
index <- which(sample.metadata$cell.type=="iPSC")
cell.ids.1 <- sample.metadata[index, "cell.id"]
length(cell.ids.1)
## [1] 11244
# Group 2
index <- which(sample.metadata$cell.type=="Cardio day 10")
cell.ids.2 <- sample.metadata[index, "cell.id"]
length(cell.ids.2)
## [1] 5937
# Explore % of cells expressing genes
marvel <- PlotPctExprCells.Genes.10x(MarvelObject=marvel,
cell.group.g1=cell.ids.1,
cell.group.g2=cell.ids.2,
min.pct.cells=5
)
marvel$pct.cells.expr$Gene$Plot
head(marvel$pct.cells.expr$Gene$Data)
## cell.group gene_short_name n.cells.total n.cells.expr pct.cells.expr
## 2 cell.group.g1 NOC2L 11244 5590 49.72
## 5 cell.group.g1 HES4 11244 687 6.11
## 6 cell.group.g1 ISG15 11244 7172 63.79
## 7 cell.group.g1 AGRN 11244 2381 21.18
## 12 cell.group.g1 SDF4 11244 4440 39.49
## 13 cell.group.g1 C1QTNF12 11244 565 5.02
min.pct.cells
option. The minimum percentage of cells
expressing a given gene for that gene to be included for expression
exploration analysis here.
- Here, we observe majority of genes to be expressed in ~10% of
cells.
SJ expression rate
marvel <- PlotPctExprCells.SJ.10x(MarvelObject=marvel,
cell.group.g1=cell.ids.1,
cell.group.g2=cell.ids.2,
min.pct.cells.genes=5,
min.pct.cells.sj=5,
downsample=TRUE,
downsample.pct.sj=10
)
marvel$pct.cells.expr$SJ$Plot
head(marvel$pct.cells.expr$SJ$Data)
## cell.group coord.intron n.cells.total n.cells.expr pct.cells.expr
## 25 cell.group.g1 chr1:1629705:1630291 11244 1227 10.91
## 26 cell.group.g1 chr1:1634329:1634450 11244 832 7.40
## 32 cell.group.g1 chr1:1721712:1722707 11244 867 7.71
## 42 cell.group.g1 chr1:2400936:2402206 11244 1066 9.48
## 47 cell.group.g1 chr1:3775484:3775794 11244 1467 13.05
## 79 cell.group.g1 chr1:8868058:8870451 11244 674 5.99
min.pct.cells.genes
option. The minimum percentage of
cells expressing a given gene for that gene to be included for
expression exploration analysis here. This threshold may be inferred
from after running the PlotPctExprCells.Genes.10x
function
earlier.
min.pct.cells.sj
option. The minimum percentage of
cells expressing a given splice junction for that splice junction to be
included for expression exploration analysis.
downsample
option. Down-sample the number of splice
junctions prior to expression exploration analysis.
downsample.pct.sj
option. When downsample
set to TRUE
, the percentage of splice junctions to keep for
expression exploration analysis.
- Here, we observe majority of splice junctions to be expressed in
~10% of cells.
Differential analysis
Differential analysis is the
cornerstone of RNA-sequencing analysis. This is the first step to
identify candidate genes and isoforms for downstream experimental
validation.
For differential splicing analysis, MARVEL utilises the
permutation-based approach for comparing the PSI values between two
pseudo-bulk samples (Efremova et al., 2020).
For differential gene expression
analysis, MARVEL utilises the Wilcoxon rank-sum test to compare the
normalised, log2-transformed gene expression values between the
single-cells that constitute the two pseudo-bulk samples (Satija et
al., 2015)
Differential splicing analysis
marvel <- CompareValues.SJ.10x(MarvelObject=marvel,
cell.group.g1=cell.ids.1,
cell.group.g2=cell.ids.2,
min.pct.cells.genes=10,
min.pct.cells.sj=10,
min.gene.norm=1.0,
seed=1,
n.iterations=100,
downsample=TRUE,
show.progress=FALSE
)
cell.group.g1
option. Cell IDs corresponding to the
reference cell group.
cell.group.g2
option. Cell IDs corresponding to the
non-reference cell group.
min.pct.cells.genes
option. The minimum number of cells
expressing a given gene for the gene to be included for DE analysis.
This threshold may be determined from running the
PlotPctExprCells.Genes.10x
function earlier.
min.pct.cells.sj
option. The minimum number of cells
expressing a given splice junction for the splice junction to be
included for DE analysis. This threshold may be determined from running
the PlotPctExprCells.SJ.10x
function earlier.
seed
option. To ensure the null distributions and
dowm-sampling process are always reproducible.
n.iterations
option. The number of permutations to
perform when building the null distribution.
downsample
option. If set to TRUE
, the
number of cells of each cell group will be down-sampled. The number of
cells to down-sample to will be based on the size of smaller cell
group.
show.progress
option. For brevity of this tutorial, we
did not track the DE progress here. But users are advised to set this
option to TRUE
on their own devices.
Differential gene analysis
marvel <- CompareValues.Genes.10x(MarvelObject=marvel,
show.progress=FALSE
)
head(marvel$DE$SJ$Table)
## coord.intron gene_short_name n.cells.total.g1 n.cells.expr.sj.g1
## 1 chr1:1373903:1373999 AURKAIP1 5937 5858
## 2 chr1:1402257:1405808 MRPL20 5937 2955
## 3 chr1:1405887:1406908 MRPL20 5937 1069
## 4 chr1:2189782:2193638 FAAP20 5937 4686
## 5 chr1:6197757:6199561 RPL22 5937 4839
## 6 chr1:6820251:6825091 CAMTA1 5937 479
## pct.cells.expr.sj.g1 n.cells.expr.gene.g1 pct.cells.expr.gene.g1
## 1 98.67 5897 99.33
## 2 49.77 5845 98.45
## 3 18.01 5845 98.45
## 4 78.93 5479 92.29
## 5 81.51 5937 100.00
## 6 8.07 5753 96.90
## sj.count.total.g1 gene.count.total.g1 psi.g1 n.cells.total.g2
## 1 40782 49806 81.88 5937
## 2 4425 36002 12.29 5937
## 3 1199 36002 3.33 5937
## 4 10815 19491 55.49 5937
## 5 11697 193189 6.05 5937
## 6 498 27599 1.80 5937
## n.cells.expr.sj.g2 pct.cells.expr.sj.g2 n.cells.expr.gene.g2
## 1 5798 97.66 5858
## 2 3516 59.22 5859
## 3 1249 21.04 5859
## 4 5270 88.77 5715
## 5 5499 92.62 5937
## 6 675 11.37 5717
## pct.cells.expr.gene.g2 sj.count.total.g2 gene.count.total.g2 psi.g2
## 1 98.67 32201 38876 82.83
## 2 98.69 5754 35035 16.42
## 3 98.69 1445 35035 4.12
## 4 96.26 15551 25371 61.29
## 5 100.00 19280 189020 10.20
## 6 96.29 721 27367 2.63
## log2fc delta pval mean.expr.gene.norm.g1.g2 n.cells.total.norm.g1
## 1 0.01644263 0.95 0 2.138988 5937
## 2 0.39040352 4.13 0 1.901503 5937
## 3 0.24177679 0.79 0 1.901503 5937
## 4 0.14100507 5.80 0 1.410974 5937
## 5 0.66780357 4.15 0 4.074651 5937
## 6 0.37454272 0.83 0 1.631310 5937
## n.cells.expr.gene.norm.g1 pct.cells.expr.gene.norm.g1 mean.expr.gene.norm.g1
## 1 5897 99.33 2.501344
## 2 5845 98.45 2.108606
## 3 5845 98.45 2.108606
## 4 5479 92.29 1.436871
## 5 5937 100.00 4.350727
## 6 5753 96.90 1.821909
## n.cells.total.norm.g2 n.cells.expr.gene.norm.g2 pct.cells.expr.gene.norm.g2
## 1 5937 5858 98.67
## 2 5937 5859 98.69
## 3 5937 5859 98.69
## 4 5937 5715 96.26
## 5 5937 5937 100.00
## 6 5937 5717 96.29
## mean.expr.gene.norm.g2 log2fc.gene.norm pval.gene.norm pval.adj.gene.norm
## 1 1.776632 -0.72471240 0.000000e+00 0.000000e+00
## 2 1.694400 -0.41420606 0.000000e+00 0.000000e+00
## 3 1.694400 -0.41420606 0.000000e+00 0.000000e+00
## 4 1.385077 -0.05179485 8.407605e-13 9.278512e-13
## 5 3.798576 -0.55215116 0.000000e+00 0.000000e+00
## 6 1.440711 -0.38119821 8.205003e-297 1.623418e-296
show.progress
option. For brevity of this tutorial, we
did not track the DE progress here. But users are advised to set this
option to TRUE
on their own devices.
- Please note that MARVEL will only perform differential gene
expression analysis on genes included for differential splicing
analysis. These genes constitute the splice junctions included for
analysis using the
CompareValues.SJ.10x
function
earlier.
Volcano plot: Splicing
marvel <- PlotDEValues.SJ.10x(MarvelObject=marvel,
pval=0.05,
delta=5,
min.gene.norm=1.0,
anno=FALSE
)
marvel$DE$SJ$VolcanoPlot$SJ$Plot
head(marvel$DE$SJ$VolcanoPlot$SJ$Data[,c("coord.intron", "gene_short_name", "sig")])
## coord.intron gene_short_name sig
## 1 chr1:1373903:1373999 AURKAIP1 n.s.
## 2 chr1:1402257:1405808 MRPL20 n.s.
## 3 chr1:1405887:1406908 MRPL20 n.s.
## 4 chr1:2189782:2193638 FAAP20 up
## 5 chr1:6197757:6199561 RPL22 n.s.
## 6 chr1:6820251:6825091 CAMTA1 n.s.
pval
option. The p-value, below which, the splice
junction is considered to be differentially spliced.
delta
option. The absolute difference in mean PSI
values between the two cell populations, above which, the splice
junction is considered to be differentially spliced.
min.gene.norm
option. The mean normalised gene
expression value across all cells from both populations, above which,
the corresponding splice junction is considered to be expressed.
Gene-splicing dynamics
MARVEL’s integrated differential
gene and splicing analysis enables us to investigate how gene expression
changes relative to splice junction changes when iPSCs differentiate
into day-10 cardiomyocytes. The gene-splicing dynamics may be classified
into four categories, namely coordinated, opposing, isoform-switching,
and complex.
- Coordinated gene-splicing relationship refers to
the change in mean gene expression is in the same direction with the
corresponding splicing junction(s).
- Opposing gene-splicing relationship refers to the
change in mean gene expression is in the opposite direction to the
corresponding splicing junction(s).
- Isoform-switching refers to genes that are
differentially spliced without being differentially expressed.
- Complex gene-splicing relationship refers to genes
with both coordinated and opposing relationships with the corresponding
splicing junctions.
Here, we will explore the
gene-splicing dynamics of genes that are differentially spliced between
iPSCs and day-10 cardiomyocytes. Representative examples of each dynamic
will also be shown. This section also utilises the ad hoc plotting
functions PlotValues.PCA.CellGroup.10x
,
PlotValues.PCA.Gene.10x
,
PlotValues.PCA.PSI.10x
for plotting cell groups, PSI, and
gene expression, respectively, on the reduced dimension space.
Please not users are required to run the
CompareValues.SJ.10x
and
CompareValues.Genes.10x
functions (refer to “Differential
splicing analysis” section) before proceeding with this section.
Assign dynamics
marvel <- IsoSwitch.10x(MarvelObject=marvel,
pval.sj=0.05,
delta.sj=5,
min.gene.norm=1.0,
pval.adj.gene=0.05,
log2fc.gene=0.5
)
marvel$SJ.Gene.Cor$Proportion$Plot
marvel$SJ.Gene.Cor$Proportion$Table
## sj.gene.cor freq pct
## 2 Coordinated 101 18.738404
## 4 Opposing 83 15.398887
## 3 Iso-Switch 317 58.812616
## 1 Complex 38 7.050093
head(marvel$SJ.Gene.Cor$Data[,c("coord.intron", "gene_short_name", "cor.complete")])
## coord.intron gene_short_name cor.complete
## 1 chr1:2189782:2193638 FAAP20 Iso-Switch
## 2 chr1:11054886:11054961 SRM Opposing
## 3 chr1:11674817:11675081 MAD2L2 Opposing
## 4 chr1:20652529:20652620 DDOST Iso-Switch
## 5 chr1:22086867:22091427 CDC42 Coordinated
## 6 chr1:23313703:23318482 HNRNPR Coordinated
pval.sj
option. The p-value, below which, the splice
junction is considered to be differentially spliced.
delta.sj
option. The absolute difference in mean PSI
values between the two cell populations, above which, the splice
junction is considered to be differentially spliced.
min.gene.norm
option. The mean normalised gene
expression value across all cells from both populations, above which,
the corresponding splice junction is considered to be expressed.
pval.adj.gene
option. The p-value, below which, the
gene is considered to be differentially expressed.
log2fc.gene
option. The absolute log2 fold change in
gene expression, above which, the gene is considered to be
differentially expressed.
- Here, we observe majority of differentially spliced genes underwent
isoform-switching followed by coordinated, opposing, and then complex
gene expression changes relative to splicing changes.
- Note that majority of differentially spliced genes may not be
inferred directly from differential gene expression analysis alone. For
example, only in coordinated gene-splicing relationship that the gene
and splicing changes between iPSCs and day-10 cardiomyocytes are in the
same direction. But this category only constitute around one-fifth of
gene-splicing relationships.
Coordinated
# Define cell groups
# Retrieve sample metadata
sample.metadata <- marvel$sample.metadata
# iPSC
index <- which(sample.metadata$cell.type=="iPSC")
cell.ids.1 <- sample.metadata[index, "cell.id"]
length(cell.ids.1)
## [1] 11244
# Cardio day 10
index <- which(sample.metadata$cell.type=="Cardio day 10")
cell.ids.2 <- sample.metadata[index, "cell.id"]
length(cell.ids.2)
## [1] 5937
# Save into list
cell.group.list <- list("iPSC"=cell.ids.1,
"Cardio d10"=cell.ids.2
)
# Plot cell groups
marvel <- PlotValues.PCA.CellGroup.10x(MarvelObject=marvel,
cell.group.list=cell.group.list,
legendtitle="Cell group",
type="tsne"
)
plot_group <- marvel$adhocPlot$PCA$CellGroup
# Plot gene expression
marvel <- PlotValues.PCA.Gene.10x(MarvelObject=marvel,
gene_short_name="VIM",
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot_gene <- marvel$adhocPlot$PCA$Gene
# Plot PSI
marvel <- PlotValues.PCA.PSI.10x(MarvelObject=marvel,
coord.intron="chr10:17235390:17235845",
min.gene.count=3,
log2.transform=FALSE,
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot_sj <- marvel$adhocPlot$PCA$PSI
# Arrange and view plots
grid.arrange(plot_group, plot_gene, plot_sj, nrow=1)
- For VIM, both gene expression (middle) and the
corresponding splice junction (right) shown here are up-regulated in
day-10 cardiomyocytes relative to iPSCs.
- Note gene expression represented by log2(expression) scale whereas
splicing rate represented by PSI scale.
Opposing
# Plot gene expression
marvel <- PlotValues.PCA.Gene.10x(MarvelObject=marvel,
gene_short_name="UQCRH",
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot_gene <- marvel$adhocPlot$PCA$Gene
# Plot PSI
marvel <- PlotValues.PCA.PSI.10x(MarvelObject=marvel,
coord.intron="chr1:46310317:46316551",
min.gene.count=3,
log2.transform=FALSE,
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot_sj <- marvel$adhocPlot$PCA$PSI
# Arrange and view plots
grid.arrange(plot_group, plot_gene, plot_sj, nrow=1)
- For UQRCH, the gene expression (middle) is up-regulated in
iPSCs relative to day-10 cardiomyocytes.
- On the other hand, the corresponding splice junction (right) shown
here is up-regulated in day-10 cardiomyocytes relative to iPSCs.
- Note gene expression represented by log2(expression) scale whereas
splicing rate represented by PSI scale.
Iso-switch
# Plot gene expression
marvel <- PlotValues.PCA.Gene.10x(MarvelObject=marvel,
gene_short_name="RBM39",
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot_gene <- marvel$adhocPlot$PCA$Gene
# Plot PSI: SJ-1
marvel <- PlotValues.PCA.PSI.10x(MarvelObject=marvel,
coord.intron="chr20:35740888:35741940",
min.gene.count=3,
log2.transform=FALSE,
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot_sj_1 <- marvel$adhocPlot$PCA$PSI
# Plot PSI: SJ-2
marvel <- PlotValues.PCA.PSI.10x(MarvelObject=marvel,
coord.intron="chr20:35739018:35740823",
min.gene.count=3,
log2.transform=FALSE,
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot_sj_2 <- marvel$adhocPlot$PCA$PSI
# Arrange and view plots
grid.arrange(plot_gene, plot_sj_1, plot_sj_2, nrow=1)
- For RBM39, the gene (left) is not differentially expressed
between iPSCs and day-10 cardiomyocytes.
- On the other hand, both corresponding splice junctions (middle and
right) are up-regulated in iPSCs relative to day-10 cardiomyocytes.
- Note gene expression represented by log2(expression) scale whereas
splicing rate represented by PSI scale.
Complex
# Gene 1
# Plot gene expression
marvel <- PlotValues.PCA.Gene.10x(MarvelObject=marvel,
gene_short_name="TPM1",
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot.1_gene <- marvel$adhocPlot$PCA$Gene
# Plot PSI: SJ-1
marvel <- PlotValues.PCA.PSI.10x(MarvelObject=marvel,
coord.intron="chr15:63064143:63065895",
min.gene.count=3,
log2.transform=FALSE,
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot.1_sj_1 <- marvel$adhocPlot$PCA$PSI
# Plot PSI: SJ-2
marvel <- PlotValues.PCA.PSI.10x(MarvelObject=marvel,
coord.intron="chr15:63044153:63056984",
min.gene.count=3,
log2.transform=FALSE,
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot.1_sj_2 <- marvel$adhocPlot$PCA$PSI
# Gene 2
# Plot gene expression
marvel <- PlotValues.PCA.Gene.10x(MarvelObject=marvel,
gene_short_name="TPM2",
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot.2_gene <- marvel$adhocPlot$PCA$Gene
# Plot PSI: SJ-1
marvel <- PlotValues.PCA.PSI.10x(MarvelObject=marvel,
coord.intron="chr9:35684808:35685268",
min.gene.count=3,
log2.transform=FALSE,
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot.2_sj_1 <- marvel$adhocPlot$PCA$PSI
# Plot PSI: SJ-2
marvel <- PlotValues.PCA.PSI.10x(MarvelObject=marvel,
coord.intron="chr9:35684551:35685063",
min.gene.count=3,
log2.transform=FALSE,
color.gradient=c("grey","cyan","green","yellow","red"),
type="tsne"
)
plot.2_sj_2 <- marvel$adhocPlot$PCA$PSI
# Arrange and view plots
grid.arrange(plot.1_gene, plot.1_sj_1, plot.1_sj_2,
plot.2_gene, plot.2_sj_1, plot.2_sj_2,
nrow=2
)
- For both TPM1 and TPM2 genes, their gene
expressions (left) are up-regulated in day-10 cardiomyocytes relative to
iPSCs.
- Similar to gene expression profile, one of the splice junctions
(middle) for these genes are up-regulated in day-10 cardiomyocytes
relative to iPSCs.
- On the other hand, the other splice junction (right) for these genes
are down-regulated in day-10 cardiomyocytes relative to iPSCs.
- Note gene expression represented by log2(expression) scale whereas
splicing rate represented by PSI scale.
Gene ontology analysis
Gene ontology analysis or pathway
enrichment analysis categorises the differentially spliced genes between
iPSCs and day-10 cardiomyocytes into biological pathways. This may
identify sets of genes with similar function or belong to similar
biological pathways that are concurrently spliced.
marvel <- BioPathways.10x(MarvelObject=marvel,
pval=0.05,
delta=5,
min.gene.norm=1.0,
method.adjust="fdr",
species="human"
)
head(marvel$DE$BioPathways$Table)
## ID
## GO:0006119 GO:0006119
## GO:0046034 GO:0046034
## GO:0000377 GO:0000377
## GO:0000398 GO:0000398
## GO:1903311 GO:1903311
## GO:0010257 GO:0010257
## Description
## GO:0006119 oxidative phosphorylation
## GO:0046034 ATP metabolic process
## GO:0000377 RNA splicing, via transesterification reactions with bulged adenosine as nucleophile
## GO:0000398 mRNA splicing, via spliceosome
## GO:1903311 regulation of mRNA metabolic process
## GO:0010257 NADH dehydrogenase complex assembly
## GeneRatio BgRatio enrichment pvalue p.adjust
## GO:0006119 48/454 149/18866 13.386867 6.120887e-41 2.421423e-37
## GO:0046034 54/454 311/18866 7.215349 9.007387e-31 8.908306e-28
## GO:0000377 56/454 390/18866 5.966881 1.698617e-27 7.466364e-25
## GO:0000398 56/454 390/18866 5.966881 1.698617e-27 7.466364e-25
## GO:1903311 50/454 344/18866 6.039981 7.320281e-25 2.227618e-22
## GO:0010257 24/454 65/18866 15.343409 1.245443e-22 3.284649e-20
## qvalue
## GO:0006119 2.034712e-37
## GO:0046034 7.485613e-28
## GO:0000377 6.273955e-25
## GO:0000398 6.273955e-25
## GO:1903311 1.871858e-22
## GO:0010257 2.760077e-20
## geneID
## GO:0006119 UQCRH/NDUFS2/ATP5F1C/NDUFS3/NDUFS8/ATP5MC2/ATP5F1B/NDUFA12/DNAJC15/NDUFB1/COX5A/NDUFB10/UQCRC2/NDUFAB1/COX4I1/ATP5PD/NDUFV2/ATP5F1A/ATP5F1D/NDUFS7/NDUFA3/ATP5MC3/NDUFB3/ATP5F1E/ATP5PF/NDUFA6/UQCRC1/NDUFB4/ATP5ME/NDUFC1/NDUFS6/NDUFS4/COX7C/UQCRQ/COX7A2/NDUFA4/CYCS/ATP5MF/NDUFA5/NDUFB2/COX6C/NDUFB9/CYC1/NDUFB6/STOML2/NDUFA8/NDUFB11/NDUFA1
## GO:0046034 UQCRH/NDUFS2/TPR/PARP1/GUK1/ATP5F1C/NDUFS3/NDUFS8/GAPDH/ATP5MC2/ATP5F1B/NDUFA12/DNAJC15/NDUFB1/COX5A/NDUFB10/UQCRC2/NDUFAB1/COX4I1/ATP5PD/NDUFV2/ATP5F1A/ATP5F1D/NDUFS7/NDUFA3/OLA1/ATP5MC3/NDUFB3/ATP5F1E/ATP5PF/APP/NDUFA6/UQCRC1/NDUFB4/ATP5ME/NDUFC1/NDUFS6/NDUFS4/COX7C/UQCRQ/COX7A2/NDUFA4/CYCS/ATP5MF/NDUFA5/NDUFB2/COX6C/NDUFB9/CYC1/NDUFB6/STOML2/NDUFA8/NDUFB11/NDUFA1
## GO:0000377 HNRNPR/SRRM1/DNAJC8/SFPQ/THRAP3/YBX1/SRSF11/RBM8A/PRDX6/SNRPE/HNRNPU/RBM17/POLR2G/SF3B2/CWC15/ZCRB1/PCBP2/HNRNPA1/SNRPF/SAP18/HNRNPC/SRSF5/PAPOLA/SNRPA1/SNRNP25/SRRM2/FUS/DDX5/SNRPD1/CIRBP/HNRNPM/UBL5/SNRPD2/SF3B6/SRSF7/SNRPG/PRPF40A/SF3B1/SNRPB/SNRPB2/RALY/RBM39/SON/POLR2F/DDX17/LSM3/U2SURP/FXR1/TRA2B/LSM6/HNRNPH1/SNRPC/SYNCRIP/BUD31/POLR2J/PSIP1
## GO:0000398 HNRNPR/SRRM1/DNAJC8/SFPQ/THRAP3/YBX1/SRSF11/RBM8A/PRDX6/SNRPE/HNRNPU/RBM17/POLR2G/SF3B2/CWC15/ZCRB1/PCBP2/HNRNPA1/SNRPF/SAP18/HNRNPC/SRSF5/PAPOLA/SNRPA1/SNRNP25/SRRM2/FUS/DDX5/SNRPD1/CIRBP/HNRNPM/UBL5/SNRPD2/SF3B6/SRSF7/SNRPG/PRPF40A/SF3B1/SNRPB/SNRPB2/RALY/RBM39/SON/POLR2F/DDX17/LSM3/U2SURP/FXR1/TRA2B/LSM6/HNRNPH1/SNRPC/SYNCRIP/BUD31/POLR2J/PSIP1
## GO:1903311 HNRNPR/YTHDF2/PSMB2/THRAP3/YBX1/SERBP1/PSMA5/RBM8A/PSMD4/PSMB4/PRDX6/HNRNPU/RBM17/VIM/PSMC3/POLR2G/YBX3/HNRNPA1/SAP18/HNRNPC/PSME2/PSMA3/SRSF5/PAPOLA/SLTM/PSMA4/FUS/PSMB6/PSMB3/PSMC5/DDX5/CIRBP/HNRNPM/UBA52/PSMD8/SRSF7/RBM39/YWHAB/SON/DDX17/DHX36/FXR1/TRA2B/NPM1/SYNCRIP/PSMB1/HSPB1/YWHAZ/PSMB7/SET
## GO:0010257 NDUFS2/NDUFS3/NDUFS8/NDUFA12/NDUFB1/NDUFB10/NDUFAB1/NDUFV2/NDUFS7/NDUFA3/NDUFB3/NDUFA6/NDUFB4/NDUFC1/NDUFS6/NDUFS4/NDUFA5/NDUFB2/NDUFB9/DMAC1/NDUFB6/NDUFA8/NDUFB11/NDUFA1
## Count
## GO:0006119 48
## GO:0046034 54
## GO:0000377 56
## GO:0000398 56
## GO:1903311 50
## GO:0010257 24
pval
option. The p-value, below which, the splice
junction is considered to be differentially spliced.
delta
option. The absolute difference in mean PSI
values between the two cell populations, above which, the splice
junction is considered to be differentially spliced.
min.gene.norm
option. The mean normalised gene
expression value across all cells from both populations, above which,
the corresponding splice junction is considered to be expressed.
species
option. MARVEL also supports GO analysis of
"mouse"
.
- Alternative to using the
pval
, delta
, and
min.gene.norm
options for specifying differentially spliced
genes for GO analysis, users may input their own custom genes for GO
analysis using the custom.genes
option.
# Plot pathways related to cardiomyocyte development
go.terms <- c("ATP metabolic process",
"regulation of stem cell differentiation",
"Wnt signaling pathway, planar cell polarity pathway",
"non-canonical Wnt signaling pathway",
"muscle filament sliding",
"actin-myosin filament sliding",
"regulation of ATPase activity",
"regulation of fibroblast proliferation",
"actomyosin structure organization",
"myofibril assembly",
"negative regulation of calcium-mediated signaling"
)
marvel <- BioPathways.Plot.10x(MarvelObject=marvel,
go.terms,
y.label.size=8,
offset=0.5
)
marvel$DE$BioPathways$Plot
Candidate gene analysis
Users may have specific candidate
genes of interest to investigate in the single-cell RNA-sequencing
dataset. These candidate genes may have been identified from initial
differential gene expression analysis of the same dataset or from an
earlier discovery dataset, e.g. bulk RNA-sequencing dataset, or that
reported from the literature.
Here, we will illustrate several related functions for
investigating the gene-splicing relationship between a selected gene and
its corresponding splice junctions. This may reveal candidate isoforms
for downstream functional validation.
We have chosen TPM2 for this illustration as we
have seen earlier that this gene has a complex relationship with its
splice junctions (see “Gene-splicing dynamics” section of this
tutorial). Therefore, this gene will be a good example for demonstrating
the intricate relationship betweeh splice junction usage and gene
expression profile.
Define cell groups
Our earlier differential analysis
included only iPSCs and day-10 cardiomyocytes. Here, we will include all
cell types in this dataset, namely iPSCs, and day-2, -4, and -10
cardiomyocytes.
# Retrieve sample metadata
sample.metadata <- marvel$sample.metadata
# iPSC
index <- which(sample.metadata$cell.type=="iPSC")
cell.ids.1 <- sample.metadata[index, "cell.id"]
length(cell.ids.1)
## [1] 11244
# Cardio day 2
index <- which(sample.metadata$cell.type=="Cardio day 2")
cell.ids.2 <- sample.metadata[index, "cell.id"]
length(cell.ids.2)
## [1] 6240
# Cardio day 4
index <- which(sample.metadata$cell.type=="Cardio day 4")
cell.ids.3 <- sample.metadata[index, "cell.id"]
length(cell.ids.3)
## [1] 8635
# Cardio day 10
index <- which(sample.metadata$cell.type=="Cardio day 10")
cell.ids.4 <- sample.metadata[index, "cell.id"]
length(cell.ids.4)
## [1] 5937
# Save into list
cell.group.list <- list("iPSC"=cell.ids.1,
"day2"=cell.ids.2,
"day4"=cell.ids.3,
"day10"=cell.ids.4
)
Gene expression profiling
marvel <- adhocGene.TabulateExpression.Gene.10x(MarvelObject=marvel,
cell.group.list=cell.group.list,
gene_short_name="TPM2",
min.pct.cells=10,
downsample=TRUE
)
marvel$adhocGene$Expression$Gene$Plot
marvel$adhocGene$Expression$Gene$Table
## group mean.expr pct.cells.expr
## 1 iPSC 1.533693 93.39734
## 2 day2 2.064926 99.52838
## 3 day4 2.889787 99.89894
## 4 day10 2.919375 99.89894
min.pct.cells
option. The minimum percentage of cells
the gene is expressed in for the gene to be considered expressed and
included for analysis here.
downsample
option. If set to TRUE
, the
number of cells of each cell group will be down-sampled. The number of
cells to down-sample to will be based on the size of smallest cell
group.
- We observe progressive increased in TPM2 gene expression
from iPSCs to early cardiomyocytes and then late cardiomyocytes.
SJ usage profiling
marvel <- adhocGene.TabulateExpression.PSI.10x(MarvelObject=marvel,
min.pct.cells=10
)
marvel$adhocGene$Expression$PSI$Plot
head(marvel$adhocGene$Expression$PSI$Table)
## group figure.column coord.intron n.cells.total n.cells.expr.sj
## 1 iPSC SJ-1 chr9:35682164:35684245 5937 5209
## 2 iPSC SJ-2 chr9:35684316:35684487 5937 4860
## 3 iPSC SJ-3 chr9:35684551:35684731 5937 2754
## 4 iPSC SJ-4 chr9:35684808:35685268 5937 1013
## 5 iPSC SJ-5 chr9:35685340:35685433 5937 919
## 6 iPSC SJ-9 chr9:35684551:35685063 5937 1073
## pct.cells.expr.sj sj.count.total gene.count.total psi
## 1 87.74 15904 21778 73.03
## 2 81.86 12730 21778 58.45
## 3 46.39 4083 21778 18.75
## 4 17.06 1152 21778 5.29
## 5 15.48 1039 21778 4.77
## 6 18.07 1229 21778 5.64
min.pct.cells
option. The minimum percentage of cells
the splice junction is expressed in for the splice junction to be
considered expressed and included for analysis here.
- We observe 10 splice junctions to be expressed in at least one cell
population.
- SJ-1 is more highly expressed in iPSCs and early cardiomyocytes
relative to late cardiomyocytes.
- However, there is virtually no difference in SJ-2 expression across
all cell populations.
- On the other hand, SJ-3 is more highly expressed in cardiomyocytes
relative to iPSCs.
Gene-splicing relationship
Let’s investigate how the top 3
most highly-expressed splice junctions’ usage (SJ-1, -2, and -3) changes
relative to TPM2 gene expression changes for all possible
pair-wise comparison between iPSCs, and day-2, -4, and -10
cardiomyocytes.
# Perform differential gene expression analysis for all pair-wise group comparison
marvel <- adhocGene.DE.Gene.10x(MarvelObject=marvel)
# Perform differential splicing expression analysis for all pair-wise group comparison
marvel <- adhocGene.DE.PSI.10x(MarvelObject=marvel)
# Retrieve SJ DE results table
results <- marvel$adhocGene$DE$PSI$Data
# SJ-1
# Define SJ
coord.intron <- results[which(results$figure.column=="SJ-1"), "coord.intron"]
coord.intron <- unique(coord.intron)
coord.intron
## [1] "chr9:35682164:35684245"
# Plot
marvel <- adhocGene.PlotDEValues.10x(MarvelObject=marvel,
coord.intron=coord.intron,
log2fc.gene=0.5,
delta.sj=5,
label.size=2,
point.size=2,
xmin=-2.0,
xmax=2.0,
ymin=-25,
ymax=25
)
plot.1 <- marvel$adhocGene$DE$VolcanoPlot$Plot
# SJ-2
# Define SJ
coord.intron <- results[which(results$figure.column=="SJ-2"), "coord.intron"]
coord.intron <- unique(coord.intron)
coord.intron
## [1] "chr9:35684316:35684487"
# Plot
marvel <- adhocGene.PlotDEValues.10x(MarvelObject=marvel,
coord.intron=coord.intron,
log2fc.gene=0.5,
delta.sj=5,
label.size=2,
point.size=2,
xmin=-2.0,
xmax=2.0,
ymin=-25,
ymax=25
)
plot.2 <- marvel$adhocGene$DE$VolcanoPlot$Plot
# SJ-3
# Define SJ
coord.intron <- results[which(results$figure.column=="SJ-3"), "coord.intron"]
coord.intron <- unique(coord.intron)
coord.intron
## [1] "chr9:35684551:35684731"
# Plot
marvel <- adhocGene.PlotDEValues.10x(MarvelObject=marvel,
coord.intron=coord.intron,
log2fc.gene=0.5,
delta.sj=5,
label.size=2,
point.size=2,
xmin=-2.0,
xmax=2.0,
ymin=-25,
ymax=25
)
plot.3 <- marvel$adhocGene$DE$VolcanoPlot$Plot
# Arrange and view plots
grid.arrange(plot.1, plot.2, plot.3,
nrow=3
)
log2fc.gene
option. The absolute log2 fold change,
above which, the gene is considered differentially expressed.
delta.sj
option. The absolute difference in mean PSI
values between two cell populations, above which, the splice junction is
considered differentially spliced.
- SJ-1 is more highly expressed in iPSCs and early cardiomyocytes
relative to late cardiomyocytes.
- However, there is virtually no difference in SJ-2 expression across
all cell populations.
- On the other hand, SJ-3 is more highly expressed in cardiomyocytes
relative to iPSCs.
Locate SJ position
Locating the position of the splice
junctions on the isoforms corresponding to TPM2 will reveal the
specific isoforms expressed in our dataset for this.
# SJ-1
# Define SJs to plot
coord.intron <- results[which(results$figure.column=="SJ-1"), "coord.intron"]
coord.intron <- unique(coord.intron)
coord.intron
## [1] "chr9:35682164:35684245"
# Plot
marvel <- adhocGene.PlotSJPosition.10x(MarvelObject=marvel,
coord.intron=coord.intron,
rescale_introns=FALSE,
show.protein.coding.only=TRUE,
anno.label.size=1.5
)
marvel$adhocGene$SJPosition$Plot
# SJ-2
# Define SJs to plot
coord.intron <- results[which(results$figure.column=="SJ-2"), "coord.intron"]
coord.intron <- unique(coord.intron)
coord.intron
## [1] "chr9:35684316:35684487"
# Plot
marvel <- adhocGene.PlotSJPosition.10x(MarvelObject=marvel,
coord.intron=coord.intron,
rescale_introns=FALSE,
show.protein.coding.only=TRUE,
anno.label.size=1.5
)
marvel$adhocGene$SJPosition$Plot
# SJ-3
# Define SJs to plot
coord.intron <- results[which(results$figure.column=="SJ-3"), "coord.intron"]
coord.intron <- unique(coord.intron)
coord.intron
## [1] "chr9:35684551:35684731"
# Plot
marvel <- adhocGene.PlotSJPosition.10x(MarvelObject=marvel,
coord.intron=coord.intron,
rescale_introns=FALSE,
show.protein.coding.only=TRUE,
anno.label.size=1.5
)
marvel$adhocGene$SJPosition$Plot
rescale_introns
option. If set to TRUE
,
the introns will be minimise in order to emphasise the exons and splice
junctions.
show.protein.coding.only
option. If set to
TRUE
, only protein-coding isoforms will be displayed.
- We may infer that ENST00000329305 is the dominant isoform expressed
in this dataset because it contains the 3 top highly expressed splice
junctions.
- Due to differential splice junction usage across the different cell
population, it is conceivable that isoforms other than ENST00000329305
are also expressed in this dataset, and are differentially expressed
across the different cell population.