
SpaMTP: Spatial Multi-Omics Analysis
Source:vignettes/Multi-Omic_Mouse_Brain.Rmd
Multi-Omic_Mouse_Brain.Rmd
Multi-Omics Mouse Brain Vignette
This tutorial highlights the highlights the utility of
SpaMTP on multi-omic data. This vignette will use mouse brain
data, where both Spatial Transcriptomics (Visium), ST, and Spatial
Metabolomics (MALDI-MSI), SM, were generated from the same tissue
section. This data is publicly available from Vicari et al.
Using SpaMTP we will highlight:
- Loading SM Data
- Manually Aligning SM and ST datasets
- Mapping Aligned SM data to ST Spot Level
- Multi-Modal Data Integration of SM and ST Data
- Multi-Omic Data Visualisation
- Co-localisation of Metabolites and Genes
- Multi-Modality Pathway Analysis
Author: Andrew Causer
Load Data
Here, we will load in both our Visium and MALDI-MSI data! The Visium data is contained in 10X Genomic’s standard format. The MALDI-MSI data is in a matrix format, where one table contains both x and y coordinates, and also intensity values for each m/z value.
Install and Import R Libraries
First, we need to import the required libraries for this analysis:
## Install SpaMTP if not previously installed
if (!require("SpaMTP"))
devtools::install_github("GenomicsMachineLearning/SpaMTP")
#General Libraries
library(SpaMTP)
library(Cardinal)
library(Seurat)
library(dplyr)
#For plotting + DE plots
library(ggplot2)
library(EnhancedVolcano)
## For reproducability
set.seed(888)
Load SM Data
As mentioned above, the format of our SM data is in a mtx format. This is different to .ibd and .imzML files that were used in the Mouse Bladder Vignette. The layout of this dataset is demonstrated below:
x | y | mz_1 | mz_2 | mz_3 | … | mz_n |
---|---|---|---|---|---|---|
0 | 1 | 0 | 0 | 11 | … | 0 |
0 | 2 | 0 | 0 | 0 | … | 0 |
0 | 3 | 15 | 0 | 0 | … | 32 |
0 | 4 | 20 | 0 | 0 | … | 32 |
0 | 5 | 0 | 20 | 0 | … | 0 |
Based on this, we can read in the .csv
file that
contains our intensity and pixel coordinate values.
msi <- ReadSM_mtx("vignette_data_files/Mouse_Brain/SM_data/V11L12-109_B1.Visium.FMP.220826_smamsi.csv")
Spliting matrix data from column 1 onwards ....
Adding Pixel Metadata ....
Creating Centroids for Spatial Seurat Object ....
Great! Now lets explore our sample. First, we will see the layout of the SpaMTP Seurat Object:
msi
## An object of class Seurat
## 1538 features across 4914 samples within 1 assay
## Active assay: Spatial (1538 features, 0 variable features)
## 1 layer present: counts
## 1 spatial field of view present: fov
We can see that we have about 1,538 different m/z masses across 4,914 pixels.
Lets look at what we have in our metadata!
head(msi@meta.data)
orig.ident | nCount_Spatial | nFeature_Spatial | x_coord | y_coord | |
---|---|---|---|---|---|
0_0 | 0 | 0 | 0 | 0 | 0 |
0_1 | 0 | 0 | 0 | 0 | 1 |
0_2 | 0 | 0 | 0 | 0 | 2 |
0_3 | 0 | 0 | 0 | 0 | 3 |
0_4 | 0 | 0 | 0 | 0 | 4 |
0_5 | 0 | 0 | 0 | 0 | 5 |
We can see that our pixel names are a combination of x and y coordinates. We also have two data columns, nCount_Spatial and nFeature_Spatial, were were generated when loading the data. Respectively, these represent the total intensity across all m/z per pixel, and the number of different m/z values which have an intensity > 0 per pixel.
Lets have a look at the sample spatially. We will just visualise the number of m/z’s per pixel.
ImageFeaturePlot(msi, "nFeature_Spatial", size = 2, dark.background = FALSE)
We have some pixels which do not contain any SM data, thus we will remove them first before starting any analysis.
msi <- subset(msi, subset = nCount_Spatial > 0)
ImageFeaturePlot(msi, "nCount_Spatial", size = 1.3, dark.background = FALSE) | ImageFeaturePlot(msi, "nFeature_Spatial", size = 1.3, dark.background = FALSE)
We will also change our orig.ident
to sample name for
later reference:
msi@meta.data$orig.ident <- "msi_sample1"
Load ST Data
Now that we have successfully loaded our SM data, lets load in the ST
Visium data. This data can be loaded using Seurat’s
Load10X_Spatial()
function.
vis <- Load10X_Spatial("vignette_data_files/Mouse_Brain/ST_data/outs/")
vis
## An object of class Seurat
## 32285 features across 3120 samples within 1 assay
## Active assay: Spatial (32285 features, 0 variable features)
## 1 layer present: counts
## 1 spatial field of view present: slice1
The ST data has 32,285 genes and 3,120 spots. Next, we will visualise the sample and compare to the MSI image above.
SpatialFeaturePlot(vis, features = c("nCount_Spatial", "nFeature_Spatial"), pt.size.factor = 2)
In addition, we can load in the metadata associated with the tissue morphology generated by the authors of the original publication. As mentioned previously, the data we are analysing is from a Parkinson’s mouse model, where the dopamine neurons of one brain hemisphere has been lesioned. The metadata loaded below contains information of surrounding tissue structures and the lesion site.
Now, lets load this metadata and add it to our Visium Seurat object.
## Read in metadata files
lesion <- read.csv("vignette_data_files/Mouse_Brain/ST_data/outs/lesion.csv", row.names = 1)
region <- read.csv("vignette_data_files/Mouse_Brain/ST_data/outs/region.csv", row.names = 1)
region_loupe <- read.csv("vignette_data_files/Mouse_Brain/ST_data/outs/RegionLoupe.csv", row.names = 1)
## Combine into one dataframe
annotations <- cbind(lesion,region,region_loupe)
## Add data to metadata slot
vis@meta.data[colnames(annotations)] <- annotations[colnames(annotations)]
vis@meta.data <- vis@meta.data %>% mutate(annotations = ifelse(RegionLoupe %in% c("CP", "ACB"), paste0(RegionLoupe, "_", lesion), RegionLoupe))
head(vis)
orig.ident | nCount_Spatial | nFeature_Spatial | lesion | region | RegionLoupe | annotations | |
---|---|---|---|---|---|---|---|
AAACAAGTATCTCCCA-1 | SeuratProject | 3491 | 1562 | lesioned | striatum | CP | CP_lesioned |
AAACAGCTTTCAGAAG-1 | SeuratProject | 1454 | 626 | intact | not_striatum | ||
AAACATTTCCCGGATT-1 | SeuratProject | 5784 | 2275 | lesioned | striatum | CP | CP_lesioned |
AAACCCGAACGAAATC-1 | SeuratProject | 2048 | 1193 | lesioned | not_striatum | CTX | CTX |
Now, we can plot the results. We can use this information later for tissue alignment and removing poor quality spots sitting outside the tissue section.
SpatialDimPlot(vis, group.by = c("lesion","region", "RegionLoupe"))
Data preprocessing
Annotate m/z masses
Now that we have successfully loaded our ST and SM data, we can now run some general pre-processing to remove any poor quality spots/pixel and genes/metabolites.
In this case, the matrix run was an FMP10 matrix, meaning some of the
masses of our metabolites have an additional mass added to their weight,
and thus require a specialised annotation database to ensure the
annotations correct for this additional mass. SpaMTP has an inbuilt
function for annotating metabolites from FMP10 matrices called
AddFMP10Annotations()
. This function also allows for the
addition of custom/additional metabolites that aren’t annotated in the
current dataset.
Parkinson’s is heavily associated with dopamine regulation, and thus
we will also use the add.custom.annotations
parameter to
include the annotation of this molecule.
First, we will need to load this additional data.frame containing the custom annotations:
fmp10_metabolites <- read.csv("vignette_data_files/Mouse_Brain/SM_data/Public_data_FMP10_Annotations.csv", row.names = 1)
fmp10_metabolites
Observed.MZ | Reference_mz | IsomerNames | Adduct | Formula | Matched.delta.mass..Da. | Error | Exactmass | mass | annotation | Isomers | Isomers_IDs |
---|---|---|---|---|---|---|---|---|---|---|---|
435.21 | 435.21 | 3-Methoxytyramine (3-MT) | [M+FMP10] | C9H13NO2 | 0 | 0 | 435.21 | 435.21 | 3-Methoxytyramine (3-MT) | HMDB0000022 | hmdb:HMDB0000022 |
421.19 | 421.19 | Dopamine (single derivatized) | [M+FMP10] | C8H11NO2 | 0 | 0 | 421.19 | 421.19 | Dopamine (single derivatized) | HMDB0000073 | hmdb:HMDB0000073 |
674.28 | 674.28 | Dopamine (double derivatized) | [M+2FMP10a] | C8H11NO2 | 0 | 0 | 674.28 | 674.28 | Dopamine (double derivatized) | HMDB0000073 | hmdb:HMDB0000073 |
690.28 | 690.28 | Norephinephrine (NE) (double derivatized) | [M+2FMP10a] | C8H11NO3 | 0 | 0 | 690.28 | 690.28 | Norephinephrine (NE) (double derivatized) | HMDB0000216 | hmdb:HMDB0000216 |
Next, we can add these custom annotations to the FMP10 database and annotate our dataset:
msi_annotated <- AddFMP10Annotations(msi, add.custom.annotation = fmp10_metabolites, assay = "Spatial", return.only.annotated = F)
If we had a specific list of annotations, regardless of the matrix
used to generate the data, we can add these custom annotations using the
AddCustomMZAnnotations()
function. This function only
requires a data.frame containing two columns named mass
and
annotation
. We can see below how this differs from using
the inbuilt FMP10 database:
custom_panel <- fmp10_metabolites[c("mass", "annotation")]
custom_panel
mass | annotation |
---|---|
435.21 | 3-Methoxytyramine (3-MT) |
421.19 | Dopamine (single derivatized) |
674.28 | Dopamine (double derivatized) |
690.28 | Norephinephrine (NE) (double derivatized) |
msi_custom_annotated <- AddCustomMZAnnotations(msi, custom_panel)
The functions above add these annotations into the SpaMTP object’s feature meta.data slot. We can see and this below:
msi_annotated[["Spatial"]]@meta.data[413:414,]
raw_mz | mz_names | all_IsomerNames | Error | Reference_mz | true_mzs | all_Isomers | all_Isomers_IDs | all_Adducts | all_Formulas | all_Errors | |
---|---|---|---|---|---|---|---|---|---|---|---|
413 | 421.1914 | mz-421.19136 | Dopamine (single derivatized) | NA | NA | 421.19136 | HMDB0000073 | hmdb:HMDB0000073 | [M+FMP10] | C8H11NO2 | 0.00135999999997694 |
414 | 422.1391 | mz-422.13913 | Gentisic acid; 2-Pyrocatechuic acid; Protocatechuic acid; 2,6-Dihydroxybenzoic acid; 3,5-Dihydroxybenzoic acid; 2,4-Dihydroxybenzoic acid; 2,4,6-trihydroxybenzaldehyde; 3,4,5-Trihydroxybenzaldehyde | -1.385026; -1.385042 | 422.1381 | 422.13913 | HMDB0000152; HMDB0000397; HMDB0001856; HMDB0013676; HMDB0013677; HMDB0029666; HMDB0125090; HMDB0246043 | hmdb:HMDB0000152; hmdb:HMDB0000397; hmdb:HMDB0001856; hmdb:HMDB0013676; hmdb:HMDB0013677; hmdb:HMDB0029666; hmdb:HMDB0125090; hmdb:HMDB0246043 | [M+FMP10] | C7H6O4 | 0.0010300000000143 |
msi_custom_annotated[["Spatial"]]@meta.data[413:414,]
raw_mz | mz_names | mass | all_IsomerNames | true_mzs | ppm_diff | |
---|---|---|---|---|---|---|
413 | 421.1914 | mz-421.19136 | 421.19 | Dopamine (single derivatized) | 421.19136 | 0.00135999999997694 |
414 | 422.1391 | mz-422.13913 | No Annotation | No Annotation | No Annotation | No Annotation |
We can see that based on the FMP10-tagged value, we are able to annotated some of our m/z values with their respective metabolites.
Dopamine plays a critical part in Parkinson’s disease, so lets visualise this annotation spatially!
ImageMZAnnotationPlot(msi_annotated, "Dopamine (double derivatized)", slot = "counts", dark.background = FALSE, size = 1.5)
There is a strong localisation of dopamine in the non-lesioned hemisphere.
Align Multi-Omic Technologies
Although we have two -omic datasets from the same sample, the coordinates are not aligned in their current format. We can see this in the first plot below.
df1 <- GetTissueCoordinates(vis)
df1$sample <- "RNA"
df2 <- GetTissueCoordinates(msi_annotated)
df2[,"sample"] <- "MSI"
p1 <- ggplot(df1, aes(x, y,color = sample)) + theme_classic() + geom_point(size = 0.5) + scale_color_manual(values = "blue")
p2 <- ggplot(df2, aes(x, y,color = sample)) + theme_classic() + geom_point(size = 0.5) + scale_color_manual(values = "red")
p1|p2
To achieve multi-omic alignment, we must perform 2 steps:
1. Align Coordinates:
- Here, we will manually adjust the coordinates of the SM data to match the H&E image of the Visium data using a Shiny app generated through ManualAlignImages()
- We can change what data is plotted/visualised to assist with correct alignment
- This will output a SM SpaMTP Seurat Class Object with transformed coordinates (transformed via an affine transformation)
2. Map Omic Data to Common Spots:
- The final step is to map or bin the SM data to each ST spot using MapSpatialOmics()
- This step works by finding all SM data points within a radius of a ST spot and assigning their respective data
- Increasing the SM resolution in this function can improve the mapping by reducing the distance between pixel and spot centroids.
Lets run these functions on our sample:
Running this code below will activate an interactive shiny app where users can align two datasets. This example can also be visualised here.
msi_transformed <- AlignSpatialOmics(sm.data = msi_annotated, st.data = vis)
Loading required package: shiny
Show in New Window
Listening on http://0.0.0.0:4698
Sample: 2
1.21 4.587997e-18 73.5047
4.587997e-18 -1.21 580.6509
0 0 1
Lets observe the change to the coordinates, comparing between the
original x and y coordinates stored in our meta.data and our updated
pixel coordinates stored in the @image$fov
slot:
head(msi_transformed)
orig.ident | nCount_Spatial | nFeature_Spatial | x_coord | y_coord | |
---|---|---|---|---|---|
0_55 | msi_sample1 | 216796272 | 468 | 0 | 55 |
0_56 | msi_sample1 | 211210064 | 496 | 0 | 56 |
0_57 | msi_sample1 | 208775613 | 522 | 0 | 57 |
1_52 | msi_sample1 | 222418420 | 512 | 1 | 52 |
head(GetTissueCoordinates(msi_transformed))
x | y | cell |
---|---|---|
5895.680 | 14550.70 | 0_55 |
5895.680 | 13965.14 | 0_56 |
5895.680 | 13379.58 | 0_57 |
6481.237 | 16291.33 | 1_52 |
6481.237 | 15713.79 | 1_53 |
6481.237 | 15128.23 | 1_54 |
We can see that the values have been up-scaled to matched with the visium coordinates.
CheckAlignment(msi_transformed, vis, image.res = NULL, size = 0.8)
We can also check that our SM coordinates haven’t been distorted by plotting a m/z value. Let’s compare the before vs after.
ImageMZPlot(msi_annotated, mzs = 674.28, size = 2, slot = "counts") |
ImageMZPlot(msi_transformed, mzs = 674.28, size = 2, slot = "counts")
Great, now that we are confident that our SM and ST data are aligned to the same coordinates, we can map each pixel and their relative m/z intensity values to their corresponding ST spot.
NOTE: we have selected an overlap threshold of 20%, meaning that for a pixel to be mapped to a spot atleast 20% of its total area must be overlapping with the corresponding visium spot.
combined.data <- MapSpatialOmics(SM.data = msi_transformed, ST.data = vis, overlap.threshold = 0.3, annotations = TRUE, ST.scale.factor = NULL)
Running `MapSpatialOmics` in lowres mode! This is normally used for bin/spot-based spatial data (Visium)...
Inputs of `ST.scale.factor`, `overlap.threshold` and `merge.unique.metadata` are required for lowres mode, please ensure they are included ...
No pixel width was provided, calculating meadian SM pixel width ...
SM pixel width used for mapping = 569.514635563575...
Generating polygon from SM pixel coordinates ...
Generating polygon from ST spot coordinates ...
Assigning SM polygons to overlapping ST polygons ...
Generating new Spatial Multi-Omic SpaMTP Seurat Object ...
Averaging SM intensity values per ST spot ...
Adding SM metadata to the new SpaMTP Seurat Object ...
Adding metabolite annotation metadata to the new SpaMTP Seurat Object ...
Lets visualise how the mapping went and compare it to our original SM data object:
ImageMZPlot(msi_transformed, mzs = 674.28, size = 2, slot = "counts") |
SpatialMZPlot(combined.data, mzs = 674.28, assay = "SPM", slot = "counts", pt.size.factor = 2.2)
We can see that the expression of Dopamine (mz-674.28) displays the same pattern spatially in our Visium mapped object, compared to the original MSI-only data object.
Our data is almost ready for downstream analyses. However, based on the provided annotations displayed above, we should remove all spots that are not annotated as an anatomical region (meaning they are located outside of the tissue).
combined.data <- subset(combined.data, subset = RegionLoupe == "", invert = T)
SpatialDimPlot(combined.data, group.by = c("lesion", "RegionLoupe"), pt.size.factor = 2)
We can now see that we have removed spots that were associated with the outer border of the tissue. For example, the bottom portion of this section has a fold in the H&E and thus is technical noise, meaning these spots need to be removed.
Clustering/Downstream Analysis
Now that our spatial datasets are cleaned and fully aligned, we can perform some downstream analyses to determine cell types, differentially expressed genes and metabolites, and also upregulated metabolic pathways.
Note: Hidden below are the colour palettes used to plot.Colour Palettes
intergrated_palette <- list("0" = "#9dce71",
"1" = "#fcb992",
"2" = "#348bc2",
"3" = "#0067a8" ,
"4" = "#b3dff2",
"5" = "#fcf283",
"6" = "#c29fcb",
"7" = "#5ca6bd",
"8" = "#009648",
"9" = "#f37985",
"10" = "#592e8b",
"11" = "#c4c4c4")
SM_palette <- list("0" = "#9dce71",
"1" = "#009648",
"2" = "#348bc2",
"3" = "#b3dff2",
"4" = "#c29fcb")
ST_palette <- list("0" = "#348bc2",
"1" = "#009648",
"2" = "#f37985",
"3" = "#fcf283",
"4" = "#0067a8" ,
"5" = "#9dce71",
"6" = "#c29fcb",
"7" = "#592e8b",
"8" = "#e81d25",
"9" = "#ff59ee",
"10" = "#ad9ea0")
annotation_palette <- list("ACB_intact" = "#faa957",
"ACB_lesioned" = "#d16d08",
"aco" = "#592e8b",
"cc" = "#c29fcb",
"CP_intact" = "#b3dff2" ,
"CP_lesioned" = "#348bc2",
"CTX" = "#aedea0",
"LSX" = "#f37985",
"MSC" = "#e34045",
"OT" = "#ad9ea0",
"VL" = "#ff59ee")
Cluster ST data
First, we will run clustering on the ST data. Based on the two different modalities, mRNA data will perform much better at defining different cell types/structures within the tissue sample. We expect the clustering to be able to detect cell types, such as the striatum.
DefaultAssay(combined.data) <- "SPT"
## Run ST clustering
combined.data <- NormalizeData(combined.data, verbose = FALSE)
combined.data <- FindVariableFeatures(combined.data, verbose = FALSE)
combined.data <- ScaleData(combined.data, verbose = FALSE)
combined.data <- RunPCA(combined.data, npcs = 30, reduction.name = "spt.pca", verbose = FALSE)
combined.data <- FindNeighbors(combined.data, dims = 1:30, reduction = "spt.pca", verbose = FALSE)
combined.data <- RunUMAP(combined.data, dims = 1:30, reduction = "spt.pca", reduction.name = "spt.umap", verbose = FALSE)
combined.data <- FindClusters(combined.data, resolution = 0.5, cluster.name = "ST_clusters", verbose = FALSE)
DimPlot(combined.data, group.by = "ST_clusters", reduction = "spt.umap", cols = ST_palette) |
SpatialDimPlot(combined.data, group.by = "ST_clusters", pt.size.factor = 2.1, cols = ST_palette)
The clustering results demonstrate clear groups of cells. Lets compare them to the provided annotations:
striatum <- subset(combined.data, subset = region == "striatum")
SpatialDimPlot(striatum, group.by = "ST_clusters", label = F, pt.size.factor = 2.4, cols = ST_palette) |
SpatialDimPlot(striatum, group.by = "annotations", label = F, pt.size.factor = 2.4, cols = annotation_palette)
Looking at these results, we can see that our clustering using only genetic information can determine clear morphological brain regions within the Striatum, including the caudoputamen (CP) and nucleus accumbens (ACB). However, clustering using genetic information cannot identify lesioned regions of the Striatum, where the Striatum cells of left hemisphere produce significantly higher levels of Dopamine.
Idents(combined.data) <- "ST_clusters"
DE <- FindAllMarkers(combined.data, only.pos = T)
## prints top 5 genes for cluster 0 == Striatum
head(DE[DE$cluster == 0,] , n =5 )
p_val | avg_log2FC | pct.1 | pct.2 | p_val_adj | cluster | gene | |
---|---|---|---|---|---|---|---|
Ppp1r1b | 0 | 2.608029 | 0.958 | 0.342 | 0 | 0 | Ppp1r1b |
Gpr88 | 0 | 2.781403 | 0.928 | 0.285 | 0 | 0 | Gpr88 |
Rgs9 | 0 | 2.762431 | 0.884 | 0.232 | 0 | 0 | Rgs9 |
Pde1b | 0 | 2.363405 | 0.969 | 0.440 | 0 | 0 | Pde1b |
Penk | 0 | 2.519103 | 0.981 | 0.389 | 0 | 0 | Penk |
Looking at the top 5 most over-expressed genes in cluster 0, we can see some key genes that define the Striatum. These are plot below:
SpatialFeaturePlot(combined.data, features = c("Ppp1r1b", "Gpr88", "Rgs9"), pt.size.factor = 2)
Identified in the original publication, the gene, ‘Pcp4’, was highlighted as differentially expressed between the in-tact vs the lesioned Striatum. Because the gene expression alone cannot differentiate between the lesion vs non-lesioned cells this gene is not observed in the top most differentially expressed genes. We will need to look at the metabolic data to identify this gene.
Cluster SM data
Clustering based on metabolic information will likely reveal different information compared to a transcriptomic approach. SM clustering will reveal more about the activity or functioning of different cell types. Lets see how the clustering differs:
DefaultAssay(combined.data) <- "SPM"
## Run SM clustering
combined.data <- NormalizeData(combined.data, assay = "SPM")
combined.data <- FindVariableFeatures(combined.data, verbose = FALSE)
combined.data <- ScaleData(combined.data, verbose = FALSE)
combined.data <- RunPCA(combined.data, npcs = 30, reduction.name = "spm.pca", verbose = FALSE)
combined.data <- FindNeighbors(combined.data, dims = 1:15, reduction = "spm.pca", verbose = FALSE)
combined.data <- RunUMAP(combined.data, dims = 1:15, reduction = "spm.pca", reduction.name = "spm.umap", verbose = FALSE)
combined.data <- FindClusters(combined.data, resolution = 0.5, cluster.name = "SM_clusters", verbose = FALSE)
DimPlot(combined.data, group.by = "SM_clusters", reduction = "spm.umap", cols = SM_palette) |
SpatialDimPlot(combined.data, group.by = "SM_clusters", pt.size.factor = 2.1, cols = SM_palette) |
SpatialDimPlot(combined.data, group.by = "lesion", pt.size.factor = 2.1)
Looking at the clustering results above we can clearly see (cluster 4) defines the cells of the intact Striatum which still produce dopamine. In addition, the spatial distribution of cluster 4 matches the expression pattern of dopamine displayed below:
dopamine_mzs <- SearchAnnotations(combined.data, "Dopamine", assay = "SPM")
combined.data <- BinMetabolites(combined.data, mzs = dopamine_mzs$mz_names, slot = "counts", assay = "SPM", bin_name = "Dopamine")
SpatialFeaturePlot(combined.data, features = "Dopamine", pt.size.factor = 2.1) + theme(legend.position = "right")
Multi-Modal Integrative Analysis
As explained above, transcriptomic and metabolic data reveal and define different aspects of the tissue. Based on this, multi-modal integration can be used to combine these features into the one analysis. Below we will integrate the spatial transcriptomic and metabolomic data together to get a more refined and detailed clustering result.
The SpaMTP MultiOmicIntegration function allows for automated or manual specification of weights for each modality when generating the integrated clusters.
integrated.data <- MultiOmicIntegration(combined.data,return.intermediate = T, weight.list = list(0.4,0.6),
reduction.list = list("spt.pca", "spm.pca"), dims.list = list(1:30, 1:15))
integrated.data <- RunUMAP(integrated.data, nn.name = "weighted.nn", reduction.name = "wnn.umap", reduction.key = "wnnUMAP_",verbose = FALSE)
integrated.data <- FindClusters(integrated.data, graph.name = "wsnn", algorithm = 3, resolution = 1.5, cluster.name = "integrated_clusters", verbose = FALSE)
DimPlot(integrated.data, group.by = "integrated_clusters", reduction = 'wnn.umap', label = F, cols = intergrated_palette, pt.size = 1.5) |
SpatialDimPlot(integrated.data, group.by = "integrated_clusters", label = F, pt.size.factor = 2.2, cols = intergrated_palette, image.alpha = 1)
Looking at the results above, we can see the integrative analysis identifies a difference between the lesioned and in-tact striatum cells (cluster 2, cluster 4 and cluster 7 respectively)
We can also compare the difference between ST only, SM only and integrated clustering:
(SpatialDimPlot(integrated.data, group.by = "annotations", pt.size.factor = 2.5, cols = annotation_palette, image.alpha = 0.8)|
SpatialDimPlot(integrated.data, group.by = "SM_clusters", pt.size.factor = 2.5, cols = SM_palette, image.alpha = 0.8))/
(SpatialDimPlot(integrated.data, group.by = "ST_clusters", pt.size.factor = 2.5, cols = ST_palette, image.alpha = 0.8)|
SpatialDimPlot(integrated.data, group.by = "integrated_clusters", label = F, pt.size.factor = 2.5, cols = intergrated_palette, image.alpha = 0.8))
Differential Metabolite and Gene Expression Analysis
Based on this, we can look at which metabolites and genes are differentially expressed between these clusters. First, we will identify which clusters display an up-regulation of dopamine:
# Performs pooling, pseudo-bulking and EdgeR Differential Expression Analysis
cluster_DEMs <- FindAllDEMs(data = integrated.data, ident = "integrated_clusters", n = 3, logFC_threshold = 1.2,
DE_output_dir =NULL, return.individual = FALSE, assay = "SPM",
run_name = "FindAllDEMs", annotation.column = "all_IsomerNames" ) # DE results are returned in a data.frame
Pooling one sample into 3 replicates...
Running edgeR DE Analysis for FindAllDEMs -> with samples [ 3, 1, 2, 4, 7, 5, 6, 9, 0, 8, 10 ]
Starting condition: 3
Starting condition: 1
Starting condition: 2
Starting condition: 4
Starting condition: 7
Starting condition: 5
Starting condition: 6
Starting condition: 9
Starting condition: 0
Starting condition: 8
Starting condition: 10
results <- cluster_DEMs$DEMs[cluster_DEMs$DEMs$cluster == "4",]
metabolites <- c("Dopamine (double derivatized)", "Dopamine (single derivatized)")
### Sets up colouring for significant spots in volcano plot
keyvals <- ifelse(
results$annotations %in% metabolites, "orange", ifelse(
results$logFC < -1.2 & results$PValue < 50e-4, 'royalblue',
ifelse(results$logFC > 1.2 & results$PValue < 50e-4, 'red',
'black')))
keyvals[is.na(keyvals)] <- 'black'
names(keyvals)[keyvals == 'red'] <- 'Cluster 4 - UP'
names(keyvals)[keyvals == 'black'] <- 'Non-Sig'
names(keyvals)[keyvals == 'orange'] <- 'Dopamine'
names(keyvals)[keyvals == "royalblue"] <- "Cluster 4 - Down"
### Plots volcano plot with DEM results
volc_plot <- EnhancedVolcano::EnhancedVolcano( results,
selectLab = c(""),
lab = results$annotations,
colCustom = keyvals,
pCutoff = 50e-4,
FCcutoff = 1.2,
pointSize = 3,
colAlpha = 1,
x = 'logFC',
y = 'PValue',
gridlines.major = FALSE,
gridlines.minor = FALSE)
volc_plot
Analysing the DE results, we can see that dopamine (both the single and double FMP10-derivatized) is over expressed in cluster 4.
Now we will look at the gene expression differences between cluster 4 vs all other clusters within the Striatum:
## subset data to only contain the striatum
Idents(integrated.data) <- "integrated_clusters"
striatum <- subset(integrated.data, subset = region == "striatum")
## Find Differentially Expressed Genes
DefaultAssay(striatum) <- "SPT"
striatum_DEGs <- FindAllMarkers(striatum, only.pos = T)
head(striatum_DEGs[striatum_DEGs$cluster == "4",], n = 4)
p_val | avg_log2FC | pct.1 | pct.2 | p_val_adj | cluster | gene | |
---|---|---|---|---|---|---|---|
Ppp3ca | 0 | 0.4377823 | 1.000 | 0.996 | 0 | 4 | Ppp3ca |
Pcp4 | 0 | 0.6677092 | 0.996 | 0.969 | 0 | 4 | Pcp4 |
Hpca | 0 | 0.5292629 | 0.996 | 0.952 | 0 | 4 | Hpca |
Nrgn | 0 | 0.6513072 | 0.996 | 0.943 | 0 | 4 | Nrgn |
Supporting the findings from the orginal publication, we can see that “Pcp4” is up-regulated in cluster 4 (in-tact striatum cells).
SpatialFeaturePlot(integrated.data, features = "Pcp4", pt.size.factor = 2)
Multi-Omic Data Visualisation
Based on this analysis, we can visualise this gene and metabolite of interest in the same 3D plot (“Pcp4” and “Dopamine”).
Plot3DFeature(combined.data,
features = c("Pcp4", "mz-674.2805"),
assays = c("SPT", "SPM"),
between.layer.height = 10000,
show.image = "slice1",
plot.height = 400,
plot.width = 700,
downscale.image = 2)
Multi-Modality Pathway Analysis
Now that we have identified there are metabolic and transcriptional differences between the lesioned and intact striatum regions, we can now further assess the activity of each region by performing multi-modal pathway analysis using both genes and metabolites. Another novel feature of SpaMTP is the curated FMP10 annotation database which allows users to run pathway analysis with FMP10 data. Below we will identify pathways based on both genes and metabolites that are significantly altered between striatum regions.
# Generate differential abundance and expression results for integrated clustering
Idents(striatum) <- "integrated_clusters"
deg = FindAllMarkers(striatum, assay = "SPT",only.pos =F)
dem = FindAllMarkers(striatum , assay = "SPM", only.pos =F)
DE.list = list("genes" = deg,"metabolites" = dem)
## Run multi-modal pathway analysis
regpathway = FindRegionalPathways(
striatum,
analyte_types = c("genes", "metabolites"),
SM_slot = "counts", ST_slot = "counts",
ident = "integrated_clusters",
DE.list = DE.list
)
## Select some key pathways to plot
selected_pathways <- c("Epinephrine Action Pathway",
"GABA receptor activation",
"GABA receptor signaling",
"GABAergic synapse",
"Neuroinflammation and glutamatergic signaling",
"Nicotine effect on dopaminergic neurons",
"G alpha (i) signalling events",
"Neuronal System",
"Transmission across Chemical Synapses",
"VEGFA-VEGFR2 signaling",
"Apoptosis")
PlotRegionalPathways(regpathway = regpathway,selected_pathways = selected_pathways)
## computing jaccard distance between pathways
Due to the nature of the FMP10 matrix, there are often only a handful of metabolites annotated meaning some metabolites within various pathwasy are not captured. To resolve this we will run multi-modality pathway analysis using a paired Visium and MALDI DHB-matrix dataset. This data has already been processed with the code hidden below:
DHB Data Processing
### Data was generated and mapped following same steps used above.
### Raw files and metadata for DHB dataset and Visium can be downloaded here. The sample used was V11L12-038_B1: https://figshare.scilifelab.se/articles/dataset/Spatial_Multimodal_analysis_SMA_-_Mass_Spectrometry_Imaging_MSI_/22770161
### Dataset alignment was achieved using the following parameters
###
### Rotation Angle: -0.7
### Move X Axis: 59
### Move Y Axis: -102
### Blue Axis Angle: -45
### Blue Axis Stretch: 1.22
### Red Axis Angle: -45
### Red Axis Stretch: 1.12
### Mirror Y Axis: TRUE
## Using Aligned and Mapped data:
# data(combined.dhb.data)
## Annotate metabolites against databases
#db = rbind(HMDB_db, Chebi_db)
#combined.dhb.data = AnnotateSM(combined.dhb.data,assay = "SPM", db = db, polarity = "positive")
## Normalise ST and SM data
#combined.dhb.data <- NormalizeData(combined.dhb.data, assay = "SPT")
#combined.dhb.data <- NormalizeSMData(combined.dhb.data, assay = "SPM")
## Cluster SM data
#DefaultAssay(combined.dhb.data) <- "SPM"
#combined.dhb.data <- FindVariableFeatures(combined.dhb.data, verbose = FALSE)
# combined.dhb.data <- ScaleData(combined.dhb.data, verbose = FALSE)
# combined.dhb.data <- RunPCA(combined.dhb.data, npcs = 30, reduction.name = "spm.pca", verbose = FALSE)
# combined.dhb.data <- FindNeighbors(combined.dhb.data, dims = 1:30, reduction = "spm.pca", verbose = FALSE)
# combined.dhb.data <- RunUMAP(combined.dhb.data, dims = 1:30, reduction = "spm.pca", reduction.name = "spm.umap", verbose = FALSE)
# combined.dhb.data <- FindClusters(combined.dhb.data, resolution = 0.5, cluster.name = "SM_clusters", verbose = FALSE)
## Isolate the Lesioned vs Intact Striatums
#combined.dhb.data@meta.data <- combined.dhb.data@meta.data %>%
# dplyr::mutate(striatum =
# ifelse((SM_clusters %in% c("0")),
# lesion, "Other"))
## Subset to only lesioned and intact striatum
#striatum.dhb.data <- subset(combined.dhb.data, subset = striatum != "Other")
This data object contains the intact and lesioned striatum regions we will be comparing for our multi-modal DHB matrix sample.
striatum.dhb.data <- readRDS("vignette_data_files/Mouse_Brain/DHB_data/striatum.dhb.data.RDS")
SpatialDimPlot(striatum.dhb.data, group.by = "striatum",pt.size.factor = 2)
Lets run DE and then find regional pathways
FindRegionalPathways()
on our combined data object.
## Run DE analysis for transcriptomics and metabolomics
Idents(striatum.dhb.data) <- "striatum"
deg = FindAllMarkers(striatum.dhb.data , test.use = "wilcox_limma", assay = "SPT",only.pos =F)
dem = FindAllMarkers(striatum.dhb.data , assay = "SPM", only.pos =F)
DE.list = list("genes" = deg,"metabolites" = dem)
## Run multi-modal pathway analysis
regpathway = FindRegionalPathways(
striatum.dhb.data,
analyte_types = c("genes", "metabolites"),
SM_slot = "counts", ST_slot = "counts",
ident = "striatum",
DE.list = DE.list
)
We will select some important pathways associated with Parkinson’s disease to plot
selected_pathways = c(
"Dopamine beta-hydroxylase deficiency",
"Tryptophan metabolism",
"Dopamine metabolism",
"sphingolipid metabolism pathway",
"Sphingolipid metabolism: integrated pathway",
"Spermidine and Spermine Biosynthesis",
"Pyruvate metabolism",
"Fatty acid metabolism"
)
PlotRegionalPathways(regpathway = regpathway,selected_pathways = selected_pathways)
One of the significant pathways associated with the lesioned region was ‘Dopamine beta-hydroxylase deficiency’. We can use SpaMTP’s network plotting function to visualuse the network of this pathway, along with the relative differentially expressed elements that make up the activity of this pathway. Lets have a look:
PathwayNetworkPlots(striatum.dhb.data,ident = "striatum", regpathway = regpathway, DE.list = list("genes" = deg, "metabolites" = dem), selected_pathways = selected_pathways, path = "vignette_data_files/Mouse_Brain/DHB_data")
Assuming DE.list[1] contains genes results ....
Assuming DE.list[2] contains metabolites results ....
Query necessary data and establish pathway database
Query db for addtional matching
Constructing DE dataframes....
File successfully written to vignette_data_files/Mouse_Brain/DHB_data/striatum_2024_10_22_12_23_05_AEST.html
One key metabolite that was previously not mentioned by the original publication was (R)-S-adenosyl-L-methionine zwitterion. This metabolite has previously been associated with Parkinson’s disease and using SpaMTP, we were able to detect its expression within the lesioned striatum. We will now visualise the expression of this metabolite spatially:
SpatialMZPlot(striatum.dhb.data, mz= 400.14659,plusminus = 0.05, assay = "SPM", pt.size.factor = 2.5, slot = "data") + theme(legend.position = "right")
We can see that there is much more expression of this metabolite within the lesioned hemisphere.
Finally, we can combine the expression of each of these metabolites to see the exact region where this pathway is active.
## Get key mz values from network plot
mzs <- c(215.015, 321.08627, 400.14659)
mzs <- unlist(lapply(mzs, function(x){ FindNearestMZ(striatum.dhb.data, target_mz = x, assay = "SPM")}))
striatum.dhb.data <- BinMetabolites(striatum.dhb.data, mzs, slot = "data", bin_name = "Dopamine_pathway", assay = "SPM")
SpatialFeaturePlot(striatum.dhb.data, features = "Dopamine_pathway", pt.size.factor = 2.5) & theme(legend.position = "right")
## R version 4.3.2 (2023-10-31)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.3.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Australia/Brisbane
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] EnhancedVolcano_1.20.0 ggrepel_0.9.6 ggplot2_3.5.1
## [4] dplyr_1.1.4 Seurat_5.1.0 SeuratObject_5.0.2
## [7] sp_2.1-4 Cardinal_3.4.3 S4Vectors_0.40.2
## [10] EBImage_4.44.0 BiocParallel_1.36.0 BiocGenerics_0.48.1
## [13] ProtGenerics_1.34.0 SpaMTP_1.0.0
##
## loaded via a namespace (and not attached):
## [1] fs_1.6.5 matrixStats_1.5.0
## [3] spatstat.sparse_3.1-0 bitops_1.0-9
## [5] sf_1.0-19 httr_1.4.7
## [7] RColorBrewer_1.1-3 tools_4.3.2
## [9] sctransform_0.4.1 R6_2.5.1
## [11] lazyeval_0.2.2 uwot_0.2.2
## [13] withr_3.0.2 gridExtra_2.3
## [15] progressr_0.15.1 cli_3.6.3
## [17] Biobase_2.62.0 textshaping_0.4.1
## [19] spatstat.explore_3.3-4 fastDummies_1.7.4
## [21] biglm_0.9-3 shinyjs_2.1.0
## [23] labeling_0.4.3 sass_0.4.9
## [25] spatstat.data_3.1-4 proxy_0.4-27
## [27] ggridges_0.5.6 pbapply_1.7-2
## [29] pkgdown_2.1.1 systemfonts_1.1.0
## [31] svglite_2.1.3 scater_1.30.1
## [33] parallelly_1.41.0 CardinalIO_1.0.0
## [35] limma_3.58.1 rstudioapi_0.17.1
## [37] generics_0.1.3 crosstalk_1.2.1
## [39] ica_1.0-3 spatstat.random_3.3-2
## [41] Matrix_1.6-5 ggbeeswarm_0.7.2
## [43] abind_1.4-8 lifecycle_1.0.4
## [45] edgeR_4.0.16 yaml_2.3.10
## [47] SummarizedExperiment_1.32.0 SparseArray_1.2.4
## [49] Rtsne_0.17 grid_4.3.2
## [51] promises_1.3.2 crayon_1.5.3
## [53] miniUI_0.1.1.1 lattice_0.21-9
## [55] beachmat_2.18.1 cowplot_1.1.3
## [57] zeallot_0.1.0 pillar_1.10.1
## [59] knitr_1.49 fgsea_1.28.0
## [61] GenomicRanges_1.54.1 future.apply_1.11.3
## [63] codetools_0.2-19 fastmatch_1.1-6
## [65] leiden_0.4.3.1 glue_1.8.0
## [67] spatstat.univar_3.1-1 data.table_1.16.4
## [69] vctrs_0.6.5 png_0.1-8
## [71] spam_2.11-0 gtable_0.3.6
## [73] cachem_1.1.0 xfun_0.50
## [75] S4Arrays_1.2.1 mime_0.12
## [77] survival_3.5-7 SingleCellExperiment_1.24.0
## [79] signal_1.8-1 units_0.8-5
## [81] statmod_1.5.0 fitdistrplus_1.2-2
## [83] ROCR_1.0-11 nlme_3.1-163
## [85] bit64_4.6.0-1 matter_2.4.1
## [87] RcppAnnoy_0.0.22 GenomeInfoDb_1.38.8
## [89] bslib_0.8.0 irlba_2.3.5.1
## [91] vipor_0.4.7 KernSmooth_2.23-22
## [93] colorspace_2.1-1 DBI_1.2.3
## [95] tidyselect_1.2.1 bit_4.5.0.1
## [97] compiler_4.3.2 ontologyIndex_2.12
## [99] BiocNeighbors_1.20.2 hdf5r_1.3.11
## [101] xml2_1.3.6 ggdendro_0.2.0
## [103] desc_1.4.3 DelayedArray_0.28.0
## [105] plotly_4.10.4 scales_1.3.0
## [107] classInt_0.4-11 lmtest_0.9-40
## [109] tiff_0.1-12 stringr_1.5.1
## [111] digest_0.6.37 goftest_1.2-3
## [113] fftwtools_0.9-11 spatstat.utils_3.1-2
## [115] rmarkdown_2.29 XVector_0.42.0
## [117] htmltools_0.5.8.1 pkgconfig_2.0.3
## [119] jpeg_0.1-10 sparseMatrixStats_1.14.0
## [121] MatrixGenerics_1.14.0 fastmap_1.2.0
## [123] rlang_1.1.4 htmlwidgets_1.6.4
## [125] shiny_1.10.0 DelayedMatrixStats_1.24.0
## [127] farver_2.1.2 jquerylib_0.1.4
## [129] zoo_1.8-12 jsonlite_1.8.9
## [131] mclust_6.1.1 BiocSingular_1.18.0
## [133] RCurl_1.98-1.16 magrittr_2.0.3
## [135] kableExtra_1.4.0 scuttle_1.12.0
## [137] GenomeInfoDbData_1.2.11 dotCall64_1.2
## [139] patchwork_1.3.0 munsell_0.5.1
## [141] Rcpp_1.0.14 ggnewscale_0.5.0
## [143] viridis_0.6.5 reticulate_1.40.0
## [145] stringi_1.8.4 zlibbioc_1.48.2
## [147] MASS_7.3-60 plyr_1.8.9
## [149] parallel_4.3.2 listenv_0.9.1
## [151] deldir_2.0-4 splines_4.3.2
## [153] tensor_1.5 locfit_1.5-9.10
## [155] igraph_2.1.2 spatstat.geom_3.3-4
## [157] RcppHNSW_0.6.0 reshape2_1.4.4
## [159] ScaledMatrix_1.10.0 evaluate_1.0.3
## [161] httpuv_1.6.15 RANN_2.6.2
## [163] tidyr_1.3.1 purrr_1.0.2
## [165] polyclip_1.10-7 future_1.34.0
## [167] scattermore_1.2 rsvd_1.0.5
## [169] xtable_1.8-4 e1071_1.7-16
## [171] RSpectra_0.16-2 later_1.4.1
## [173] viridisLite_0.4.2 class_7.3-22
## [175] ragg_1.3.3 tibble_3.2.1
## [177] beeswarm_0.4.0 IRanges_2.36.0
## [179] cluster_2.1.4 globals_0.16.3