Skip to contents

Handeling Large SM Datasets

SpaMTP is suitable for datasets of all sizes. However, extra large datasets may need special handling to speed up the analysis process. To demonstrate some helpful functions for processing large datasets we will use a public mouse liver dataset with spotted chemicals standards published here.

Author: Andrew Causer


Import R Libraries and Load Data

First we need to import the required libraries for this analysis.

## Install SpaMTP if not previously installed
if (!require("SpaMTP"))
    devtools::install_github("GenomicsMachineLearning/SpaMTP")

#General Libraries
library(SpaMTP)
library(Cardinal)
library(Seurat)
library(dplyr)

#For plotting + DE plots
library(ggplot2)
library(EnhancedVolcano)
library(viridis)

Next we will load the data, you can download or load it directly from the SpaMTP zenodo page.

spotted_large <- Cardinal::readImzML("./Spotted/2020-12-05_ME_X190_L1_Spotted_20umss_375x450_33at_DAN_Neg",resolution = 3, mass.range = c(100,1000), memory = T)
spotted_large
MSImagingExperiment with 767528 features and 168750 spectra 
spectraData(1): intensity
featureData(1): mz
pixelData(3): x, y, run
coord(2): x = 1...375, y = 1...450
runNames(1): 2020-12-05_ME_X190_L1_Spotted_20umss_375x450_33at_DAN_Neg
experimentData(8): spectrumType, instrumentModel, ionSource, ..., scanPattern, scanType, lineScanDirection
mass range: 100.0000 to 999.9959 
centroided: NA 

You can see our dataset has 767,529 features and 168,750 pixels. This is quite a large dataset that will require alot of memory to process. However, it is likely we don’t need to analyse all the features and pixels to generate meaningful biological conclusion.

Annotating Large Datasets

Using the function AnnotateBigData we can find m/z values that were successfully annotated and only perform the remainder of the downstream analyses using these.

## Get all the m/z values from our cardinal object
mzs <- data.frame(Cardinal::featureData(spotted_large))$mz

## Annotate each m/z value
results <- AnnotateBigData(mzs, db = HMDB_db, ppm_error = 3, adducts = c("M-H", "M+Cl"), polarity = "negative") 
dim(results)[1]
[1] 67060

We can now see we successfully annotated 67,060 different m/z values which will reduce our dataset size by up to ~11.5x.

Lets look at the annoated results:

head(results, n = 5)
observed_mz all_IsomerNames all_Isomers all_Isomers_IDs all_Adducts all_Formulas all_Errors mz_names present
100.0039 2,4-Oxazolidinedione; hydroxyoxazolone HMDB0245467; HMDB0253283 hmdb:HMDB0245467; hmdb:HMDB0253283 M-H C3H3NO3 1.1688 mz-100.003900076051 TRUE
100.0042 2,4-Oxazolidinedione; hydroxyoxazolone HMDB0245467; HMDB0253283 hmdb:HMDB0245467; hmdb:HMDB0253283 M-H C3H3NO3 1.8312 mz-100.004200088201 TRUE
100.0084 Cyclopentadienyl HMDB0250665 hmdb:HMDB0250665 M+Cl C5H5 1.2680 mz-100.00840035281 TRUE
100.0087 Cyclopentadienyl HMDB0250665 hmdb:HMDB0250665 M+Cl C5H5 1.7320 mz-100.008700378461 TRUE
100.0171 (S)-methylmalonate-semialdehyde; acetoacetate HMDB0304000; HMDB0304256 hmdb:HMDB0304000; hmdb:HMDB0304256 M-H C4H5O3 0.4013 mz-100.017101462133 TRUE

We can then use these m/z values to subset our results and then generate our SpaMTP Seurat object.

## Subset cardinal object
spotted_small <- Cardinal::subset(spotted_large, mz %in% results$observed_mz)

## Convert Cardinal object to SpaMTP object
spotted_small <- CardinalToSeurat(spotted_small)

Region of Interest Selection

Now we have our filtered SpaMTP data object lets plot it.

ImageFeaturePlot(spotted_small, features = "nCount_Spatial", dark.background = F)& scale_fill_gradientn(colors = viridis::viridis(100), limits = c(0, 400000), na.value = viridis::viridis(100)[100])

We can see that there are alot of pixels outside the tissue section that are clearly noise with high intensity values. We could remove these using filtering methods, but for the purpose of demonstrating the built-in ROI selection tool, we can also use SpaMTP to manually select the region we wish to analyses.

Lets run this below and see an example of how to use this:

spotted_small <- SelectROIs(spotted_small)


Here is an example of what the selection might look like:

head(spotted_small) %>% select(tail(names(.), 3))
x_coord y_coord ROI_1
1_1 1 1 0
2_1 2 1 0
3_1 3 1 0
4_1 4 1 0
5_1 5 1 0
6_1 6 1 0
7_1 7 1 0
8_1 8 1 0
9_1 9 1 0
10_1 10 1 0

Looking at the last 3 columns we can see our saved ROI selection area. Lets plot it visually:

ImageDimPlot(spotted_small, group.by = "ROI_1", dark.background = F)

Now we can simply subset our dataset:

spotted_small <- subset(spotted_small, subset = ROI_1 == "1")

ImageFeaturePlot(spotted_small, features = "nCount_Spatial", dark.background = F)& scale_fill_gradientn(colors = viridis::viridis(100), limits = c(0, 400000), na.value = viridis::viridis(100)[100])

Session Info

## R version 4.4.1 (2024-06-14)
## Platform: aarch64-apple-darwin20
## Running under: macOS 15.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Europe/Dublin
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] viridis_0.6.5          viridisLite_0.4.2      EnhancedVolcano_1.24.0
##  [4] ggrepel_0.9.6          ggplot2_4.0.0          dplyr_1.1.4           
##  [7] Seurat_5.3.0           SeuratObject_5.2.0     sp_2.2-0              
## [10] Cardinal_3.8.3         S4Vectors_0.44.0       ProtGenerics_1.38.0   
## [13] BiocGenerics_0.52.0    BiocParallel_1.40.2    SpaMTP_1.0.0          
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3     rstudioapi_0.17.1      jsonlite_2.0.0        
##   [4] magrittr_2.0.4         spatstat.utils_3.2-0   farver_2.1.2          
##   [7] rmarkdown_2.29         fs_1.6.6               ragg_1.5.0            
##  [10] vctrs_0.6.5            ROCR_1.0-11            spatstat.explore_3.5-3
##  [13] RCurl_1.98-1.17        htmltools_0.5.8.1      sass_0.4.10           
##  [16] sctransform_0.4.2      parallelly_1.45.1      KernSmooth_2.23-26    
##  [19] bslib_0.9.0            htmlwidgets_1.6.4      desc_1.4.3            
##  [22] ica_1.0-3              plyr_1.8.9             plotly_4.11.0         
##  [25] zoo_1.8-14             cachem_1.1.0           igraph_2.1.4          
##  [28] mime_0.13              lifecycle_1.0.4        pkgconfig_2.0.3       
##  [31] Matrix_1.7-4           R6_2.6.1               fastmap_1.2.0         
##  [34] fitdistrplus_1.2-4     future_1.67.0          shiny_1.11.1          
##  [37] digest_0.6.37          patchwork_1.3.2        tensor_1.5.1          
##  [40] RSpectra_0.16-2        irlba_2.3.5.1          textshaping_1.0.3     
##  [43] labeling_0.4.3         progressr_0.16.0       spatstat.sparse_3.1-0 
##  [46] httr_1.4.7             polyclip_1.10-7        abind_1.4-8           
##  [49] compiler_4.4.1         proxy_0.4-27           withr_3.0.2           
##  [52] S7_0.2.0               tiff_0.1-12            DBI_1.2.3             
##  [55] fastDummies_1.7.5      MASS_7.3-65            classInt_0.4-11       
##  [58] units_0.8-7            tools_4.4.1            lmtest_0.9-40         
##  [61] httpuv_1.6.16          future.apply_1.20.0    goftest_1.2-3         
##  [64] glue_1.8.0             nlme_3.1-168           EBImage_4.48.0        
##  [67] promises_1.3.3         sf_1.0-21              grid_4.4.1            
##  [70] Rtsne_0.17             cluster_2.1.8.1        reshape2_1.4.4        
##  [73] generics_0.1.4         gtable_0.3.6           spatstat.data_3.1-8   
##  [76] class_7.3-23           tidyr_1.3.1            data.table_1.17.8     
##  [79] xml2_1.4.0             spatstat.geom_3.6-0    RcppAnnoy_0.0.22      
##  [82] RANN_2.6.2             pillar_1.11.1          stringr_1.5.2         
##  [85] spam_2.11-1            RcppHNSW_0.6.0         limma_3.62.2          
##  [88] later_1.4.4            splines_4.4.1          lattice_0.22-7        
##  [91] survival_3.8-3         deldir_2.0-4           tidyselect_1.2.1      
##  [94] CardinalIO_1.4.0       locfit_1.5-9.12        miniUI_0.1.2          
##  [97] pbapply_1.7-4          knitr_1.50             gridExtra_2.3         
## [100] matter_2.8.0           svglite_2.2.1          scattermore_1.2       
## [103] xfun_0.53              Biobase_2.66.0         statmod_1.5.0         
## [106] matrixStats_1.5.0      fftwtools_0.9-11       stringi_1.8.7         
## [109] lazyeval_0.2.2         yaml_2.3.10            kableExtra_1.4.0      
## [112] evaluate_1.0.5         codetools_0.2-20       tibble_3.3.0          
## [115] cli_3.6.5              ontologyIndex_2.12     uwot_0.2.3            
## [118] xtable_1.8-4           reticulate_1.43.0      systemfonts_1.2.3     
## [121] jquerylib_0.1.4        Rcpp_1.1.0             globals_0.18.0        
## [124] spatstat.random_3.4-2  zeallot_0.2.0          png_0.1-8             
## [127] spatstat.univar_3.1-4  parallel_4.4.1         pkgdown_2.1.3         
## [130] dotCall64_1.2          jpeg_0.1-11            bitops_1.0-9          
## [133] listenv_0.9.1          e1071_1.7-16           scales_1.4.0          
## [136] ggridges_0.5.7         purrr_1.1.0            rlang_1.1.6           
## [139] cowplot_1.2.0          shinyjs_2.1.0