I am seeing peculiar behaviour from the R Seurat package when trying to subset objects to specific sets of cells.

Say I generate three sets of random cell names from a Seurat object using sample():

library(Seurat)

set.seed(12345)

ten_cells_id <- sample(Cells(pbmc_small), 10)
other_ten_ids <- sample(Cells(pbmc_small), 10)
and_other_ten <- sample(Cells(pbmc_small), 10)

I can now subset the object using [] and print the cell names:

Cells(pbmc_small[, ten_cells_id], pt.size=3)
Cells(pbmc_small[, other_ten_ids], pt.size=3)
Cells(pbmc_small[, and_other_ten], pt.size=3)

No surprises here: this yields three different sets of cells, as expected.

> Cells(pbmc_small[, ten_cells_id], pt.size=3)
 [1] "CATGAGACACGGGA" "CGTAGCCTGTATGC" "ACTCGCACGAAAGT" "CTAGGTGATGGTTG" "TTACGTACGTTCAG" "CATGGCCTGTGCAT"
 [7] "ACAGGTACTGGTGT" "AATGTTGACAGTCA" "GATAGAGAAGGGTG" "CATTACACCAACTG"
> Cells(pbmc_small[, other_ten_ids], pt.size=3)
 [1] "GGCATATGCTTATC" "ACAGGTACTGGTGT" "CATCAGGATGCACA" "ATGCCAGAACGACT" "GAGTTGTGGTAGCT" "GGCATATGGGGAGT"
 [7] "AGAGATGATCTCGC" "GAACCTGATGAACC" "GATATAACACGCAT" "CATGAGACACGGGA"
> Cells(pbmc_small[, and_other_ten], pt.size=3)
 [1] "GGGTAACTCTAGTG" "TTTAGCTGTACTCT" "TACATCACGCTAAC" "CTAAACCTGTGCAT" "ATACCACTCTAAGC" "CATGCGCTAGTCAC"
 [7] "GATAGAGAAGGGTG" "ATTACCTGCCTTAT" "GCGCATCTTGCTCC" "ACAGGTACTGGTGT"

However, if I call sample() directly inside []:

cells1 <- pbmc_small[, sample(Cells(pbmc_small), 10)]
cells2 <- pbmc_small[, sample(Cells(pbmc_small), 10)]
cells3 <- pbmc_small[, sample(Cells(pbmc_small), 10)]

Cells(cells1)
Cells(cells2)
Cells(cells3)

I get the same result three times:

> Cells(cells1)
 [1] "GATAGAGATCACGA" "GGCATATGCTTATC" "ATGCCAGAACGACT" "AGATATACCCGTAA" "TACAATGATGCTAG" "CATGAGACACGGGA"
 [7] "GCACTAGACCTTTA" "CGTAGCCTGTATGC" "TTACCATGAATCGC" "ATAAGTTGGTACGT"
> Cells(cells2)
 [1] "GATAGAGATCACGA" "GGCATATGCTTATC" "ATGCCAGAACGACT" "AGATATACCCGTAA" "TACAATGATGCTAG" "CATGAGACACGGGA"
 [7] "GCACTAGACCTTTA" "CGTAGCCTGTATGC" "TTACCATGAATCGC" "ATAAGTTGGTACGT"
> Cells(cells3)
 [1] "GATAGAGATCACGA" "GGCATATGCTTATC" "ATGCCAGAACGACT" "AGATATACCCGTAA" "TACAATGATGCTAG" "CATGAGACACGGGA"
 [7] "GCACTAGACCTTTA" "CGTAGCCTGTATGC" "TTACCATGAATCGC" "ATAAGTTGGTACGT"

The values are always the same, regardless of the seed I use! I guess R is somehow resetting the seed each time. This is not an issue with [] itself, as:

a <- 1:100
a[sample(1:100, 10)]
a[sample(1:100, 10)]
a[sample(1:100, 10)]

This returns three different sets of values.

The only thing I can think of is that something strange is happening because Seurat overloads []. Any ideas?

nico

1 Answer


It looks like this is because [.Seurat() calls subset.Seurat(), which in turn calls WhichCells(). WhichCells() has a seed argument, which defaults to 1 and is used to reset the random seed each time it is called. You can override this by setting it to NULL, and thankfully this is also passed through if you give it to [, like so:

library(Seurat)
#> Attaching SeuratObject
#> Attaching sp

set.seed(12345)

cells1 <- pbmc_small[, sample(Cells(pbmc_small), 10), seed = NULL]
cells2 <- pbmc_small[, sample(Cells(pbmc_small), 10), seed = NULL]
cells3 <- pbmc_small[, sample(Cells(pbmc_small), 10), seed = NULL]

Cells(cells1)
#>  [1] "GATAGAGATCACGA" "GGCATATGCTTATC" "ATGCCAGAACGACT" "AGATATACCCGTAA"
#>  [5] "TACAATGATGCTAG" "CATGAGACACGGGA" "GCACTAGACCTTTA" "CGTAGCCTGTATGC"
#>  [9] "TTACCATGAATCGC" "ATAAGTTGGTACGT"
Cells(cells2)
#>  [1] "GTCATACTTCGCCT" "TGGTATCTAAACAG" "ATCATCTGACACCA" "GTTGACGATATCGG"
#>  [5] "GACGCTCTCTCTCG" "AGATATACCCGTAA" "CTTCATGACCGAAT" "CTAACGGAACCGAT"
#>  [9] "TACTCTGAATCGAC" "GCGTAAACACGGTT"
Cells(cells3)
#>  [1] "GTCATACTTCGCCT" "GCTCCATGAGAAGT" "ACAGGTACTGGTGT" "TACATCACGCTAAC"
#>  [5] "CCATCCGATTCGCC" "GACGCTCTCTCTCG" "CTTCATGACCGAAT" "GCGTAAACACGGTT"
#>  [9] "CATTACACCAACTG" "CTTGATTGATCTTC"

Created on 2022-10-17 with reprex v2.0.2
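The following base-R sketch reproduces the mechanism without needing Seurat. pick_cells() is a hypothetical stand-in for WhichCells(), and the sketch assumes, as described above, that the real function calls set.seed(seed) before it uses the cells it was given. R evaluates function arguments lazily, so a sample() call written inside the call is presumably only run after that reset. That would explain why the three subsets came out identical whatever seed was set beforehand, while the pre-computed id vectors in the question behaved normally.

```r
# pick_cells() is a HYPOTHETICAL stand-in for WhichCells(): like the real
# function (as described above) it takes `seed = 1` and calls set.seed() with it.
pick_cells <- function(all_cells, cells, seed = 1) {
  if (!is.null(seed)) {
    set.seed(seed)  # resets the global random number stream
  }
  # `cells` is a lazy promise: a sample() call written in the caller's
  # argument list is only evaluated here, i.e. after set.seed() above.
  force(cells)
  intersect(all_cells, cells)
}

all_cells <- sprintf("cell%03d", 1:80)

# Sampling inside the call: both draws happen right after set.seed(1).
set.seed(12345)
s1 <- pick_cells(all_cells, sample(all_cells, 10))
s2 <- pick_cells(all_cells, sample(all_cells, 10))
identical(s1, s2)  # TRUE

# seed = NULL skips the reset, so the draws continue the 12345 stream.
set.seed(12345)
n1 <- pick_cells(all_cells, sample(all_cells, 10), seed = NULL)
n2 <- pick_cells(all_cells, sample(all_cells, 10), seed = NULL)

# Pre-computing the ids forces sample() before the function is entered,
# so the internal set.seed(1) can no longer influence which cells are drawn.
set.seed(12345)
ids1 <- sample(all_cells, 10)
ids2 <- sample(all_cells, 10)
p1 <- pick_cells(all_cells, ids1)
p2 <- pick_cells(all_cells, ids2)
```

With seed = NULL the sketch draws exactly the same cells as the pre-computed version (n1 matches p1 and n2 matches p2), which is consistent with the seed = NULL workaround shown in the reprex.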

In my opinion this is quite poorly documented, and the behaviour is confusing enough that it might justify a new issue on the seurat-object GitHub repository.

wurli
  • Thanks a lot for this! It was driving me mad! I am writing an issue for SeuratObject – nico Oct 18 '22 at 06:38
  • I have opened an issue here: https://github.com/mojaveazure/seurat-object/issues/62. Actually, I just realised that any call to WhichCells overwrites the user-provided seed, which is probably not something that should happen! – nico Oct 18 '22 at 07:23
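The last comment describes a separate side effect: because WhichCells() reseeds by default, the random stream the user set up earlier is replaced after any such call. A common base-R pattern for guarding against this is to save .Random.seed from the global environment and restore it on exit. The sketch below makes no assumptions about Seurat internals. with_preserved_rng() and reseeding_function() are hypothetical names, and the withr package offers a ready-made with_preserve_seed() for the same purpose. Note that this only protects draws made after the call. A sample() written inside pbmc_small[, ...] would still be evaluated after the reset, so seed = NULL or pre-computed ids are still needed for that.

```r
# HYPOTHETICAL helper (not part of Seurat): evaluate `expr`, then put the
# global RNG state back exactly as it was, even if `expr` called set.seed().
with_preserved_rng <- function(expr) {
  env <- globalenv()
  if (exists(".Random.seed", envir = env, inherits = FALSE)) {
    old_seed <- get(".Random.seed", envir = env, inherits = FALSE)
    on.exit(assign(".Random.seed", old_seed, envir = env), add = TRUE)
  } else {
    # No seed existed yet: remove whatever state `expr` creates.
    on.exit(
      if (exists(".Random.seed", envir = env, inherits = FALSE)) {
        rm(list = ".Random.seed", envir = env)
      },
      add = TRUE
    )
  }
  expr  # forcing the promise runs the caller's code here
}

# Stand-in for any function that reseeds as a side effect.
reseeding_function <- function() {
  set.seed(1)
  invisible(NULL)
}

set.seed(12345)
expected <- sample(100, 5)   # the draw the user's seed should give

set.seed(12345)
reseeding_function()
clobbered <- sample(100, 5)  # comes from the seed-1 stream instead

set.seed(12345)
with_preserved_rng(reseeding_function())
preserved <- sample(100, 5)  # continues the user's seed-12345 stream
identical(preserved, expected)  # TRUE
```

Applied to the question, something like with_preserved_rng(pbmc_small[, ids]) would leave the user's stream untouched after the subset, assuming WhichCells() only changes the RNG through set.seed().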