
I would like to subset an object in R according to the suffixes of the barcodes it contains. These end in '-n', where n is a number from 1 to 6, e.g. AAACCGTGCCCTCA-1, GAACCGTGCCCTCA-2, CATGCGTGCCCTCA-5, etc. I would like all the corresponding information about each barcode to be split accordingly as well. Here is the code I tried on my object, cds:

grp = sub("[A-Z]*[-]","",cds$barcodes)   # keep only the suffix after the hyphen
group1 = cds[,grp==1]                    # keep the cells whose suffix is 1

However, when I view group1, I get

> group1$barcode
factor(0)
7047 Levels: AAACATACCAGTTG-3 AAACATACTATGCG-4 AAACATTGAAGCCT-5 AAACATTGGCGAAG-4 AAACATTGTGAAGA-4     ... TTTGCATGGCCAAT-5

and all the barcodes still seem to be there. I also don't want to replace the barcodes with the number at the end; I just want a way of telling R to select barcodes by the number they end in, so I can group them while keeping the barcodes as they are.
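Two things in the output above can be checked on a toy example. `factor(0)` means the subset is empty, and the `7047 Levels:` line appears only because subsetting a factor keeps all of its original levels unless they are dropped, so the barcodes are not actually in `group1`. The sketch below uses a few made-up barcodes (hypothetical, not taken from `cds`) on a plain factor; it does not touch a `CellDataSet`. Since the suffix regex itself works, one thing worth checking on the real object is whether `cds$barcodes` returns anything at all, given that the printout uses `$barcode` (singular).

```r
# Hypothetical example barcodes (not the asker's data)
barcode <- factor(c("AAACCGTGCCCTCA-1", "GAACCGTGCCCTCA-2",
                    "CATGCGTGCCCTCA-5", "AAACGCACACGCAT-1"))

# Same regex as in the question: drop the letters and the hyphen
suffix <- sub("[A-Z]*[-]", "", as.character(barcode))

# An empty selection still carries every level of the original factor
empty <- barcode[suffix == "9"]
print(empty)   # prints factor(0) plus a Levels: line listing all 4 barcodes

# Selecting suffix "1" keeps the matching barcodes unchanged;
# droplevels() discards the levels that no longer occur
group1 <- droplevels(barcode[suffix == "1"])
print(group1)
```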

For example, I would like group1$barcodes to look something like this:

group1$barcode
1   AAACCGTGCCCTCA-1
2   AAACGCACACGCAT-1
3   AAACGGCTTCCGAA-1
4   AAAGACGAACCCAA-1
5   AAAGACGACTGTTT-1
6   AAAGAGACAAAGCA-1
7   AAAGATCTGGTAAA-1
8   AAAGCAGAGCAAGG-1
9   AAAGCAGATTATCC-1
10  AAAGCCTGATGACC-1

Many thanks!

Abigail

Abigail575

1 Answer


Edit:

Use 'suffix' not 'prefix'!

I'd suggest using dplyr:

library(dplyr)
# mutate() needs a data.frame or tibble, so cds must be one here
cds %>%
  mutate(grp = gsub("([A-Z]*)-([0-9]+)", "\\2", barcodes))   # grp = number after the hyphen

Then, to keep a single group:

cds %>%
  mutate(grp = gsub("([A-Z]*)-([0-9]+)", "\\2", barcodes)) %>%
  filter(grp == 3)   # keep only barcodes ending in -3
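Because `mutate()` and `filter()` only work on data frames (the `CellDataSet` error in the comments comes from this), here is a hedged base-R sketch of the same idea on a small made-up data frame. `cells` and `n_genes` are hypothetical names; on the real object the per-cell table might come from `as.data.frame()` or Biobase's `pData()`, which is an assumption about where your barcodes live. `split()` additionally returns every suffix group at once, which matches the wish to split all per-barcode information.

```r
# Hypothetical per-cell table (made-up barcodes and values)
cells <- data.frame(
  barcodes = c("AAACCGTGCCCTCA-1", "GAACCGTGCCCTCA-2",
               "CATGCGTGCCCTCA-5", "AAACGCACACGCAT-1"),
  n_genes  = c(812, 640, 915, 703),
  stringsAsFactors = FALSE
)

# Same regex as the dplyr answer: keep only the digits after the hyphen
cells$grp <- gsub("([A-Z]*)-([0-9]+)", "\\2", cells$barcodes)

# One group, like filter(grp == 3) but for suffix 1; barcodes are untouched
group1 <- cells[cells$grp == "1", ]

# All groups at once: a named list with one data frame per suffix
by_suffix <- split(cells, cells$grp)
names(by_suffix)
```

Each element of `by_suffix` keeps all columns of the original rows, so `by_suffix[["1"]]` plays the role of `group1`.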
dario
  • Thanks! I still get this, though: `grp = sub("[A-Z]*[-]","",cds$barcodes)`, `group1 = cds[,grp==1]`, then `group1$barcode` returns `factor(0)` with `7047 Levels: AAACATACCAGTTG-3 AAACATACTATGCG-4 AAACATTGAAGCCT-5 AAACATTGGCGAAG-4 AAACATTGTGAAGA-4 ... TTTGCATGGCCAAT-5`. – Abigail575 Feb 14 '20 at 11:42
  • (There should be around 2000 with the -1 suffix) – Abigail575 Feb 14 '20 at 11:43
  • Oh, I forgot to run the first part, but now it says: `Error in UseMethod("mutate_") : no applicable method for 'mutate_' applied to an object of class "c('CellDataSet', 'ExpressionSet', 'eSet', 'VersionedBiobase', 'Versioned')"`. I think the class is the problem :-( – Abigail575 Feb 14 '20 at 11:45
  • Without a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) it's kind of difficult to know what you are seeing or where the problem lies... but do you mean the part about `levels`?? – dario Feb 14 '20 at 11:45
  • You can try `as.data.frame(object_that_is_a_different_class)` – dario Feb 14 '20 at 11:46
  • OR `cds_df <- data.frame(barcodes = as.character(cds$barcodes))` – dario Feb 14 '20 at 11:47
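Converting to a data frame is not strictly needed just to subset cells: in a `CellDataSet`, as in any `ExpressionSet`, cells are columns, so a logical vector with one entry per cell can go straight into `cds[, keep]`, which is what the question's `cds[,grp==1]` already does. The sketch below uses a plain matrix with barcodes as column names as a stand-in (an assumption; it does not load Biobase or monocle) to show that only the vector inside the brackets has to be right. On the real object, checking that `length(keep)` equals `ncol(cds)` and that `sum(keep)` is around 2000 for suffix 1 before subsetting would show whether the barcode column was read correctly.

```r
# Hypothetical expression matrix: genes in rows, cells (barcodes) in columns,
# standing in for how a CellDataSet stores cells as columns
expr <- matrix(1:12, nrow = 3,
               dimnames = list(c("GeneA", "GeneB", "GeneC"),
                               c("AAACCGTGCCCTCA-1", "GAACCGTGCCCTCA-2",
                                 "CATGCGTGCCCTCA-5", "AAACGCACACGCAT-1")))

# Logical index over cells, built from the barcode suffix
suffix <- sub("[A-Z]*[-]", "", colnames(expr))
keep   <- suffix == "1"

# Subsetting columns keeps the barcodes and all their rows intact;
# drop = FALSE keeps a matrix even if only one cell matches
group1 <- expr[, keep, drop = FALSE]
colnames(group1)
```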