
I am seeking some advice on how to do the following task:

I am analyzing a single-cell RNA-seq dataset. My normalized expression data are in a table (each column is a unique cell ID, each row is a gene).

I also have an annotation matrix with information about each cell (each row is a cell ID, each column is an attribute such as patient ID, site, etc.).

For downstream analyses, I would like to group the cells in different ways based on the information in the annotation matrix. Does anyone have suggestions on how I might do that?

For example, I have:

expression_matrix <- matrix(1:4, nrow = 4, ncol = 4,
                            dimnames = list(c("gene1", "gene2", "gene3", "gene4"),
                                            c("cell1", "cell2", "cell3", "cell4")))

annotation_matrix <- matrix(c("1526", "1788", "1526", "1788",
                              "controller", "noncontroller", "controller", "noncontroller",
                              "LN", "PB", "LN", "PB"),
                            nrow = 4, ncol = 3,
                            dimnames = list(c("cell1", "cell2", "cell3", "cell4"),
                                            c("ID", "Status", "Site")))

I want to group by "Site" so that cell1 and cell3 form one group and cell2 and cell4 form another. How do I match the information in the annotation matrix to the expression matrix?

Or, say I want to compare controllers with noncontrollers: I need to match the cell IDs in the normalized expression table to the patient group information in the annotation matrix.

Son nguyen
    It is helpful if you can provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), with both a sample of your data and code for what you've tried so far. – austensen Sep 11 '17 at 20:56

1 Answer

expression_matrix <- matrix(1:4, nrow = 4, ncol = 4,
                            dimnames = list(c("gene1", "gene2", "gene3", "gene4"),
                                            c("cell1", "cell2", "cell3", "cell4")))

#       cell1 cell2 cell3 cell4
# gene1     1     1     1     1
# gene2     2     2     2     2
# gene3     3     3     3     3
# gene4     4     4     4     4

annotation_matrix <- matrix(c("1526", "1788", "1526", "1788",
                              "controller", "noncontroller", "controller", "noncontroller",
                              "LN", "PB", "LN", "PB"),
                            nrow = 4, ncol = 3,
                            dimnames = list(c("cell1", "cell2", "cell3", "cell4"),
                                            c("ID", "Status", "Site")))

#       ID     Status          Site
# cell1 "1526" "controller"    "LN"
# cell2 "1788" "noncontroller" "PB"
# cell3 "1526" "controller"    "LN"
# cell4 "1788" "noncontroller" "PB"
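If you only need the columns of the expression matrix split into groups by one annotation field, base R can do it without reshaping. A minimal sketch, assuming the cell IDs in colnames(expression_matrix) are exactly the row names of annotation_matrix; by_site is a hypothetical name used only for illustration:

```r
# Example data from the question
expression_matrix <- matrix(1:4, nrow = 4, ncol = 4,
                            dimnames = list(c("gene1", "gene2", "gene3", "gene4"),
                                            c("cell1", "cell2", "cell3", "cell4")))
annotation_matrix <- matrix(c("1526", "1788", "1526", "1788",
                              "controller", "noncontroller", "controller", "noncontroller",
                              "LN", "PB", "LN", "PB"),
                            nrow = 4, ncol = 3,
                            dimnames = list(c("cell1", "cell2", "cell3", "cell4"),
                                            c("ID", "Status", "Site")))

# Look up each expression column's Site by cell ID (by name, not by position)
site <- annotation_matrix[colnames(expression_matrix), "Site"]

# Split the cell IDs by Site, then take the matching columns for each group;
# drop = FALSE keeps a one-cell group as a matrix instead of a vector
by_site <- lapply(split(colnames(expression_matrix), site),
                  function(cells) expression_matrix[, cells, drop = FALSE])

by_site$LN
#       cell1 cell3
# gene1     1     1
# gene2     2     2
# gene3     3     3
# gene4     4     4
```

Because the lookup uses row names, the two tables do not need to be in the same order, and a cell ID missing from the annotation stops with a "subscript out of bounds" error instead of silently getting the wrong label. Swapping "Site" for "Status" gives the controller/noncontroller split.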

Let's convert both matrices into data frames that share a cell column:

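library(dplyr)
library(tidyr)  # gather() and spread() come from tidyr (superseded by pivot_longer()/pivot_wider(), but still available)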

expression_df <- expression_matrix %>%
  as.data.frame(stringsAsFactors = FALSE) %>%
  mutate(gene = rownames(.)) %>%   # keep the gene names as a column
  gather(cell, value, -gene)       # one row per gene-cell pair

#     gene  cell value
# 1  gene1 cell1     1
# 2  gene2 cell1     2
# 3  gene3 cell1     3
# 4  gene4 cell1     4
# 5  gene1 cell2     1
# 6  gene2 cell2     2
# 7  gene3 cell2     3
# 8  gene4 cell2     4
# 9  gene1 cell3     1
# 10 gene2 cell3     2
# 11 gene3 cell3     3
# 12 gene4 cell3     4
# 13 gene1 cell4     1
# 14 gene2 cell4     2
# 15 gene3 cell4     3
# 16 gene4 cell4     4

annotation_df <- annotation_matrix %>%
  as.data.frame(stringsAsFactors = FALSE) %>%
  mutate(cell = rownames(.))   # the cell ID becomes the join key

#     ID        Status Site  cell
# 1 1526    controller   LN cell1
# 2 1788 noncontroller   PB cell2
# 3 1526    controller   LN cell3
# 4 1788 noncontroller   PB cell4
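An inner join keeps only cells present in both tables, so a cell whose ID is spelled differently in one of them would silently disappear. A minimal base-R check, using the example matrices from the question (missing_annotation and missing_expression are illustrative names), that every cell ID appears in both tables exactly once:

```r
# Example data from the question
expression_matrix <- matrix(1:4, nrow = 4, ncol = 4,
                            dimnames = list(c("gene1", "gene2", "gene3", "gene4"),
                                            c("cell1", "cell2", "cell3", "cell4")))
annotation_matrix <- matrix(c("1526", "1788", "1526", "1788",
                              "controller", "noncontroller", "controller", "noncontroller",
                              "LN", "PB", "LN", "PB"),
                            nrow = 4, ncol = 3,
                            dimnames = list(c("cell1", "cell2", "cell3", "cell4"),
                                            c("ID", "Status", "Site")))

# Cells with expression values but no annotation row, and vice versa
missing_annotation <- setdiff(colnames(expression_matrix), rownames(annotation_matrix))
missing_expression <- setdiff(rownames(annotation_matrix), colnames(expression_matrix))

if (length(missing_annotation) > 0 || length(missing_expression) > 0) {
  warning("Unmatched cell IDs: ",
          paste(c(missing_annotation, missing_expression), collapse = ", "))
}

# A duplicated cell ID in the annotation would duplicate rows after a join
stopifnot(!anyDuplicated(rownames(annotation_matrix)))
```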

Now you can filter, join, and reshape as you wish. For example, to keep only the LN cells together with their expression values:

example1 <- annotation_df %>%
  filter(Site == "LN") %>%
  inner_join(expression_df, by = "cell")   # state the key explicitly

#     ID     Status Site  cell  gene value
# 1 1526 controller   LN cell1 gene1     1
# 2 1526 controller   LN cell1 gene2     2
# 3 1526 controller   LN cell1 gene3     3
# 4 1526 controller   LN cell1 gene4     4
# 5 1526 controller   LN cell3 gene1     1
# 6 1526 controller   LN cell3 gene2     2
# 7 1526 controller   LN cell3 gene3     3
# 8 1526 controller   LN cell3 gene4     4

example2 <- example1 %>%
  spread(gene,value)

#     ID     Status Site  cell gene1 gene2 gene3 gene4
# 1 1526 controller   LN cell1     1     2     3     4
# 2 1526 controller   LN cell3     1     2     3     4
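For the controller vs. noncontroller comparison in the question, the same two data frames can be joined in full and summarised per gene and Status. A sketch assuming dplyr >= 1.0 (for the .groups argument of summarise()); status_means and status_wide are illustrative names. It only computes per-group means and is not a differential-expression test:

```r
library(dplyr)
library(tidyr)

# Example data from the question
expression_matrix <- matrix(1:4, nrow = 4, ncol = 4,
                            dimnames = list(c("gene1", "gene2", "gene3", "gene4"),
                                            c("cell1", "cell2", "cell3", "cell4")))
annotation_matrix <- matrix(c("1526", "1788", "1526", "1788",
                              "controller", "noncontroller", "controller", "noncontroller",
                              "LN", "PB", "LN", "PB"),
                            nrow = 4, ncol = 3,
                            dimnames = list(c("cell1", "cell2", "cell3", "cell4"),
                                            c("ID", "Status", "Site")))

# Long tables sharing a cell column, built as in the answer above
expression_df <- expression_matrix %>%
  as.data.frame(stringsAsFactors = FALSE) %>%
  mutate(gene = rownames(.)) %>%
  gather(cell, value, -gene)
annotation_df <- annotation_matrix %>%
  as.data.frame(stringsAsFactors = FALSE) %>%
  mutate(cell = rownames(.))

# Attach each cell's annotation, then average every gene within each Status
status_means <- annotation_df %>%
  inner_join(expression_df, by = "cell") %>%
  group_by(gene, Status) %>%
  summarise(mean_value = mean(value), n_cells = n(), .groups = "drop")

# One row per gene, one column per Status, for a side-by-side comparison
status_wide <- status_means %>%
  select(-n_cells) %>%
  spread(Status, mean_value)
```

Grouping by Site (or by ID for per-patient summaries) only requires changing the group_by() and spread() arguments.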
moodymudskipper