Aggregate matrix by colname partial matching, and then recode based on condition

Question

I have a dataset generated as follows:

cn <- c("Cop-1", "Cop-2", "LEW-1", "Lew-3", "Cop-3", "SHR-2", "LEW-2", 
"SHRP-3", "SHRP-1")
rn <- paste(rep("Gene_", 4), 1:4, sep = "")
start <- matrix(nrow = 4, ncol = 9)
rownames(start) <- rn
colnames(start) <- cn
start[1, ] <- c(0, .01, 3, 4, 0.001, 11, 5, 15, 46)
start[2, ] <- c(0, .01, 3, 4, 0.001, 11, 5, 15, 46)
start[3, ] <- c(0, .01, 3, 4, 0.001, 11, 5, 15, 46)
start[4, ] <- c(0, .01, 3, 4, 0.001, 11, 5, 15, 46)

And looks like this:

       Cop-1 Cop-2 LEW-1 Lew-3 Cop-3 SHR-2 LEW-2 SHRP-3 SHRP-1
Gene_1     0  0.01     3     4 0.001    11     5     15     46
Gene_2     0  0.01     3     4 0.001    11     5     15     46
Gene_3     0  0.01     3     4 0.001    11     5     15     46
Gene_4     0  0.01     3     4 0.001    11     5     15     46

I would like to scan this dataset and get a new recoded dataset based on the following criteria:

If values for Gene_n are >= 10 for all replicates (e.g. SHRP-1, 2 and 3), then in a new matrix the value for SHRP for Gene_n will be 1. If values for Gene_n are < 1 for all replicates (e.g. Cop-1, 2 and 3), then in a new matrix the value for Cop for Gene_n will be 0. Any other scenario (e.g. LEW-1, 2, and 3) gets assigned 0.5.
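As a minimal sketch, the rule for a single gene and a single strain can be written as a small function. The name recode_replicates is hypothetical (it is not from any package), and the sample vectors are the first-row values of start for each strain.

```r
# Hypothetical helper: recode one gene's replicate values for one strain.
recode_replicates <- function(y) {
  if (all(y >= 10, na.rm = TRUE)) return(1)  # every replicate is >= 10
  if (all(y < 1, na.rm = TRUE)) return(0)    # every replicate is < 1
  0.5                                        # any other combination
}

recode_replicates(c(0, 0.01, 0.001))  # Cop replicates  -> 0
recode_replicates(c(3, 4, 5))         # LEW replicates  -> 0.5
recode_replicates(c(15, 46))          # SHRP replicates -> 1
```

The remaining work is to apply this rule to each row of each group of replicate columns.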

The final dataset should look like this:

cn2 <- c("Cop", "LEW", "SHRP")
end <- matrix(nrow = 4, ncol = 3)
colnames(end) <- cn2
rownames(end) <- rn
end[1, ] <- c(0, 0.5, 1)
end[2, ] <- c(0, 0.5, 1)
end[3, ] <- c(0, 0.5, 1)
end[4, ] <- c(0, 0.5, 1) 

       Cop LEW SHRP
Gene_1   0 0.5    1
Gene_2   0 0.5    1
Gene_3   0 0.5    1
Gene_4   0 0.5    1

I have tried the split function and dplyr, but have not been able to get the desired result. Searching turned up another question (Split data frame based on column name pattern and then rebind into 1 data frame), and it gets me close, but not quite to the result I need.

Thank you for your help.

CLedbetter · Accepted Answer · 2017-11-22T20:01:58.473

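cn2 <- c("Cop", "LEW", "SHRP")
end <- sapply(cn2, function(x) {
  # Columns whose names start with the group prefix, e.g. "Cop-1".
  # The match is case-sensitive, so "Lew-3" and "SHR-2" are not picked up.
  cols <- grep(paste0("^", x, "-[1-9]+"), colnames(start))
  # drop = FALSE keeps a matrix even when only one column matches
  apply(start[, cols, drop = FALSE], MARGIN = 1, function(y) {
    if (all(y >= 10, na.rm = TRUE)) return(1)  # all replicates >= 10
    if (all(y < 1, na.rm = TRUE)) return(0)    # all replicates < 1
    return(0.5)                                # any other scenario
  })
})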

rownames(end) <- rn

       Cop LEW SHRP
Gene_1   0 0.5    1
Gene_2   0 0.5    1
Gene_3   0 0.5    1
Gene_4   0 0.5    1
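One thing worth checking is which columns each pattern actually selects, since the sample column names are not fully consistent ("Lew-3" is in mixed case, and "SHR-2" lacks the "P"). The sketch below only inspects the names. Treating "Lew-3" as a LEW replicate is an assumption about the data, not something the question states.

```r
cn <- c("Cop-1", "Cop-2", "LEW-1", "Lew-3", "Cop-3", "SHR-2", "LEW-2",
        "SHRP-3", "SHRP-1")

# Column names matched by the case-sensitive pattern for each group.
matched <- lapply(c(Cop = "Cop", LEW = "LEW", SHRP = "SHRP"), function(x) {
  grep(paste0("^", x, "-[1-9]+"), cn, value = TRUE)
})
matched$LEW   # "LEW-1" "LEW-2"    ("Lew-3" is skipped)
matched$SHRP  # "SHRP-3" "SHRP-1"  ("SHR-2" is skipped)

# Assumption: if "Lew-3" should count as LEW, match case-insensitively.
lew_any_case <- grep("^LEW-[1-9]+", cn, ignore.case = TRUE, value = TRUE)
lew_any_case  # "LEW-1" "Lew-3" "LEW-2"
```

For the sample data the recoded result is the same either way, because the "Lew-3" value (4) still falls in the 0.5 case.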

edited Nov 22 '17 at 20:01

answered Nov 22 '17 at 19:53

Excellent. I just had to change the x in if(all(x <1, na.rm = T)) return(0) to a y so that it looks like if(all(y <1, na.rm = T)) return(0). Works great! – Nov 22 '17 at 20:01
