1

I have a data.frame including five columns as following:

name module m.red m.blue m.green
z    red    0.22  0.1    0.09
g    red    0.98  0.3    0.4
d    green  0.08  0.2    0.87
k    blue   0.2   0.88   0.4
b    blue   0.1   0.9    0.3

I want to reduce this matrix to the following matrix based on the module name of each element:

name module m.module 
z    red    0.22 
g    red    0.98 
d    green  0.87 
k    blue   0.88 
b    blue   0.9 

and then sort the elements of each module based on the m.module column in decreasing order. Can any one help me for doing it?

Thanks

Zaynab
  • 233
  • 3
  • 16

2 Answers2

3

This would be a much easier job to solve if your data were in a "long" format.

Thus, you can try an approach like the following:

library(dplyr)
library(tidyr)

mydf %>%
  gather(var, val, m.red, m.blue, m.green) %>%  ## Make the data long
  separate(var, into = c("m", "var")) %>%       ## Make it easy to match things
  filter(module == var) %>%                     ## Use the matching to filter
  select(name, module, val) %>%                 ## Keep just the columns you want
  arrange(val)                                  ## Sort
#   name module  val
# 1    z    red 0.22
# 2    d  green 0.87
# 3    k   blue 0.88
# 4    b   blue 0.90
# 5    g    red 0.98

Add a group_by somewhere in there if you wanted to sort by group instead of overall.

A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
  • 1
    I wonder you could skip the `separate` step and just do `df %>% gather(var, val, m.red:m.green) %>% filter(module == sub("m\\.", "", var))` – David Arenburg Aug 20 '15 at 08:38
2

A possible base R approach:

cols = tail(names(df),3)
mask = sapply(df$module, function(x) grepl(x, cols))

dt = transform(df[,1:2], m.module=t(df[,cols])[mask])
dt[order(dt$m.module, decreasing=T),]
#  name module m.module
#2    g    red     0.98
#4    k   blue     0.90
#3    d   blue     0.88
#5    b  green     0.87
#1    z    red     0.22

Data:

df = structure(list(name = c("z", "g", "d", "k", "b"), module = c("red", 
"red", "green", "blue", "blue"), m.red = c(0.22, 0.98, 0.08, 
0.2, 0.1), m.blue = c(0.1, 0.3, 0.2, 0.88, 0.9), m.green = c(0.09, 
0.4, 0.87, 0.4, 0.3)), .Names = c("name", "module", "m.red", 
"m.blue", "m.green"), class = "data.frame", row.names = c(NA, 
-5L))
Colonel Beauvel
  • 30,423
  • 11
  • 47
  • 87
  • When I tried this solution, it failed because `name` and `module` columns were factors. It worked when I turned theses columns into character. –  Aug 20 '15 at 08:06
  • 1
    I added the data in case. This is not mentionned by the OP but yes the pre-treatment might be needed. I also turned it base `R`,`str_detect` was overkill. – Colonel Beauvel Aug 20 '15 at 08:08
  • Thank you so much, how can do sort of module elements in terms of the value in m.module column? – Zaynab Aug 20 '15 at 08:17
  • If I add also three columns p.m.red , p.m.blue and p.m.green in my dataframe after each previous columns, and I want to add another column of p.m.module in the result, how can I do it? – Zaynab Aug 20 '15 at 08:28
  • sorry this is a different question and this needs clarifications. You can open another topic for that, but try first some experimentations/solutions. By the way, as such the original question is answered, please consider accepting the answer which suits you to close the topic! – Colonel Beauvel Aug 20 '15 at 08:37
  • About ordering, I need to sort elements of each module separately without any change in the ordering of modules. – Zaynab Aug 20 '15 at 09:01
  • The result is incorrect!, the values of m.module for elements of green module and blue module are not correct. How can we correct it? – Zaynab Aug 20 '15 at 10:54