
Take the following example.

library(dplyr)
temp <- data.frame(lapply(1:3, function(i) rnorm(5, 0, 1)))
names(temp) <- paste0("X", 1:3)

temp_each <-
    temp %>%
    mutate_each(funs(mean, median))

Examining the names of temp_each, we see that

> names(temp_each)
[1] "X1"        "X2"        "X3"        "X1_mean"   "X2_mean"   "X3_mean"   "X1_median" "X2_median" "X3_median"

that is, the derived columns come in groups of three, one group per function, each ordered X1, X2, X3 with the function name appended.

However, I would like the columns to be ordered like this:

[1] "X1"        "X1_mean"   "X1_median" "X2"        "X2_mean"   "X2_median" "X3"        "X3_mean"   "X3_median"

Does anyone know how to implement this, preferably using dplyr, for a data frame with many columns and arbitrary column names?

starball
Alex
  • You can just reorder the columns by sorting the names alphabetically: `temp_each[,order(names(temp_each))]`. Does that achieve what you were after? – chappers Jun 30 '15 at 04:54
  • That's a good interim solution, but it may not necessarily preserve the ordering of the original names if they are not `X1`, `X2`, etc. I reworded the question slightly, thanks – Alex Jun 30 '15 at 04:58
  • You may want to check `mixedorder` from gtools – Veerendra Gadekar Jun 30 '15 at 05:02
  • As long as you can write some custom function to define the order of the names, sounds good. – smci Jun 30 '15 at 05:16
  • I am particularly struggling to implement this as part of a `select` step. – Alex Jun 30 '15 at 05:22

3 Answers


Here you could use mixedorder from gtools, which sorts the numbers embedded in the names numerically:

library(gtools)
temp_each[,mixedorder(colnames(temp_each))]

#           X1    X1_mean  X1_median         X2    X2_mean  X2_median
#1  0.28285115 -0.4369067 0.08556155 -0.9402162 -0.9857593 -0.7676634
#2 -1.29193398 -0.4369067 0.08556155 -0.5442052 -0.9857593 -0.7676634
#3 -1.42261044 -0.4369067 0.08556155 -0.7676634 -0.9857593 -0.7676634
#4  0.16159810 -0.4369067 0.08556155 -2.2270920 -0.9857593 -0.7676634
#5  0.08556155 -0.4369067 0.08556155 -0.4496198 -0.9857593 -0.7676634
#           X3   X3_mean   X3_median
#1  0.04606554 0.0923336 -0.08168136
#2 -0.08168136 0.0923336 -0.08168136
#3  0.90535333 0.0923336 -0.08168136
#4 -0.15699052 0.0923336 -0.08168136
#5 -0.25107897 0.0923336 -0.08168136
Veerendra Gadekar

With base R you could try this:

> temp_each[order(colnames(temp_each))]
#          X1    X1_mean X1_median         X2   X2_mean  X2_median         X3  X3_mean X3_median
#    1  0.4142743 -0.4389318 -0.285517  1.8662158 0.3534017 -0.2308971  1.3593561 0.478106 0.6306579
#    2 -0.8031115 -0.4389318 -0.285517 -0.2308971 0.3534017 -0.2308971 -0.6160166 0.478106 0.6306579
#    3 -1.8729143 -0.4389318 -0.285517  1.0171626 0.3534017 -0.2308971  0.2634524 0.478106 0.6306579
#    4  0.3526097 -0.4389318 -0.285517 -0.6378480 0.3534017 -0.2308971  0.6306579 0.478106 0.6306579
#    5 -0.2855170 -0.4389318 -0.285517 -0.2476247 0.3534017 -0.2308971  0.7530800 0.478106 0.6306579
RHertel
  • This won't produce the desired output if there are columns like X10, X20 and so on – Veerendra Gadekar Jun 30 '15 at 05:34
  • I agree. In that case one could add leading zeros in front of the index number for smaller numbers by means of regexp manipulation. While this is possible, your solution seems easier if it can handle those cases automatically. – RHertel Jun 30 '15 at 05:39
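
The zero-padding idea from the comment above can be sketched in base R roughly as follows. This is only an illustration: the column names are made up, and it assumes every name has the form `X<digits><suffix>`; names of any other shape would need a different pattern.

```r
# Hypothetical column names with multi-digit indices (not from the question)
nms <- c("X1", "X2", "X10",
         "X1_mean", "X2_mean", "X10_mean",
         "X1_median", "X2_median", "X10_median")

# Split each name into its numeric index and whatever follows it
idx  <- as.integer(sub("^X([0-9]+).*$", "\\1", nms))
rest <- sub("^X[0-9]+", "", nms)

# Build a sort key with the index zero-padded to a common width,
# e.g. "X01", "X01_mean", ..., "X10_median"
width <- max(nchar(idx))
key <- sprintf(paste0("X%0", width, "d%s"), idx, rest)

# method = "radix" sorts in the C locale, so the result does not
# depend on the session's collation settings
reordered <- nms[order(key, method = "radix")]
reordered
```

With a data frame, the same index vector could be used as `temp_each[order(key, method = "radix")]`. Note that this still sorts by name, so it does not preserve the original column order for arbitrary names.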

Thank you everyone for the answers using base R or otherwise.

This is my preferred base R solution.

old_names <- names(temp)
# the empty suffix "" keeps each original column in front of its derived columns
new_names <- unlist(lapply(old_names, function(old_name) paste0(old_name, c("", "_mean", "_median"))))
temp_each <- temp_each[new_names]

However, I have now worked out how to do it in dplyr using standard evaluation, from this answer: Group by multiple columns in dplyr, using string vector input

It is rather convoluted.

temp <- data.frame(lapply(1:3, function(i) rnorm(5, 0, 1)))
names(temp) <- paste0("X", 1:3)

old_names <- names(temp)
# the empty suffix "" keeps each original column in front of its derived columns
new_names <- unlist(lapply(old_names, function(old_name) paste0(old_name, c("", "_mean", "_median"))))


temp_each <-
    temp %>%
    mutate_each(funs(mean, median)) %>%
    select_(.dots = lapply(new_names, as.symbol))
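
For readers on a newer dplyr, where `mutate_each`, `funs` and `select_` are deprecated, the same idea might look like the sketch below. It assumes dplyr 1.0.0 or later (for `across` and `all_of`); the `set.seed` call is only there to make the illustration reproducible and is not part of the original code.

```r
library(dplyr)

set.seed(1)  # assumption: added only so the example is reproducible
temp <- data.frame(lapply(1:3, function(i) rnorm(5, 0, 1)))
names(temp) <- paste0("X", 1:3)

# Desired order: each original column followed by its derived columns.
# This relies only on the order of names(temp), not on the names themselves.
old_names <- names(temp)
new_names <- unlist(lapply(old_names, function(old_name) {
    paste0(old_name, c("", "_mean", "_median"))
}))

temp_each <-
    temp %>%
    # across() names the results "<column>_<function>" by default
    mutate(across(everything(), list(mean = mean, median = median))) %>%
    # all_of() selects by a character vector, in the order given
    select(all_of(new_names))

names(temp_each)
```

Because the target order is built from `names(temp)`, this should carry over to arbitrary column names, provided the suffixed names do not collide with existing columns.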
Community
Alex