Dynamically sorting columns in dplyr via passing ordered vector with column names to select

Question

I'm using the code below to generate a simple summary table:

# Data
data("mtcars")
# Lib
require(dplyr)
# Summary
mt_sum <- mtcars %>%
  group_by(am) %>%
  summarise_each(funs(min, mean, median, max), mpg, cyl) %>%
  mutate(am = as.character(am)) %>%
  left_join(y = as.data.frame(table(mtcars$am),
                              stringsAsFactors = FALSE),
            by = c("am" = "Var1"))

The code produces the desired results:

> head(mt_sum)
Source: local data frame [2 x 10]

     am mpg_min cyl_min mpg_mean cyl_mean mpg_median cyl_median mpg_max cyl_max  Freq
  (chr)   (dbl)   (dbl)    (dbl)    (dbl)      (dbl)      (dbl)   (dbl)   (dbl) (int)
1     0    10.4       4 17.14737 6.947368       17.3          8    24.4       8    19
2     1    15.0       4 24.39231 5.076923       22.8          4    33.9       8    13

However, I'm not satisfied with the way the columns are ordered. In particular, I would like to:

Order columns by name
Achieve that via select() in dplyr

Desired order

The desired order would look like this:

> names(mt_sum)[order(names(mt_sum))]
 [1] "am"         "cyl_max"    "cyl_mean"   "cyl_median" "cyl_min"    "Freq"       "mpg_max"   
 [8] "mpg_mean"   "mpg_median" "mpg_min"

Attempts

Ideally, I would like to pass names(mt_sum)[order(names(mt_sum))] to select() as the way of sorting the columns. But the code:

mt_sum <- mtcars %>%
  group_by(am) %>%
  summarise_each(funs(min, mean, median, max), mpg, cyl) %>%
  mutate(am = as.character(am)) %>%
  left_join(y = as.data.frame(table(mtcars$am),
                              stringsAsFactors = FALSE),
            by = c("am" = "Var1")) %>%
  select(names(.)[order(names(.))])

returns the expected error:

Error: All select() inputs must resolve to integer column positions.
The following do not:
*  names(.)[order(names(.))]

In my real data I'm generating a vast number of summary columns. Hence my question: how can I dynamically pass sorted column names to select() in dplyr so that it understands them and applies them to the data.frame at hand?

My focus is on figuring out a way of passing the dynamically generated column names to select(). I know that I could sort the columns in base or by typing names, as discussed here.

talat · Answer 1 · 2015-12-03T13:31:48.023

11

All you need is:

mt_sum %>% select(order(names(.)))
#Source: local data frame [2 x 10]
#
#     am cyl_max cyl_mean cyl_median cyl_min  Freq mpg_max mpg_mean mpg_median mpg_min
#  (chr)   (dbl)    (dbl)      (dbl)   (dbl) (int)   (dbl)    (dbl)      (dbl)   (dbl)
#1     0       8 6.947368          8       4    19    24.4 17.14737       17.3    10.4
#2     1       8 5.076923          4       4    13    33.9 24.39231       22.8    15.0

It works because order() returns integer column positions, which is what select() requires.
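
To make the point about positions concrete, here is a minimal sketch on a hypothetical toy table (toy and its values are invented for illustration, not taken from mt_sum). It also shows a name-based variant using all_of(), which assumes tidyselect 1.0.0 or later and so postdates this thread; treat it as one possible approach rather than the answer's method.

```r
library(dplyr)

# Hypothetical toy table with columns deliberately out of order
toy <- data.frame(mpg_min = 10.4, am = "0", cyl_max = 8)

# order() yields integer positions into names(toy), not the names themselves
pos <- order(names(toy))

# Position-based selection, as in the answer above
by_pos <- toy %>% select(order(names(.)))

# Name-based selection (assumes tidyselect >= 1.0.0):
# all_of() marks the character vector as a set of column names
by_name <- toy %>% select(all_of(sort(names(.))))

names(by_pos)
names(by_name)
```

Both calls should give the same column order. Note that order() and sort() on character vectors follow the current locale's collation, so mixed-case names such as Freq may land in a different place on another system.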

edited Dec 03 '15 at 13:31

answered Dec 03 '15 at 13:28

talat


Thanks very much, neat solution and works really well. – Konrad Dec 03 '15 at 13:30
If you're calculating summaries for grouped variables, you can remove these from the ordering and add them at the beginning. Like `select(group_var, order(names(.)[-1]))` – hannes101 Mar 05 '19 at 14:54
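
One detail to watch when keeping a grouping column first: order(names(.)[-1]) returns positions relative to the shortened name vector, so they need shifting by one before they refer to columns of the full table. A minimal sketch, assuming a hypothetical table whose grouping column am is in first position (the data and the use of all_of() are illustrative assumptions, the latter requiring tidyselect 1.0.0 or later):

```r
library(dplyr)

# Hypothetical summary table: grouping column first, summaries unsorted
toy <- data.frame(am = c("0", "1"),
                  mpg_min = c(10.4, 15.0),
                  cyl_max = c(8, 8),
                  cyl_min = c(4, 4))

# order() indexes into names(toy)[-1]; add 1 to map back onto toy's columns
rest <- order(names(toy)[-1]) + 1L

# Grouping column first, then the remaining columns in sorted order
sorted <- toy %>% select(am, all_of(rest))
names(sorted)
```

Without the shift, the positions would point one column to the left, repeating the grouping column and dropping the last summary column.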

score 9 · Accepted Answer · answered Dec 03 '15 at 13:25

You're definitely on the right path.

mt_sum <- mtcars %>%
  group_by(am) %>%
  summarise_each(funs(min, mean, median, max), mpg, cyl) %>%
  mutate(am = as.character(am)) %>%
  left_join(y = as.data.frame(table(mtcars$am),
                              stringsAsFactors = FALSE),
            by = c("am" = "Var1")) %>%
  .[, names(.)[order(names(.))]]
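
The last line of the pipeline relies on base subsetting: [ accepts a character vector of column names and returns the columns in that order. A minimal sketch on a hypothetical toy table (toy is invented for illustration) shows this, along with sort() as a shorter way to build the same vector:

```r
# Hypothetical toy table standing in for mt_sum
toy <- data.frame(mpg_min = 10.4, am = "0", cyl_max = 8)

# Character-vector indexing with `[` reorders columns by name
via_order <- toy[, names(toy)[order(names(toy))]]

# sort() on the names gives the same character vector in one step
via_sort <- toy[, sort(names(toy))]

names(via_order)
```

Both forms need no dplyr at all, which is why this approach sidesteps the select() error from the question.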

maloneypatr


Amazing! So no `select()` in the whole thing and that's it! – Konrad Dec 03 '15 at 13:28
