48

I want to pass arrange() {dplyr} a vector of variable names to sort on. Usually I just type in the variables I want, but I'm trying to make a function where the sorting variables can be input as a function parameter.

df <- structure(list(var1 = c(1L, 2L, 2L, 3L, 1L, 1L, 3L, 2L, 4L, 4L
  ), var2 = structure(c(10L, 1L, 8L, 3L, 5L, 4L, 7L, 9L, 2L, 6L
  ), .Label = c("b", "c", "f", "h", "i", "o", "s", "t", "w", "x"
  ), class = "factor"), var3 = c(7L, 5L, 5L, 8L, 5L, 8L, 6L, 7L, 
  5L, 8L), var4 = structure(c(8L, 5L, 1L, 4L, 7L, 4L, 3L, 6L, 9L, 
  2L), .Label = c("b", "c", "d", "e", "f", "h", "i", "w", "y"), 
  class = "factor")), .Names = c("var1", "var2", "var3", "var4"), 
  row.names = c(NA, -10L), class = "data.frame")

# this is the normal way to arrange df with dplyr
df %>% arrange(var3, var4)

# but none of these (below) work for passing a vector of variables
vector_of_vars <- c("var3", "var4")
df %>% arrange(vector_of_vars)
df %>% arrange(get(vector_of_vars))
df %>% arrange(eval(parse(text = paste(vector_of_vars, collapse = ", "))))
smci
rsoren
  • 3
    Imo, use of %>% should be saved for chaining, as it's pretty ugly... (for single actions <- or = works just fine... – Kevin Jan 29 '15 at 01:15

6 Answers

36

Hadley hasn't made this obvious in the help file--only in his NSE vignette. The versions of the functions followed by underscores use standard evaluation, so you pass them vectors of strings and the like.

If I understand your problem correctly, you can just replace arrange() with arrange_() and it will work.

Specifically, pass the vector of strings as the .dots argument when you do it.

> df %>% arrange_(.dots=c("var1","var3"))
   var1 var2 var3 var4
1     1    i    5    i
2     1    x    7    w
3     1    h    8    e
4     2    b    5    f
5     2    t    5    b
6     2    w    7    h
7     3    s    6    d
8     3    f    8    e
9     4    c    5    y
10    4    o    8    c

========== Update March 2018 ==============

The standard-evaluation (underscore) versions of the dplyr functions shown here are now deprecated. You can read Hadley's programming vignette for the new way. Basically, you use !! to unquote one variable or !!! to unquote a vector of variables inside arrange().

When you pass those columns, if they are bare, quote them using quo() for one variable or quos() for a vector. Don't use quotation marks. See the answer by Akrun.

If your columns are already strings, then make them names using rlang::sym() for a single column or rlang::syms() for a vector. See the answer by Christos. You can also use as.name() for a single column. Unfortunately as of this writing, the information on how to use rlang::sym() has not yet made it into the vignette I link to above (eventually it will be in the section on "variadic quasiquotation" according to his draft).
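Since the question is about passing the sort columns as a function parameter, the string-based route above can be wrapped roughly like this. This is only a sketch: `sort_by_names` and the small `toy` data frame are made-up names for illustration, and it assumes a dplyr version that supports `!!!` (0.7.0 or later) with rlang installed.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): sort `data` by the columns
# named in the character vector `vars`.
sort_by_names <- function(data, vars) {
  # syms() turns each string into a symbol; !!! splices the resulting
  # list into arrange() as if the columns had been typed bare.
  data %>% arrange(!!!rlang::syms(vars))
}

# Made-up data, just for illustration
toy <- data.frame(
  var1 = c(2L, 1L, 2L, 1L),
  var3 = c(5L, 8L, 7L, 5L)
)

sorted <- sort_by_names(toy, c("var1", "var3"))
sorted
```

The vector can hold any number of column names, which is the case the comments below were struggling with.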

farnsy
  • 2
    I was thinking this as well, but if you do `df %>% arrange_(vector_of_vars)`, it seems to ignore the second element and sorts only on the first element. However, if you do `df %>% arrange_(vector_of_vars[1], vector_of_vars[2])`, then it sorts on both values. I assume there's a more elegant approach than the second method, but I'm not sure what it is. – eipi10 Oct 21 '14 at 23:07
  • `arrange_()` does seem to ignore the second column. @eipi10 your solution would work, but the problem is that there can be an arbitrary number of elements in `vector_of_vars`. – rsoren Oct 21 '14 at 23:15
  • I wasn't claiming that my second method was a good one. I was just trying to bound the problem of figuring out why the seemingly "natural" approach doesn't work and also provide a temporary, if inelegant, solution. Hopefully @Hadley will jump in and edify us. – eipi10 Oct 21 '14 at 23:20
  • 5
    Ah, this works: `df %>% arrange_(.dots = vector_of_vars)`. farnsy, if you make this change I'll give you credit for the answer – rsoren Oct 21 '14 at 23:28
  • 2
    @farnsy What if you want to sort it in descending order? how to pass the desc parameter? I haven't figured out! – jpmarindiaz Sep 22 '15 at 23:43
  • 5
    `vector_of_vars <- c("desc(var3)", "var4");df %>% arrange_(.dots=vector_of_vars)` – farnsy Sep 24 '15 at 01:32
  • @farnsy this didn't work for me, see [my post](http://stackoverflow.com/questions/38052325). – zx8754 Jun 27 '16 at 11:19
22

In the quosures spirit:

df %>% arrange(!!! rlang::syms(c("var1", "var3")))

For a single variable, it would look like:

df %>% arrange(!! rlang::sym("var1"))
JelenaČuklina
Christos
20

In the new version of dplyr (0.6.0, soon to be released), we can make use of quosures:

library(dplyr)
vector_of_vars <- quos(var1, var3)
df %>%
    arrange(!!! vector_of_vars)
#   var1 var2 var3 var4
#1     1    i    5    i
#2     1    x    7    w
#3     1    h    8    e
#4     2    b    5    f
#5     2    t    5    b
#6     2    w    7    h
#7     3    s    6    d
#8     3    f    8    e
#9     4    c    5    y
#10    4    o    8    c

When there is more than one variable, we use quos(); for a single variable, it is quo(). quos() returns a list of quoted variables, and inside arrange() we unquote that list with !!! for evaluation.
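To make the quo()/quos() distinction concrete, here is a small sketch on made-up data (`toy` and the variable names are illustrative, not from the answer). It also shows that quos() can capture calls such as desc(), which may help with descending sorts.

```r
library(dplyr)

# Made-up data, just for illustration
toy <- data.frame(
  var1 = c(2L, 1L, 2L, 1L),
  var3 = c(7L, 5L, 6L, 8L)
)

# One column: quote with quo(), unquote with !!
single_var <- quo(var3)
by_one <- toy %>% arrange(!!single_var)

# Several sort keys: quote with quos(), splice with !!!
# (quos() accepts expressions like desc(var1), not only bare names)
several_vars <- quos(desc(var1), var3)
by_several <- toy %>% arrange(!!!several_vars)

by_one
by_several
```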

akrun
  • 9
    ... which is now deprecated again... `1: Unquoting language objects with '!!!' is soft-deprecated as of rlang 0.3.0. Please use '!!' instead.` It's mindblowing (to stay polite) how many functions are constantly being deprecated in the tidyverse... I'll go back to Base R for my long term code I think... – Bastien May 27 '19 at 17:53
17

I think now you can just use dplyr::arrange_at().

library(dplyr)

### original
head(iris)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

### arranged
iris %>% 
  arrange_at(c("Sepal.Length", "Sepal.Width")) %>% 
  head()
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          4.3         3.0          1.1         0.1  setosa
# 2          4.4         2.9          1.4         0.2  setosa
# 3          4.4         3.0          1.3         0.2  setosa
# 4          4.4         3.2          1.3         0.2  setosa
# 5          4.5         2.3          1.3         0.3  setosa
# 6          4.6         3.1          1.5         0.2  setosa
Cecilia Lee
  • This worked for me. Holy crap there are so many syntax changes over the years for something so fundamental. – SplitInf Apr 02 '23 at 22:02
3

Try this:

df %>% do(do.call(arrange_, . %>% list(.dots = vector_of_vars)))

and actually this can be written more simply as:

df %>% arrange_(.dots = vector_of_vars)

although at this point I think it's the same as farnsy's implied solution.

G. Grothendieck
3

It's a little dense, but I think the best approach now is to use across() along with a tidyselect function, e.g. all_of():

df <- structure(list(var1 = c(1L, 2L, 2L, 3L, 1L, 1L, 3L, 2L, 4L, 4L
  ), var2 = structure(c(10L, 1L, 8L, 3L, 5L, 4L, 7L, 9L, 2L, 6L
  ), .Label = c("b", "c", "f", "h", "i", "o", "s", "t", "w", "x"
  ), class = "factor"), var3 = c(7L, 5L, 5L, 8L, 5L, 8L, 6L, 7L, 
  5L, 8L), var4 = structure(c(8L, 5L, 1L, 4L, 7L, 4L, 3L, 6L, 9L, 
  2L), .Label = c("b", "c", "d", "e", "f", "h", "i", "w", "y"), 
  class = "factor")), .Names = c("var1", "var2", "var3", "var4"), 
  row.names = c(NA, -10L), class = "data.frame")

vector_of_vars <- c("var3", "var4")

df %>% arrange(across(all_of(vector_of_vars)))
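
Wrapped in a function, as the question asks, this might look like the sketch below. `arrange_by`, its `descending` argument, and the `toy` data are hypothetical names for illustration, and it assumes dplyr 1.0.0 or later (where across() is available). Passing desc as the second argument to across() applies it to every selected column.

```r
library(dplyr)

# Hypothetical helper: `vars` is a character vector of column names;
# `descending = TRUE` applies desc() to every sort key.
arrange_by <- function(data, vars, descending = FALSE) {
  if (descending) {
    data %>% arrange(across(all_of(vars), desc))
  } else {
    data %>% arrange(across(all_of(vars)))
  }
}

# Made-up data, just for illustration
toy <- data.frame(
  var3 = c(7L, 5L, 5L, 8L),
  var4 = c("w", "f", "b", "e")
)

ascending <- arrange_by(toy, c("var3", "var4"))
descending <- arrange_by(toy, c("var3", "var4"), descending = TRUE)

ascending
descending
```

all_of() raises an error if any name in `vars` is missing from the data, which is usually what you want inside a function.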