
My question is similar to a few other sorting/ordering questions, but not the same. Basically: how do I sort/order data frames or data.tables in R when the column to sort by is stored in a variable?

Say I have a data frame:

#create data frame
df <- data.frame(a=c(2,2,2,2,1,1,3,3,3,3,4,4),
                 b=c("c","c","a","a","a","b","b","d","d","d","e","e"),
                 c=c(123,223,1232,122,1232,345,243,456,5676,34,233,111),
                 stringsAsFactors=FALSE)

There are numerous ways to order the data frame. Some of the base approaches are:

#ordering data frame by column 1
df[order(df[, 1]), ]
#ordering data frame by column name 'a'
df[order(df[, "a"]), ]

Similarly, with data.tables:

library(data.table)
dt <- as.data.table(df)
dt[order(a)]

But if the column to order by is stored in a variable `var`, how do I use it?

#sort by column 1
var <- 1

#sort by column name "a"
var <- "a"

Taking it a step further, how do I sort by multiple columns?

#sort by columns 1 and 2
var1 <- 1
var2 <- 2
mindlessgreen
  • You could do `var <- "a" ; setorderv(dt, var)` for your second case. Or `var <- 1 ; setorderv(dt, names(dt)[var])` for your first case. Also, why can't you just do `df[order(df[,var]), ]` with `data.frame`s? Your question is quite unclear. – David Arenburg Dec 13 '15 at 18:15
  • Per your edit, you could do `df[do.call(order, c(df[, c(var1, var2)], df)), ]` or `setorderv(dt, names(dt)[c(var1, var2)])`. But why not store them in the same vector in the first place? – David Arenburg Dec 13 '15 at 18:21
  • My guess is that this is some kind of dupe of http://stackoverflow.com/questions/28026579/order-dataframe-for-given-columns – David Arenburg Dec 13 '15 at 18:28

1 Answer


Try this:

df[order(df[[var]]),]

EDIT (thanks to David): or use this if you have multiple conditions:

df[order(df[,var]),]
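
For `var` holding more than one column, a minimal sketch along the lines of David Arenburg's `do.call(order, ...)` comment might look like the following. The helper name `sort_by_cols` is hypothetical and not part of the original answer; it assumes `cols` contains valid column names or positions.

```r
# Example data frame from the question
df <- data.frame(a = c(2, 2, 2, 2, 1, 1, 3, 3, 3, 3, 4, 4),
                 b = c("c", "c", "a", "a", "a", "b", "b", "d", "d", "d", "e", "e"),
                 c = c(123, 223, 1232, 122, 1232, 345, 243, 456, 5676, 34, 233, 111),
                 stringsAsFactors = FALSE)

# Hypothetical helper (an assumption, not from the original answer):
# sort a data frame by one or more columns given as names or positions.
sort_by_cols <- function(data, cols) {
  # data[cols] stays a list of columns even when length(cols) == 1.
  # unname() keeps column names from being matched to order()'s own
  # arguments (e.g. a column called "decreasing").
  keys <- unname(as.list(data[cols]))
  # do.call() passes each column to order() as a separate sort key.
  data[do.call(order, keys), , drop = FALSE]
}

# Single column, by name
var <- "a"
sorted_one <- sort_by_cols(df, var)

# Multiple columns, by position: sort by column 1, then column 2
var <- c(1, 2)
sorted_two <- sort_by_cols(df, var)
```

Keeping the columns in a single vector, as suggested in the comments, means the same call works for one column or several.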

road_to_quantdom
  • @DavidArenburg this is true, but I thought his original question was just for the column needed being stored in a variable. Your answer is more general – road_to_quantdom Dec 13 '15 at 18:22
  • Can you explain a bit on the difference in using `[]` vs `[[]]`? Also, if my variable has multiple columns, this doesn't seem to work. Eg. `var <- c("a","b")` or `var <- c(1,2)`. – mindlessgreen Dec 13 '15 at 18:41
  • http://stackoverflow.com/questions/1169456/in-r-what-is-the-difference-between-the-and-notations-for-accessing-the. And I showed how to do this in my second comment – David Arenburg Dec 13 '15 at 18:44
  • @Roy when you use `df[,]` you are specifying which `[row,col]`, whereas with `[[var]]` you are pulling the vector that is `var`. `[[]]` is very similar to `df$a` if you are trying to access the column vector `a`, whereas `[,]` will also pull the rows specified, and multiple columns if you specify a vector of columns to be pulled – road_to_quantdom Dec 13 '15 at 18:50
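
The difference described in the last comment can be seen on a small example. This is only an illustrative sketch with a made-up three-row data frame. It also suggests why `order(df[, var])` is a problem once `var` names several columns: `order()` then receives one data frame instead of separate key vectors.

```r
# Small illustrative data frame (made up for this sketch)
df <- data.frame(a = c(2, 1, 3),
                 b = c("x", "y", "z"),
                 stringsAsFactors = FALSE)
var <- "a"

col_vec  <- df[[var]]                 # plain vector, like df$a
col_drop <- df[, var]                 # single column: dropped to a vector by default
col_df   <- df[, var, drop = FALSE]   # single column kept as a data frame
two_cols <- df[, c("a", "b")]         # several columns: always a data frame

is.numeric(col_vec)      # the extracted column is an ordinary numeric vector
is.data.frame(two_cols)  # the multi-column selection is still a data frame
```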