5

Consider a data.table below:

DT <- data.table(a=c(1,2,4,3,5), b=c(3:5,NA,2), c=c(2,1,NA,NA,3)) 
DT
   a  b  c
1: 1  3  2
2: 2  4  1
3: 4  5 NA
4: 3 NA NA
5: 5  2  3

I want to sort the rows based on 3rd column and then 1st column. I can do it using:

DT[order(DT[,3],DT[,1])]

   a  b  c
1: 2  4  1
2: 1  3  2
3: 5  2  3
4: 3 NA NA
5: 4  5 NA

But, if DT has many columns and lets say I want to sort them based on 1st to i-th columns, then it won't be that efficient to write it as:

DT[order(DT[,1], DT[,2], DT[,3], ... DT[,i])]

Instead, I'd like to provide the column indices as a vector (see below):

DT[order(DT[,c(1:i)])]

But, it doesn't work the way I expect and the output is:

DT[order(DT[,c(3,1)])]

     a  b  c
 1:  2  4  1
 2: NA NA NA
 3:  1  3  2
 4: NA NA NA
 5:  5  2  3
 6: NA NA NA
 7: NA NA NA
 8: NA NA NA
 9:  4  5 NA
10:  3 NA NA

Any advise on how I can fix that? Thanks!

Mahmoud
  • 381
  • 4
  • 12
  • 2
    Check `setorderv` as decribed in the answer here: [How to sort a dataframe by multiple column(s)?](https://stackoverflow.com/a/35233592/1851712). For a range of columns e.g. `cols <- names(iris)[2:4]`; `setorderv(iris, cols)` – Henrik Oct 26 '18 at 18:05
  • Thanks @Henrik! setorderv worked for me: col<-names(DT)[c(3,1)] and then setorderv(DT,col,c(1,1)) However, it puts NA values on top. Do we have the option to decide they end up being on top or bottom? – Mahmoud Oct 26 '18 at 21:10
  • Have you taken your time to read `?setorderv`? – Henrik Oct 26 '18 at 22:23
  • Oh, I see! there is an option to say 'na.last=TRUE' Thanks @Henrik! – Mahmoud Oct 26 '18 at 23:18

1 Answers1

6

We can use do.call with order after specifying the .SDcols

DT[DT[,do.call(order, .SD), .SDcols = c(3, 1)]]
#   a  b  c
#1: 2  4  1
#2: 1  3  2
#3: 5  2  3
#4: 3 NA NA
#5: 4  5 NA
akrun
  • 874,273
  • 37
  • 540
  • 662