
What is the data.table way of sorting values within each row? I can easily write a loop which does the sorting row by row, but I suppose it's not a very smart way of doing things.

Example:

Given a data.table like this:

library(data.table)

df = data.table(ID = c('a', 'b', 'c', 'd', 'e', 'f'),
                v1 = c(1,2,1,3,4,5),
                v2 = c(2,3,6,1,0,2),
                v3 = c(0,0,1,2,3,5))

I can sort this using a for loop like so:

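for (i in seq_len(nrow(df)))
{
  # unlist() turns the one-row data.table into a plain numeric vector before sorting
  df[i, 2:4] = as.list(sort(unlist(df[i, 2:4], use.names = FALSE), decreasing = TRUE))
}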

This gives the intended result:

   ID v1 v2 v3
1:  a  2  1  0
2:  b  3  2  0
3:  c  6  1  1
4:  d  3  2  1
5:  e  4  3  0
6:  f  5  5  2

But it seems a very slow way of doing things.
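If a row-by-row loop is kept, part of the slowness is plausibly the per-call overhead of `[<-` on a data.table. data.table documents `set()` as a low-overhead, loopable version of `:=`. The following is a minimal sketch under the assumption that the value columns are `v1`..`v3` (the `cols` vector is illustrative). It still visits every row, so treat it as a cheaper loop rather than a vectorised solution.

```r
library(data.table)

df <- data.table(ID = c('a', 'b', 'c', 'd', 'e', 'f'),
                 v1 = c(1, 2, 1, 3, 4, 5),
                 v2 = c(2, 3, 6, 1, 0, 2),
                 v3 = c(0, 0, 1, 2, 3, 5))

cols <- c("v1", "v2", "v3")  # value columns to sort (illustrative)

for (i in seq_len(nrow(df))) {
  # Read row i of the value columns straight from the column vectors,
  # avoiding a `[.data.table` call per row.
  vals <- vapply(cols, function(cl) df[[cl]][[i]], numeric(1), USE.NAMES = FALSE)
  # set() writes the sorted values back into row i by reference.
  set(df, i = i, j = cols, value = as.list(sort(vals, decreasing = TRUE)))
}

df
```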

NelsonGon
ira
  • A `data.frame` (and therefore `data.table` and `tibble`) is really meant to deal with *columns* of information; migration of data across columns is counter to their efficiencies and design. Can it be done? Sure, but you're working outside of how frames are structured. If it were a `matrix` it would be much more straightforward, since a matrix works as efficiently row-wise as it does column-wise. – r2evans Mar 12 '19 at 07:00
  • Possible duplicate; related posts: https://stackoverflow.com/q/6063881/680068 and https://stackoverflow.com/q/35891340/680068 – zx8754 Mar 12 '19 at 07:00
  • Let us know if akrun's data.table solution works for you in the links, then we could close this post as duplicate. – zx8754 Mar 12 '19 at 07:06
  • Oh, I missed akrun's answer; I had only seen the first question you linked before I asked mine. Awesome, thank you. Do you have any advice on the efficiency of the solution? I.e., is it better to convert the object to a matrix/data.frame and then sort, or to use akrun's solution for `data.table`? – ira Mar 12 '19 at 07:27
  • @zx8754 Akrun's solution works. Should I close this question or delete it completely? – ira Mar 12 '19 at 07:35
  • Please do not delete, it is good to have duplicated posts. – zx8754 Mar 12 '19 at 07:37
  • @zx8754 Actually, it seems akrun's solution orders the new columns by their names, which are strings, so it will not work as-is for a table with many columns. It produces a column order like `Col1 Col10 Col100 Col101 Col102 Col103 Col104 Col105 Col106 Col107 Col108 Col109` and then continues `Col11 Col110 Col111 Col112 Col113 Col114 Col115 Col116 Col117 Col118 Col119 Col12 Col120 Col121`, so the order of the values in each row is not correct. It can be fixed by not putting characters in front of the column number. – ira Mar 12 '19 at 10:45
  • Feel free to add an answer there **or** suggest edit to akrun's answer. – zx8754 Mar 12 '19 at 10:48
  • Thanks for the tip. I suggested an edit to the original answer. – ira Mar 12 '19 at 11:22
  • For future readers: this is what akrun's answer looks like for the above dataset: `dcast(melt(df, id.vars = 'ID')[order(-value), .SD, ID][, N := 1:.N, .(ID)], ID ~ N, value.var = "value")` – ismirsehregal Nov 29 '19 at 12:15
  • This question is different from the one asked in https://stackoverflow.com/questions/35891340/row-wise-sorting-in-r, although akrun's answer there covers this one. This question asks specifically how to do it with data.table, which matters for people searching for that. Questions should not be closed just because an answer elsewhere happens to cover them. – ibilgen Feb 26 '20 at 15:21
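Following r2evans's point that a `matrix` handles row-wise work more naturally, and ira's question about converting, one option is to round-trip only the value columns through a matrix and write them back by reference with `:=`. This is a sketch assuming all value columns are numeric (the `cols` vector is illustrative). No benchmark was run here, so whether it beats the melt/dcast route on a given table is something to measure.

```r
library(data.table)

df <- data.table(ID = c('a', 'b', 'c', 'd', 'e', 'f'),
                 v1 = c(1, 2, 1, 3, 4, 5),
                 v2 = c(2, 3, 6, 1, 0, 2),
                 v3 = c(0, 0, 1, 2, 3, 5))

cols <- c("v1", "v2", "v3")  # numeric value columns (illustrative)

# Only the numeric columns go into the matrix, so nothing is coerced to
# character; unname() drops the column names so sort() results stay unnamed.
m <- unname(as.matrix(df[, ..cols]))

# apply() over rows (MARGIN = 1) returns one column per input row, hence t().
sorted <- t(apply(m, 1, sort, decreasing = TRUE))

# Write the sorted matrix columns back into the value columns by reference.
df[, (cols) := as.data.table(sorted)]

df
```

Because the result is assigned positionally to `cols`, the column names stay `v1`..`v3` regardless of how many value columns there are, which sidesteps the `Col1 Col10 Col100` ordering issue raised above.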

1 Answer

Do you have to use data.table? What about a base R approach using `apply` with `MARGIN = 1`?

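# On a data.table, df[-1] would drop the first *row*, so convert first
df <- as.data.frame(df)
# apply() returns one column per row, so transpose the result back
df[-1] <- t(apply(df[-1], 1, function(x) sort(x, decreasing = TRUE)))
df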
#  ID v1 v2 v3
#1  a  2  1  0
#2  b  3  2  0
#3  c  6  1  1
#4  d  3  2  1
#5  e  4  3  0
#6  f  5  5  2
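If the table has exactly three value columns, as in this example, the sort can also be done column-wise inside data.table without `apply`: the largest value is `pmax()`, the smallest is `pmin()`, and the middle one is the median of three. This sketch is specific to three columns (it does not generalise to wider tables) and assumes numeric columns named `v1`..`v3`:

```r
library(data.table)

df <- data.table(ID = c('a', 'b', 'c', 'd', 'e', 'f'),
                 v1 = c(1, 2, 1, 3, 4, 5),
                 v2 = c(2, 3, 6, 1, 0, 2),
                 v3 = c(0, 0, 1, 2, 3, 5))

df[, c("v1", "v2", "v3") := {
  hi  <- pmax(v1, v2, v3)
  lo  <- pmin(v1, v2, v3)
  # Median of three values a, b, c: max(min(a, b), min(max(a, b), c)).
  mid <- pmax(pmin(v1, v2), pmin(pmax(v1, v2), v3))
  # All three are computed from the original columns before any assignment.
  list(hi, mid, lo)
}]

df
```

Each step is a vectorised pass over whole columns, so no per-row function call is made.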
Maurits Evers
  • They want data.table; if it were a data.frame, I'd close this as a duplicate. – zx8754 Mar 12 '19 at 07:01
  • See akrun's answer in the links; it has a data.table answer with a slightly different requirement, but it could be relevant. – zx8754 Mar 12 '19 at 07:02
  • Good avoidance of the up-conversion to `character` when you excluded the first non-numeric column. To the casual reader, it may appear that it was excluded just because it was not intended to be sorted, but if non-numeric and numeric columns are passed together in the first argument of `apply`, they will all end up `character`, so (1) follow-up data will be different classes, and (2) sorts will be by alpha, not numeric, likely not what you want. – r2evans Mar 12 '19 at 07:02
  • Ah yes @zx8754 you're right; it looks like a dupe, and akrun's answer gives a nice `data.table` solution. You should close it. I'm happy to delete, but perhaps these comments and explanations are useful? – Maurits Evers Mar 12 '19 at 07:05
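r2evans's warning about `character` up-conversion can be seen with a small, hypothetical data.frame `dd` whose values have different numbers of digits. When the ID column is passed to `apply()` together with the values, the matrix `apply()` works on is `character`, and character sorting compares strings rather than numbers:

```r
# Hypothetical data: a character ID plus numeric values of different widths.
dd <- data.frame(ID = c("a", "b"), v1 = c(9, 10), v2 = c(10, 2))

# Strings compare character by character, so "10" sorts before "9".
sort(c("9", "10"))
#> [1] "10" "9"

# With ID included, apply() sees a character matrix and returns character.
bad <- t(apply(dd, 1, sort, decreasing = TRUE))

# Excluding ID keeps the values numeric, as in the answer above.
good <- t(apply(dd[-1], 1, sort, decreasing = TRUE))

class(bad[1, 1])   # "character"
class(good[1, 1])  # "numeric"
```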