Rearrange DT by id, so that all different values are put in columns next to each other

Question

How can I rearrange a data.table by id, so that the columns within the same id are put next to each other? The following code as a short example of input and desired output

require(data.table)
set.seed(456)
DT <- data.table(id = c(1,1,1, 2,2, 3,3, 4,4,4, 5),
                 X  = round(runif(11, 0, 1),2),
                 Y  = round(runif(11, 5, 10)),
                 A  = c(rep(9, 3), rep(10, 2), rep(11, 2), rep(12, 3), 13))

Input

DT
#    id    X Y  A
# 1:  1 0.09 6  9
# 2:  1 0.21 9  9
# 3:  1 0.73 9  9
# 4:  2 0.85 8 10
# 5:  2 0.79 8 10
# 6:  3 0.33 9 11
# 7:  3 0.08 7 11
# 8:  4 0.29 9 12
# 9:  4 0.24 6 12
# 10: 4 0.39 6 12
# 11: 5 0.37 9 13
# ...

I want something like this: for every subset by id: After all constant variables (within each id, here: A and id), put all different variables (within each id, here: X- and Y-columns) in the same row by creating new columns (X -> X1, X2, X3, Y -> Y1, Y2, Y3) The occurences by id are different and vary in my case between 1 and 3, so the missing values for all ids with occurences less than 3 should be filled with NA.

Desired output

# id  A   X1 Y1   X2 Y2   X3 Y3
#  1  9 0.09  6 0.21  9 0.73  9
#  2 10 0.85  8 0.79  8   NA NA
#  3 11 0.33  9 0.08  7   NA NA
#  4 12 0.29  9 0.24  6 0.39  6
#  5 13 0.37  9   NA NA   NA NA
# ...

It is easy to get the number of items in every subset with DT[,.N, by = id], but I struggle with the next steps. Of course the number of columns and rows is much larger, so that I would prefer solutions with less naming of the columns (but of course somewhere the constant and different columns should be defined).

Well, the solution my be a duplicate, except for the reason that I would never find it by not knowing ``dcast``. — Phann, Feb 03 '17 at 09:36
@Jaap I think the OP also asked for rearranging the columns, in that case, it would be diffficult from the link you duped. — akrun, Feb 03 '17 at 09:37
And having a column ``A`` (constant) beside ``id`` is another difference. — Phann, Feb 03 '17 at 09:38

score 2 · Accepted Answer · answered Feb 03 '17 at 09:15

2

We can use dcast

library(data.table)
res <- dcast(DT, id~rowid(id), value.var = c("X", "Y", "A"))
res[, c("id", names(res)[-1][order(sub("\\D+", "", names(res)[-1]))]), with = FALSE]

answered Feb 03 '17 at 09:15

akrun

874,273
37
540
662

Very nice solution! I get the oppinion that with ``data.table`` every problem is to be solved in 2 lines ... . However, I have an improvement of your solution to better fit my desired result: ``dcast(DT, id+A~rowid(id), value.var = c("X", "Y"))`` and consequently: ``res[, c("id", "A", names(res)[-c(1,2)][order(sub("\\D+", "", names(res)[-c(1,2)]))]), with = FALSE]`` – Phann Feb 03 '17 at 09:29
@Phann Thank you for the comments. – akrun Feb 03 '17 at 09:36

Rearrange DT by id, so that all different values are put in columns next to each other

1 Answers1