6

Is there any way to flip wide my data without first specifying a variable against which to be flipped? The logical default seems to me to be the in-group index.

For example,

DT <- data.table(id = rep(6:10, each = 3), var = rnorm(15))
DT
#     id         var
#  1:  6  1.58293930
#  2:  6  0.44234019
#  3:  6 -0.06576521
#  4:  7 -0.65124980
#  5:  7  0.88371933
#  6:  7 -1.94998135
#  7:  8 -1.95746466
#  8:  8 -0.50978195
#  9:  8 -0.40450447
# 10:  9 -0.61097399
# 11:  9 -0.92335213
# 12:  9 -0.19881983
# 13: 10  0.13022635
# 14: 10 -0.30141200
# 15: 10  0.78355188

What I want is basically, for each id, each value of var in a different column (and NAs if there's any id with fewer var values associated), which can be done like so:

DT[ , I := 1:.N, by = id]
dcast(DT, id ~ I, value.var = "var")
#    id          1          2           3
# 1:  6  1.5829393  0.4423402 -0.06576521
# 2:  7 -0.6512498  0.8837193 -1.94998135
# 3:  8 -1.9574647 -0.5097820 -0.40450447
# 4:  9 -0.6109740 -0.9233521 -0.19881983
# 5: 10  0.1302263 -0.3014120  0.78355188

However, it would be more convenient if I didn't have to define I first, like so:

dcast(DT, id~ ., value.var = "var")

But this doesn't work:

Aggregate function missing, defaulting to 'length'

#    id .
# 1:  6 3
# 2:  7 3
# 3:  8 3
# 4:  9 3
# 5: 10 3

Is there perhaps an aggregating function that I could pass to get the desired effect?

MichaelChirico
  • 33,841
  • 14
  • 113
  • 198

2 Answers2

4

This is now possible using the rowid function:

dcast(DT, id ~ rowid(id), value.var = "var")
#    id          1          2          3
# 1:  6  1.1050942  0.1271620  1.3051373
# 2:  7 -0.5441056 -0.6866828 -0.8083762
# 3:  8 -0.6812820 -1.1934716 -1.3913903
# 4:  9 -0.3462497 -0.8229276 -1.0884394
# 5: 10 -0.4600681  0.6173795 -1.0125658

See ?rowid for more options, examples, and explanation.

MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
2

If your table is full, you can use the following technique:

dt<-data.table(id=rep(6:10,each=3), var=rnorm(15))
dt[, as.list(setNames(var, 1:length(var))), id]
#    id       1       2        3
# 1:  6 -1.6187  0.9227 -0.12284
# 2:  7  0.9898  0.3489  1.21193
# 3:  8 -1.5968  1.3802  0.09284
# 4:  9 -1.9527 -0.6347  1.11014
# 5: 10  0.6093 -1.0609 -0.66372

This trick was used a few days back on a data.table question. If someone follows the feed and can recall the post, feel free to edit.

MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
Clayton Stanley
  • 7,513
  • 9
  • 32
  • 46
  • sadly the group Ns are different, but that's a good idea. can probably be combined with a self join that would fill out the `NA`s we need. – MichaelChirico May 20 '15 at 01:32