13
> library(data.table)
> A <- data.table(x = c(1,1,2,2), y = c(1,2,1,2), v = c(0.1,0.2,0.3,0.4))
> A
   x y   v
1: 1 1 0.1
2: 1 2 0.2
3: 2 1 0.3
4: 2 2 0.4
> B <- dcast(A, x~y)
Using v as value column: use value.var to override.
> B
  x   1   2
1 1 0.1 0.2
2 2 0.3 0.4

Apparently I can reshape a data.table from long to wide using f.x. dcast of package reshape2. But data.table comes along with an overloaded bracket-operator offering parameters like 'by' and 'group', which make me wonder if it is possible to achieve it using this (to data.table specific functionality)?

Just one random example from the manual:

DT[,lapply(.SD,sum),by=x]

That looks awesome - but I don't fully understand the usage yet.

I neither found a way nor an example for this so maybe it is just not possible maybe it isn't even supposed to be - so, a definite "no, is not possible because ..." is then of course also a valid answer.

Raffael
  • 19,547
  • 15
  • 82
  • 160
  • Please see [here](http://stackoverflow.com/questions/15510566/nested-if-else-statements-over-a-number-of-columns/15511689?noredirect=1#comment21968080_15511689). – Metrics Aug 04 '13 at 21:34
  • Sorry, I don't see how this relates to my question. Also mind that the column categories of B are dynamically depending on A so the number of values of 'y' can differ from case to case. – Raffael Aug 04 '13 at 21:41

3 Answers3

16

I'll pick an example with unequal groups so that it's easier to illustrate for the general case:

A <- data.table(x=c(1,1,1,2,2), y=c(1,2,3,1,2), v=(1:5)/5)
> A
   x y   v
1: 1 1 0.2
2: 1 2 0.4
3: 1 3 0.6
4: 2 1 0.8
5: 2 2 1.0

The first step is to get the number of elements/entries for each group of "x" to be the same. Here, for x=1 there are 3 values of y, but only 2 for x=2. So, we'll have to fix that first with NA for x=2, y=3.

setkey(A, x, y)
A[CJ(unique(x), unique(y))]

Now, to get it to wide format, we should group by "x" and use as.list on v as follows:

out <- A[CJ(unique(x), unique(y))][, as.list(v), by=x]
   x  V1  V2  V3
1: 1 0.2 0.4 0.6
2: 2 0.8 1.0  NA

Now, you can set the names of the reshaped columns using reference with setnames as follows:

setnames(out, c("x", as.character(unique(A$y)))

   x   1   2   3
1: 1 0.2 0.4 0.6
2: 2 0.8 1.0  NA
Arun
  • 116,683
  • 26
  • 284
  • 387
  • Actually your first solution is not given in your post. So I will keep it for academic reasons! Thanks a lot for the effort! – Raffael Aug 04 '13 at 21:53
  • The first solution works when there are already equal rows for each group. And it's better to set the names of columns once at the end rather than setting it for every group (which is what the old solution does). There may be performance issues due to that on bigger data. – Arun Aug 04 '13 at 21:54
  • 1
    Okay, this solution I would have found after studying the manual for weeks I guess ... so, this might be very subjective but I think I can say with some right that this solution is quite complex and elaborate - no doubt though about that it is the most simple solution possible just using data.table as I asked for. So, I would like to know from you whether maybe this area of reshaping data sets is just not what data.table is actually supposed to address. Where would you draw the line regarding an economic and sensible usage of data.table and where to resort to further tools? – Raffael Aug 05 '13 at 10:02
  • 2
    There are plans to add `melt` and `cast` to `data.table`. Look at [**FR #2627**](http://r-forge.r-project.org/tracker/index.php?func=detail&aid=2627&group_id=240&atid=978). `data.table` is still evolving. There are a number of features that may still be discovered. So, I wouldn't restrict myself to the view that it does this, this and this. If you think there's a very common operation or useful operation that isn't straightforward, add in a feature request. MatthewDowle is pretty open to ideas. Also check out the data.table mailing list. – Arun Aug 05 '13 at 10:11
15

Use dcast() (now a default data.table method, from version 1.9.5; earlier versions use dcast.data.table) as in

> dcast(A,x~y)
Using 'v' as value column. Use 'value.var' to override
   x   1   2   3
1: 1 0.2 0.4 0.6
2: 2 0.8 1.0  NA

This is fast and obviates the need to setnames().

It is also especially helpful when y in the above example is a factor variable with character levels -- e.g. 'Low', 'Medium', 'High' -- because CJ() may not return the wide data with variables in the order that setnames() expects, and you can end up with your data mislabeled badly.

MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
Gabi
  • 1,303
  • 16
  • 20
2

(with credits to Arun)

A[, setattr(as.list(v), 'names', y), by=x]
Raffael
  • 19,547
  • 15
  • 82
  • 160
  • 1
    Error in `[.data.table`(A, , setattr(as.list(v), "names", y), by = x) : j doesn't evaluate to the same number of columns for each group – skan Oct 16 '15 at 09:09