3

How can I reshape a data.table (long into wide) without doing a function like sum or mean? I was looking at dcast/melt/reshape/etc. But I don't get the desired results.

This is my data:

DT <- data.table(id = c("1","1","2","3"), score = c("5", "4", "5", "6"))

Original format:

> DT
id score
1  5 
1  4 
2  5 
3  6 

Desired format:

id score1 score2
1  5      4
2  5      NA
3  6      NA 

I now do the trick with:

DT <- DT[, list(list(score)), by=id]

But then the contents of the first cell is like:

c("5", "4")

And I need to split it (I use the package splitstackshape):

DT <- cSplit(DT, "V1", ",")

This is probably not the most efficient method... What is a better way?

Steven Beaupré
  • 21,343
  • 7
  • 57
  • 77
peter
  • 93
  • 6

1 Answers1

4

You can use getanID to create a unique .id for the grouping variable id. Then, try with dcast.data.table (or simply dcast from versions 1.9.5 and beyond) and if needed change the column names using setnames

 library(splitstackshape)
 res <- dcast(getanID(DT, 'id'), id~.id,value.var='score')
 setnames(res, 2:3, paste0('score', 1:2))[]
 #    id score1 score2
 #1:  1      5      4
 #2:  2      5     NA
 #3:  3      6     NA

Or using only data.table

 dcast(DT[, .id:=paste0('score', 1:.N), by=id],
       id~.id, value.var='score')
 #   id score1 score2
 #1:  1      5      4
 #2:  2      5     NA
 #3:  3      6     NA

Or from the code you were using (less number of characters)

cSplit(DT[, toString(score), by=id], 'V1', ',')
#   id V1_1 V1_2
#1:  1    5    4
#2:  2    5   NA
#3:  3    6   NA
MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thx a million! I checked which one was the fastest. The cSplit uses less characters, but takes almost double the time needed. – peter Dec 07 '14 at 21:17
  • @peter, What size data are you testing on? On anything of a reasonable size, I would imagine that the `getanID` approach should be reasonably fast (It uses `sequence(.N)` which I'd *guess* would be faster than `paste`). `cSplit` would definitely not be the best choice here though. +1 akrun. – A5C1D2H2I1M1N2O1R2T1 Dec 08 '14 at 15:29
  • @Anando, The getanID and the data.table methods are very fast. cSplit is slower. My tables are 300k rows. The data.table method takes 0.74 sec on my machine The cSplit method 2,7 sec. – peter Dec 08 '14 at 19:07
  • 1
    @akrun I was searching for this from that time. So i referred this. And thanks for a very nice answer! – Veerendra Gadekar Jul 29 '15 at 12:49
  • 1
    @VeerendraGadekar Actually, I didn't imply that you got the inspiration from here. I really forgot that I used it this way. I was just teasing you about the story behind your thought process :-) – akrun Jul 29 '15 at 12:51
  • 1
    @akrun I liked this approach very much! and bookmarked it. This is so cool! :) – Veerendra Gadekar Jul 29 '15 at 12:54
  • 1
    @VeerendraGadekar Thanks for your generous comments. It made my day! – akrun Jul 29 '15 at 12:57
  • Akrun, can you help in this related question, please? https://stackoverflow.com/questions/41163500/r-transform-from-wide-to-long-without-sorting-columns/41163691#41163691 the answers provided there don't work as desired. – skan Jan 31 '17 at 09:50