0

I have some data which looks as follows

"ID","PROD"
"1001658",6619
"100288",11843
"100288",20106
"1004303",921

I need to convert it into a format like

"ID","PROD_6619","PROD_11843","PROD_20106","PROD_921"
"1001658",1,0,0,0
"100288",0,1,1,0
"1004303",0,0,0,1

Basically where each value in the column PROD from the original data set is in a separate column of it's own. Note that the above dataset is only a sample and I cannot hard code to be "PROD_6619","PROD_11843","PROD_20106","PROD_921". It could be much more.

I have tried writing this iteratively using a for loop and it's very slow for my huge data set.

Can you suggest me an alternative in R

Rorschach
  • 31,301
  • 5
  • 78
  • 129
saltmangotree
  • 171
  • 4
  • 11

2 Answers2

6

You can just use table for something like this.

Example:

table(mydf)
##          PROD
## ID        921 6619 11843 20106
##   100288    0    0     1     1
##   1001658   0    1     0     0
##   1004303   1    0     0     0
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
0

Here is another approach with the dcast of the reshape2 package.

library(reshape2)

dcast(dat, ID ~ PROD, length )
Using PROD as value column: use value.var to override.
       ID 921 6619 11843 20106
1  100288   0    0     1     1
2 1001658   0    1     0     0
3 1004303   1    0     0     0
SabDeM
  • 7,050
  • 2
  • 25
  • 38