All-

I need to convert data from long format to wide format in R, with each column value set to 1 or 0 depending on whether that product is present for the subject.

The input data looks like:

Subject Product
    1   ProdA
    1   ProdB
    1   ProdC
    2   ProdB
    2   ProdC
    2   ProdD
    3   ProdA
    3   ProdB

and I want it to be

Subject ProdA ProdB ProdC ProdD
      1     1     1     1     0
      2     0     1     1     1
      3     1     1     0     0

Is there any way in R to accomplish this?

EDIT:

One way, I think, is to first tabulate the data:

tbl<-data.frame(table(data))

Then apply cast() from the reshape package:

final <- cast(tbl, Subject~Product, max)

Is there a more efficient way?

Gavin Simpson
B_Miner

1 Answer

xtabs(data=dat)
       Product
Subject ProdA ProdB ProdC ProdD
      1     1     1     1     0
      2     0     1     1     1
      3     1     1     0     0

A slightly more readable version makes the formula explicit:

xtabs( ~Subject+Product, data=dat)
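
The xtabs() result is a contingency table of counts rather than a data frame of 0/1 flags. The following is a minimal sketch of one way to get from one to the other, assuming the example data from the question is stored in `dat` (as in the answer). The names `counts` and `wide` are made up for illustration, and capping the counts at 1 is only needed if a Subject/Product pair can occur more than once.

```r
# Rebuild the example data from the question (`dat` follows the answer's naming)
dat <- data.frame(
  Subject = c(1, 1, 1, 2, 2, 2, 3, 3),
  Product = c("ProdA", "ProdB", "ProdC", "ProdB", "ProdC", "ProdD",
              "ProdA", "ProdB")
)

# Cross-tabulate: one row per Subject, one column per Product, cells hold counts
counts <- xtabs(~ Subject + Product, data = dat)

# Turn the 2-d table into a data frame with one column per product
wide <- as.data.frame.matrix(counts)

# Cap every count at 1 so repeated Subject/Product rows still yield 0/1 flags
wide[] <- lapply(wide, function(x) as.integer(x > 0))

# Move the subject identifiers from the row names into a regular column
wide <- data.frame(Subject = rownames(wide), wide)
rownames(wide) <- NULL

print(wide)
```

Note that `Subject` ends up as a character column here because it is taken from the row names; convert it back with as.numeric() if the original type matters.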

If you want to go with stats::reshape, then try this:

reshape(dat, idvar="Subject", timevar=2,   v.names="Product", direction="wide")
  Subject Product.ProdA Product.ProdB Product.ProdC Product.ProdD
1       1         ProdA         ProdB         ProdC          <NA>
4       2          <NA>         ProdB         ProdC         ProdD
7       3         ProdA         ProdB          <NA>          <NA>

(But it does not return numbers.)
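
If numbers are wanted from reshape(), one possible workaround is to add a constant indicator column and spread that instead of the product name. This is a sketch under assumptions rather than part of the original answer: the column name `present` and the object names `long` and `wide2` are hypothetical, and unique() is applied first because reshape() expects at most one row per Subject/Product pair.

```r
# Example data from the question
dat <- data.frame(
  Subject = c(1, 1, 1, 2, 2, 2, 3, 3),
  Product = c("ProdA", "ProdB", "ProdC", "ProdB", "ProdC", "ProdD",
              "ProdA", "ProdB")
)

# Drop duplicate rows, then add a constant flag column (hypothetical name)
long <- unique(dat)
long$present <- 1L

# Spread the flag across products; missing combinations come back as NA
wide2 <- reshape(long, idvar = "Subject", timevar = "Product",
                 v.names = "present", direction = "wide")

# Replace NA with 0 and strip the "present." prefix from the column names
wide2[is.na(wide2)] <- 0L
names(wide2) <- sub("^present\\.", "", names(wide2))

print(wide2)
```

The product columns appear in the order in which the products first occur in the data, so sort `long` beforehand if a particular column order is needed.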

IRTFM
  • Do you happen to know if xtabs is efficient compared to, say, reshape() approaches? The table I have is around 900,000 rows (413,000 distinct subjects and 50 distinct products), and with 18 GB of RAM (64-bit Windows 7) the job fails due to insufficient memory. – B_Miner Sep 24 '11 at 00:20
  • It doesn't seem as though that should be enough to exhaust that supply of RAM. Exactly what is the error message? – IRTFM Sep 24 '11 at 01:00
  • It's odd, I got the error running in RStudio. When I run the same in Revolution Analytics Enterprise, there is absolutely no issue and it runs fast. – B_Miner Sep 24 '11 at 01:10
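
For scale, a dense 413,000 by 50 table has 20,650,000 cells, which at 4 bytes per integer is roughly 83 MB, consistent with the comment that this alone should not exhaust the RAM. If memory is still a concern, xtabs() accepts `sparse = TRUE` and then returns a sparse matrix from the Matrix package that stores only the non-zero cells. The sketch below assumes Matrix is installed (it ships with standard R distributions) and uses the small example data; whether it would have avoided the error described in the comments is not known. The name `sp` is made up for illustration.

```r
library(Matrix)  # provides the sparse matrix classes used by xtabs(sparse = TRUE)

# Example data from the question; factors make the row and column levels explicit
dat <- data.frame(
  Subject = factor(c(1, 1, 1, 2, 2, 2, 3, 3)),
  Product = factor(c("ProdA", "ProdB", "ProdC", "ProdB", "ProdC", "ProdD",
                     "ProdA", "ProdB"))
)

# Sparse cross-tabulation: only Subject/Product pairs that occur are stored
sp <- xtabs(~ Subject + Product, data = dat, sparse = TRUE)

# Cap stored counts at 1 so repeated pairs still give a 0/1 indicator
sp@x <- pmin(sp@x, 1)

print(sp)
```

With about 900,000 rows, at most 900,000 cells are non-zero, so the sparse form holds well under 5% of the entries a dense table would.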