0

I am reading a csv file with customer buys for each week over a movie's viewing window on the video on demand. The file consiste of several columns for a given movie, including title, price, week numer, and number of buys in that occur in that week. Sometimes, a specific movie may have the buys for a given week in multiple rows (for the same week).

I am trying to manipulate the data so that if there are multiole rows for the same week for a given movie, i want to replace it with one row for the week and the buys added up.

I have just started to look at R last few weeks and have failed miserabley in tis attempt. Please help!!!

user597551
  • 979
  • 2
  • 9
  • 9
  • 3
    Please, add some sample data and a bit more clear problem description. For tips, see : http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Joris Meys May 23 '11 at 16:22

3 Answers3

3

I think I followed what you wanted. Let me know if this is what you had in mind.

First, make up some data:

movies <- data.frame(mov = c("Foo", rep(c("Foo", "Bar", "Baz"),3))
    , price = c(7, rep(c(7, 5, 2),3))
    , wknumb = c(1, rep(1:3, each = 3))
    , buys = sample(10:20, 10, TRUE))

   mov price wknumb buys
1  Foo     7      1   12
2  Foo     7      1   11
3  Bar     5      1   10
4  Baz     2      1   19
5  Foo     7      2   10
6  Bar     5      2   15
7  Baz     2      2   15
8  Foo     7      3   14
9  Bar     5      3   15
10 Baz     2      3   20

We have two entries for movie "Foo" and week 1 which we need to collapse into a single entry. We can use the plyr package for this:

library(plyr)
ddply(movies, c("mov", "wknumb"), summarize, totbuys = sum(buys))

Which returns a data.frame with 9 rows, one for each movie and week:

  mov wknumb totbuys
1 Bar      1      10
2 Bar      2      15
3 Bar      3      15
4 Baz      1      19
5 Baz      2      15
6 Baz      3      20
7 Foo      1      23
8 Foo      2      10
9 Foo      3      14
Chase
  • 67,710
  • 18
  • 144
  • 161
  • Hi Chase, thanks for your help. I tried it but i got an error message i do not understand. I have the code below using my files column names: df1=read.csv(myFile,header=TRUE) > #df2 <- data.frame("ID"=paste(df1$ROOT_TITLE,"WkNum"=df1$WEEK_OF_View_Window), "title"=df1$ROOT_TITLE,"WkNum"=df1$WEEK_OF_View_Window ,"Buys"=df1$Number.of.Buys) > df2 <- data.frame("title"=df1$ROOT_TITLE,"WkNum"=df1$WEEK_OF_View_Window ,"Buys"=df1$Number.of.Buys) > > library(plyr) > ddply(df2,c("title", "WkNum"),summarize, totbuys=sum(Buys)) Error in Summary.factor(c(571L, 773L), na.rm = FALSE) : sum not meaning – user597551 May 25 '11 at 13:44
  • That error doesn't make much sense to me either. It could be referencing the fact that "Buys" is a factor and not numeric. You can check the structure of your data with `str(df2)` to see the mode of each column. Otherwise, update your question with the output of `dput(df2)` or a subset of it, and I'll take another look. – Chase May 25 '11 at 13:56
2

A variation using aggregate from base R (and Chase's dataset).

with(movies, aggregate(buys, list(mov = mov, wknumb = wknumb), sum))
Richie Cotton
  • 118,240
  • 47
  • 247
  • 360
0

Many of these can be handled by sqldf if you are familiar with sql.

library(sqldf)
sqldf("select mov, wknumb, sum(buys) as sumBuys from movies group  by wknumb, mov")
Chase
  • 67,710
  • 18
  • 144
  • 161
Bala
  • 1
  • Thank you folks, for the various answers. I struggled all day long with aggregate & tapply with no success. But you folks have made it look so simple. Thanks a Ton. – user597551 May 23 '11 at 21:29