How do I add values of certain column variables if they apprear in repeated rows with same key values

Question

I am reading a csv file with customer buys for each week over a movie's viewing window on the video on demand. The file consiste of several columns for a given movie, including title, price, week numer, and number of buys in that occur in that week. Sometimes, a specific movie may have the buys for a given week in multiple rows (for the same week).

I am trying to manipulate the data so that if there are multiole rows for the same week for a given movie, i want to replace it with one row for the week and the buys added up.

I have just started to look at R last few weeks and have failed miserabley in tis attempt. Please help!!!

Please, add some sample data and a bit more clear problem description. For tips, see : http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Joris Meys, May 23 '11 at 16:22

score 3 · Answer 1 · answered May 23 '11 at 16:45

I think I followed what you wanted. Let me know if this is what you had in mind.

First, make up some data:

movies <- data.frame(mov = c("Foo", rep(c("Foo", "Bar", "Baz"),3))
    , price = c(7, rep(c(7, 5, 2),3))
    , wknumb = c(1, rep(1:3, each = 3))
    , buys = sample(10:20, 10, TRUE))

   mov price wknumb buys
1  Foo     7      1   12
2  Foo     7      1   11
3  Bar     5      1   10
4  Baz     2      1   19
5  Foo     7      2   10
6  Bar     5      2   15
7  Baz     2      2   15
8  Foo     7      3   14
9  Bar     5      3   15
10 Baz     2      3   20

We have two entries for movie "Foo" and week 1 which we need to collapse into a single entry. We can use the plyr package for this:

library(plyr)
ddply(movies, c("mov", "wknumb"), summarize, totbuys = sum(buys))

Which returns a data.frame with 9 rows, one for each movie and week:

  mov wknumb totbuys
1 Bar      1      10
2 Bar      2      15
3 Bar      3      15
4 Baz      1      19
5 Baz      2      15
6 Baz      3      20
7 Foo      1      23
8 Foo      2      10
9 Foo      3      14

Hi Chase, thanks for your help. I tried it but i got an error message i do not understand. I have the code below using my files column names: df1=read.csv(myFile,header=TRUE) > #df2 <- data.frame("ID"=paste(df1$ROOT_TITLE,"WkNum"=df1$WEEK_OF_View_Window), "title"=df1$ROOT_TITLE,"WkNum"=df1$WEEK_OF_View_Window ,"Buys"=df1$Number.of.Buys) > df2 <- data.frame("title"=df1$ROOT_TITLE,"WkNum"=df1$WEEK_OF_View_Window ,"Buys"=df1$Number.of.Buys) > > library(plyr) > ddply(df2,c("title", "WkNum"),summarize, totbuys=sum(Buys)) Error in Summary.factor(c(571L, 773L), na.rm = FALSE) : sum not meaning — user597551, May 25 '11 at 13:44
That error doesn't make much sense to me either. It could be referencing the fact that "Buys" is a factor and not numeric. You can check the structure of your data with `str(df2)` to see the mode of each column. Otherwise, update your question with the output of `dput(df2)` or a subset of it, and I'll take another look. — Chase, May 25 '11 at 13:56

score 2 · Answer 2 · answered May 23 '11 at 17:34

2

A variation using aggregate from base R (and Chase's dataset).

with(movies, aggregate(buys, list(mov = mov, wknumb = wknumb), sum))

answered May 23 '11 at 17:34

Richie Cotton

118,240
47
247
360

score 0 · Answer 3 · edited May 23 '11 at 22:39

0

Many of these can be handled by sqldf if you are familiar with sql.

library(sqldf)
sqldf("select mov, wknumb, sum(buys) as sumBuys from movies group  by wknumb, mov")

edited May 23 '11 at 22:39

Chase

67,710
18
144
161

answered May 23 '11 at 19:50

Bala

1

Thank you folks, for the various answers. I struggled all day long with aggregate & tapply with no success. But you folks have made it look so simple. Thanks a Ton. – user597551 May 23 '11 at 21:29

How do I add values of certain column variables if they apprear in repeated rows with same key values

3 Answers3