0

I have a large dataframe in RStudio (15,000 rows, 300 columns) and its a mess. It looks somewhat like this:

ID Exam1 Exam2 Exam3..... Exam299
1 75 76 99 100
2 25 25 25 25
2 22 20 22 22
2 25 25 20 22
2 20 20 25 23
3 79 88 92 96

For each individual student ID I want to add all the individual columns so each student only has 1 row associated with him/her. It should look like this:

ID Exam1 Exam2 Exam3 Exam299
1 75 76 99 100
2 92 90 92 92
3 79 88 92 96

Everything I've tried sums only one column at a time and/or combines entries without summing them:

aggregate(ID~Exam1, data=df, c)
M--
  • 25,431
  • 8
  • 61
  • 93
iPlexpen
  • 409
  • 1
  • 4
  • 13
  • I think this may be amenable to group_by() and summarise(), but it would help if you provided a reproducible example, e.g. give us the code to recreate the data frame. – Joy May 17 '17 at 19:25
  • 3
    While the duplicate question says "with dplyr" in the title, it's worth noting that it offers a number of solutions in the answers. – David Robinson May 17 '17 at 19:28
  • @DavidRobinson Yeah, That's the post that I read when I wanted to do the same. Thanks. – M-- May 17 '17 at 19:33
  • @DavidRobinson I tried this with dplyr: df %>% group_by(ID) %>% summarize_each(funs(sum)) and it didn't do anything. The post you refer to uses means, not summation, so am I using the "sum" incorrectly? – iPlexpen May 17 '17 at 19:39
  • When you say it didn't do anything, what do you mean? Did you assign the result to a new variable (e.g. `df2 <- df %>% group_by( etc` so that `df2` contains the new output? (It won't change the original variable) – David Robinson May 17 '17 at 19:40
  • @DavidRobinson, I've tried it with df2 and the data is unchanged. It looks exactly the same. Masoud's data.table worked. I don't know why the aggregate and the dplyr techniques aren't working for me. – iPlexpen May 17 '17 at 19:50
  • @Quixotic try this `df <- data.frame(df)` and then apply the other solutions (i.e. ddply and aggregate). Just in case. – M-- May 17 '17 at 19:53

1 Answers1

0

You can use this:

df.sum <- aggregate(. ~ ID, data=df, FUN=sum)

You can also use data.table library:

require(data.table)
dt <- data.table(df)
dt.sum <- dt[, lapply(.SD, sum), by = ID]

I think you can also use dplyr package too for this, but don't have the solution off the top of my head.

M--
  • 25,431
  • 8
  • 61
  • 93
  • The data.table method worked! Thank you. The aggregate command still isn't working though, I get the following error: Error in aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...) : no rows to aggregate – iPlexpen May 17 '17 at 19:49
  • @Quixotic If you read [How to make a reproducible example in r?](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and provide a minimal example then I can help you to debug the code. – M-- May 17 '17 at 19:51