I am trying to write a function that sums the value column(s) of a data frame, grouped by the values in the first two columns. For example, I have a data.table M:

Crs gr  P_7 P_8      
38  1   3   16
38  1   12  45
38  1   9   28
40  2   3   9
40  2   14  29
40  1   4   3
40  2   8   2

I want to sum the columns grouped by column 1 (Crs) first and then by column 2 (gr). The result should be:

    Crs gr  P_7  P_8      
    38  1   24  89
    40  2   25  40
    40  1   4   3

Currently I am using,

M <- M[, list(sum(P_7),sum(P_8)), by=list(Crs,gr)]

The problem with this is that I have to name the columns explicitly, and the column names won't be fixed. How can I do this without specifying the column names? Thanks in advance!

jsin
  • No issues with Andrie's answer, but you've asked a `data.table` question and there's a much more efficient way to do this (see @eddi's) than using `plyr`. – Arun Jul 01 '13 at 15:10
  • @Arun Computationally faster, yes, I agree. But efficiency is also a function of how comfortable you are with a framework. I find I'm much more efficient at writing `plyr` solutions, despite having worked with `data.table` extensively. – Andrie Jul 01 '13 at 15:12
  • @Andrie, the only reason I emphasised eddi's answer here is that the OP has already shown his `data.table` attempt at getting the answer. However, I am a fan of plyr and don't have anything against it or your answer. As long as one doesn't work with huge data (which I do as a bioinformatician), one doesn't need to compromise the brevity of plyr (although I have to say I find data.table syntax very straightforward, personally). – Arun Jul 01 '13 at 15:23
  • "brevity of plyr"?? ime most of the time plyr expressions are usually much more complicated and difficult to understand, and in this case it's only shorter by 1 character (and only because I was explicit about "by"). I started with plyr before I encountered data.table, and was very quickly turned off by the very steep learning curve and unclear syntax (and that's when, looking for alternatives I found data.table). – eddi Jul 01 '13 at 15:31
  • @eddi, people have different opinions about different packages (products). Many people have had problems (including you) with the syntax of `data.table` (no offence to the package or Matthew). All I'm saying is it's subjective. It's best to respect others' differences and not be too critical. `plyr` is liked by many, many (regular) R users (`data.table` can only replace one or some of `plyr`'s functions so far) and is popular for many reasons. – Arun Jul 02 '13 at 06:33
  • @Arun my reaction was mostly about "brevity" which is objectively measurable and which is a dimension along which I think plyr loses to data.table in majority of cases where both are applicable. – eddi Jul 02 '13 at 12:38
  • @eddi, unless you compare `plyr`'s `ddply` and `data.table`, at least for the most common operations, and count the characters (and plot it), and also measure the complexity and difficulty of understanding that you mention in relation to brevity, it's safe to assume, for me, that it's subjective. – Arun Jul 02 '13 at 13:27
  • @Arun fair enough, you want data that I'm too lazy to analyze (even though it exists right here on SO) - that would be a fun project to do :) – eddi Jul 02 '13 at 13:29

2 Answers

The plyr package handles situations just like this. Combine ddply with numcolwise, which applies a function to every numeric column other than the grouping variables (dat here stands for the data from the question):

library(plyr)
ddply(dat, .(Crs, gr), numcolwise(sum))

results in:

  Crs gr P_7 P_8
1  38  1  24  89
2  40  1   4   3
3  40  2  25  40
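
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's data first. The name `dat` and the use of a plain `data.frame` are assumptions made for illustration; the `ddply` call itself is the one from the answer.

```r
library(plyr)

# Assumed reconstruction of the example data from the question
dat <- data.frame(
  Crs = c(38, 38, 38, 40, 40, 40, 40),
  gr  = c(1, 1, 1, 2, 2, 1, 2),
  P_7 = c(3, 12, 9, 3, 14, 4, 8),
  P_8 = c(16, 45, 28, 9, 29, 3, 2)
)

# Group by Crs and gr, then sum every remaining numeric column.
# No value column is named, so extra P_* columns are picked up automatically.
res <- ddply(dat, .(Crs, gr), numcolwise(sum))
print(res)
```

Note that `ddply` returns the groups sorted by the grouping variables, which is why the row order differs from the one shown in the question.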
Andrie

You're looking for this:

M[, lapply(.SD, sum), by = list(Crs, gr)]
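
A self-contained sketch of the same idea, assuming `M` is a `data.table` built from the question's example. `.SD` contains every column not listed in `by`, so no value column has to be named. The `.SDcols` variation at the end, and the names `value_cols` and `res_subset`, are illustrative additions rather than part of the original answer.

```r
library(data.table)

# Assumed reconstruction of the example data from the question
M <- data.table(
  Crs = c(38, 38, 38, 40, 40, 40, 40),
  gr  = c(1, 1, 1, 2, 2, 1, 2),
  P_7 = c(3, 12, 9, 3, 14, 4, 8),
  P_8 = c(16, 45, 28, 9, 29, 3, 2)
)

# .SD is the subset of data for each group, minus the grouping columns
res <- M[, lapply(.SD, sum), by = list(Crs, gr)]
print(res)

# Illustrative variation: sum only the columns whose names match a pattern
value_cols <- grep("^P_", names(M), value = TRUE)
res_subset <- M[, lapply(.SD, sum), by = list(Crs, gr), .SDcols = value_cols]
```

Unlike `ddply`, this keeps the groups in order of first appearance, which matches the row order in the question's expected result.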
eddi