2

I have a data frame with a sequence of numeric columns, surrounded on both sides by (irrelevant) columns of characters. I want to obtain a new data frame that keeps the position of the irrelevant columns, and adds the numeric columns to eachother by a certain grouping vector (or applies some other row-wise function to the data frame, by group). Example:

sample = data.frame(cha1 = c("A","B"),num1=1:2,num2=3:4,num3=11:12,num4=13:14,cha2=c("C","D"))
> sample
  cha1 num1 num2 num3 num4 cha2
1    A    1    3   11   13    C
2    B    2    4   12   14    D

with the goal to obtain

> goal
  cha1 X1 X2 cha2 
1    A  4 24    C
2    B  6 26    D

i.e. I've summed the 4 numeric columns according to the grouping vector gl(2,2,4) = (1,1,2,2) [levels: 1,2]

For a purely numeric data frame I've found the following method:

sample_num = sample[,2:5] #select numeric columns
data.frame(t(apply(sample_num,1,function(row) tapply(row, INDEX=gl(2,2,4),sum))))

I could combine this with re-inserting the character columns to give the intended result, but I'm really looking for a more elegant way. I'm particularly interested in a plyr method if there is one, as I'm trying to migrate to plyr for all my data frame manipulations. I imagine the first step would be to cast the data frame into long format, but I have no idea how to proceed from there.

One 'absolute' requirement is that I cannot do without the gl(n,k,l) method of grouping, as I need this to be applicable to a wide range of data frames and grouping factors.

EDIT: for simplicity assume that I know which columns are the relevant numeric columns. I'm not concerned with how to select them, I'm concerned with how to do my grouped sum without messing up the original data frame structure.

Thanks!

user28400
  • 217
  • 2
  • 8
  • I dont unterstand how the grouping works? where can your grouping vector be found? please update your sample data including the grouping vector – grrgrrbla Jun 11 '15 at 10:13
  • in this case the grouping vector i (arbitrarily) chose is `(1,1,2,2)` i.e. I want to add the first two columns and the second two columns. For four columns it can be any factor of length 4 with up to 4 levels, e.g. (1,1,1,2) or (1,2,3,3). – user28400 Jun 11 '15 at 10:15
  • 1
    please include that in your OP, you are talking about selecting the columns and not grouping, your choice of words here is very confusing – grrgrrbla Jun 11 '15 at 10:17
  • You could do something like `GrpIndx <- gl(2,2,4); Numcols <- 2:5 ; cbind(sample[-Numcols], t(rowsum(t(sample[Numcols]), paste0("X", GrpIndx))))` though I'm still not sure how do you decide which column is numeric or not. – David Arenburg Jun 11 '15 at 10:42
  • the example grouping vector that I used can clearly be found in the OP @grrgrrbla . Not sure what your gripe is here. I need to select adjacent numeric columns and then add them according to the grouping vector. – user28400 Jun 11 '15 at 11:06
  • @DavidArenburg I added a comment to the OP – user28400 Jun 11 '15 at 11:09
  • Didn't my solution work for you? – David Arenburg Jun 11 '15 at 11:09
  • It gets the desired arithmetic output but reshuffles the column order. Thanks for the suggestion though, it is already much closer to what I want. – user28400 Jun 11 '15 at 11:13

1 Answers1

-1
Grpindex<-gl(2,2,4)    
goal<-cbind.data.frame(sample["cha1"],(t(rowsum(t(sample[,2:5]), paste0("X",Grpindex)))),sample["cha2"])

Output:

  cha1 X1 X2 cha2
1    A  4 24    C
2    B  6 26    D
Nick Fisch
  • 24
  • 7