3

I have three different data frames like so:

V1.x<-c(1,2,3,4,5)
V2.x<-c(2,2,7,3,1)
V3.x<-c(2,4,3,2,9)
D1<-data.frame(ID=c("A","B","C","D","E"),V1.x=V1.x,V2.x=V2.x,V3.x=V3.x)

V1.y<-c(2,3,3,3,5)
V2.y<-c(1,2,3,3,5)
V3.y<-c(6,4,3,2,2)
D2<-data.frame(ID=c("A","B","C","D","E"),V1.y=V1.y,V2.y=V2.y,V3.y=V3.y)

V1<-c(3,2,4,4,5)
V2<-c(3,7,3,4,5)
V3<-c(5,4,3,6,3)
D3<-data.frame(ID=c("A","B","C","D","E"),V1=V1,V2=V2,V3=V3)

I would like to add up all of the V1 columns, all of the V2 columns, and all of the V3 columns

V1_Add<-D1$V1.x+D2$V1.y+D3$V1
V2_Add<-D1$V2.x+D2$V2.y+D3$V2
V3_Add<-D1$V3.x+D2$V3.y+D3$V3

Works just fine to get the individual column sums, but in the real data the column numbers go from V1:V80 so it would be great not to have to enter each collumn individually. Also, I would prefer to end up with one data frame that would contain all of the final sums like so:

  ID  V1  V2  V3
1  A  6  6   13
2  B  7  11  12
3  C  10 13  9
4  D  11 10  10
5  E  15 11  14
Vinterwoo
  • 3,843
  • 6
  • 36
  • 55

4 Answers4

2

Is this something like what you want?

D.Add <- data.frame(D1[,1],(D1[,-1]+D2[,-1]+D3[,-1]))
colnames(D.Add)<-colnames(D3)
Glen_b
  • 7,883
  • 2
  • 37
  • 48
  • That's really neat, Glen. I knew it would be possible with matrices, but seeing it done with dataframes was a surprise. – IRTFM Jun 08 '12 at 03:15
2
library(reshape2)
library(plyr)

# First let's standardize column names after ID so they become V1 through Vx. 
# I turned it into a function to make this easy to do for multiple data.frames
standardize_col_names <- function(df) {
# First column remains ID, then I name the remaining V1 through Vn-1 
# (since first column is taken up by the ID)
names(df) <- c("ID", paste("V",1:(dim(df)[2]-1),sep=""))
return(df)
}

D1 <- standardize_col_names(D1)
D2 <- standardize_col_names(D2)
D3 <- standardize_col_names(D3)

# Next, we melt the data and bind them into the same data.frame
# See one example with melt(D1, id.vars=1). I just used rbind to combine those
melted_data <- rbind(melt(D1, id.vars=1), melt(D2, id.vars=1), melt(D3, id.vars=1))
# note that the above step can be folded into the function as well. 
# Then you throw all the data.frames into a list and ldply through this function.

# Finally, we cast the data into what you need which is the sum of the columns
 dcast(melted_data, ID~variable, sum)
  ID V1 V2 V3
1  A  6  6 13
2  B  7 11 12
3  C 10 13  9
4  D 11 10 10
5  E 15 11 14



 # Combined everything above more efficiently :

   standardize_df <- function(df) {
    names(df) <- c("ID", paste("V",1:(dim(df)[2]-1),sep=""))
    return(melt(df, id.vars = 1))
    }

   all_my_data <- list(D1,D2,D3)
   melted_data <- ldply(all_my_data, standardize_df)
   summarized_data <- dcast(melted_data, ID~variable, sum)
Maiasaura
  • 32,226
  • 27
  • 104
  • 108
  • I like the less efficient answer as I'm not quite efficient at R myself and was a little easier to grasp :) Thanks for the answer! – Vinterwoo Jun 08 '12 at 01:55
  • I wrote the "less efficient" one so you could see all the pieces and how they work. I just combined them later on to show you how it would work together. Glad it was useful :) – Maiasaura Jun 08 '12 at 01:57
2

Here's an approach that is probably overkill, but should be pretty generalizable to any number of columns and any number of "index" columns as well. It does assume that all of your data.frames have the same number of columns and they are in the right order. First, create a list object out of all of your data.frames. I referenced this question to do that programatically.

ClassFilter <- function(x, class) inherits(get(x), "data.frame")
Objs <- Filter( ClassFilter, ls() )
Objs <- lapply(Objs, "get")

Next, I wrote a function to add all the numeric columns together by using Reduce and then stitch it back together with the non-numeric columns at the end:

FUN <- function(x){
  colsToProcess <- lapply(x, function(y) y[, unlist(sapply(y, is.numeric))])
  result <- Reduce("+", colsToProcess)
  #Get the non numeric columns
  nonNumericCols <- x[[1]]  
  nonNumericCols <- nonNumericCols[, !(unlist(sapply(nonNumericCols, is.numeric)))]
  return(data.frame(Index = nonNumericCols, result))
}

And finally, in action:

> FUN(Objs)
  Index V1.x V2.x V3.x
1     A    6    6   13
2     B    7   11   12
3     C   10   13    9
4     D   11   10   10
5     E   15   11   14
Community
  • 1
  • 1
Chase
  • 67,710
  • 18
  • 144
  • 161
0

what about this, just adding up the whole block? :

D1[,2:4] + D3[,2:4] + D2[,2:4]

... results in ...

  V1.x V2.x V3.x
1    6    6   13
2    7   11   12
3   10   13    9
4   11   10   10
5   15   11   14

it assumes, that all variables are in the same order, but otherwise should work well.

petermeissner
  • 12,234
  • 5
  • 63
  • 63