I have a data frame:

Y  X1  X2  X3
1   1   0  1
1   0   1  1
0   1   0  1
0   0   0  1
1   1   1  0
0   1   1  0

I want to sum the Y column over the rows where another column equals 1, i.e. sum(Y=1|Xi=1). For example, for column X1, s1 = sum(Y=1|X1=1) = 1 + 0 + 1 + 0 = 2

    Y   X1
    1   1
    0   1
    1   1
    0   1

For column X2, s2 = sum(Y=1|X2=1) = 0 + 1 + 0 = 1

    Y   X2
    0   1
    1   1
    0   1

For column X3, s3 = sum(Y=1|X3=1) = 1 + 1 + 0 + 0 = 2

    Y    X3
    1    1
    1    1
    0    1
    0    1

My rough idea is to use `apply(df, 2, sum)` over the columns of the data frame, but I don't know how to subset the rows for each column Xi and then calculate the sum of Y. Any help is appreciated!

M--
Jassy.W
  • Are you ok doing this manually per column, or do you want it automatically done for a whole bunch of columns? –  Mar 27 '17 at 21:16
  • I want it done automatically for a whole bunch of columns – Jassy.W Mar 27 '17 at 21:17
  • Fyi, you might want to `dput` your data next time, for easier reproducibility for your answerers. Some guidance: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 – Frank Mar 27 '17 at 21:49

3 Answers

There are numerous ways to do this. One is to subset the rows where the column you want equals 1 and sum Y over them:

sum(df[df$X1==1,]$Y)

This should work for you.
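Since the comments ask for this to happen automatically for many columns, here is a minimal sketch of wrapping the same subset expression in `vapply`. It assumes every column other than `Y` is an indicator column; the names `x_cols` and `s` are just for illustration.

```r
# Same data as in the question, rebuilt here so the snippet runs on its own
df <- data.frame(
  Y  = c(1, 1, 0, 0, 1, 0),
  X1 = c(1, 0, 1, 0, 1, 1),
  X2 = c(0, 1, 0, 0, 1, 1),
  X3 = c(1, 1, 1, 1, 0, 0)
)

# Assumption: every column except Y is an indicator column
x_cols <- setdiff(names(df), "Y")

# Apply the subset-and-sum expression from above to each indicator column;
# numeric(1) makes vapply check that each result is a single number
s <- vapply(x_cols, function(col) sum(df[df[[col]] == 1, ]$Y), numeric(1))
print(s)
```

The result is a named vector with one entry per indicator column, so `s[["X1"]]` picks out the sum for a single column.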

M--

You can use `colSums` and count the rows where Y*X is equal to 1. I think there's an error in your desired output for the X2 column: rows 2 and 5 contain 1 for both Y and X2, so the sum should be 2.

x=read.table(text="Y  X1  X2  X3
1   1   0  1
1   0   1  1
0   1   0  1
0   0   0  1
1   1   1  0
0   1   1  0",header=TRUE, stringsAsFactors=FALSE)

colSums(x[,-1]*x[,1])

X1 X2 X3 
 2  2  2
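One caveat worth spelling out: the product only equals the conditional sum because every column here holds 0 or 1. A rough sketch of a variant that compares against 1 explicitly is below; the modified data frame `x2` is a made-up example, not part of the question.

```r
x <- data.frame(
  Y  = c(1, 1, 0, 0, 1, 0),
  X1 = c(1, 0, 1, 0, 1, 1),
  X2 = c(0, 1, 0, 0, 1, 1),
  X3 = c(1, 1, 1, 1, 0, 0)
)

# For 0/1 data, the plain product and the explicit mask agree
by_product <- colSums(x[, -1] * x[, 1])
by_mask    <- colSums(x$Y * (x[, -1] == 1))

# Hypothetical variation: one X1 entry is 2 instead of 1
x2 <- x
x2$X1[1] <- 2

product2 <- colSums(x2[, -1] * x2[, 1])       # row 1 now contributes 2 * Y to X1
mask2    <- colSums(x2$Y * (x2[, -1] == 1))   # only rows where X is exactly 1 count

print(rbind(product2, mask2))
```

With the mask, a row contributes its Y value only when the X entry is exactly 1, which matches the sum(Y | Xi = 1) definition in the question even when the columns are not strictly binary.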

You can also use `crossprod(x[,1], as.matrix(x[,-1]))`, which computes the same sums as a matrix product:

     X1 X2 X3
[1,]  2  2  2
Pierre Lapointe

Here's one more approach that you could modify to sum elements corresponding to 1, 0, or some other value.

sapply(x[,-1], function(a) sum(x$Y[a == 1]))
#X1 X2 X3 
# 2  2  2 
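To illustrate the "some other value" part, the same `sapply` call could be wrapped in a small helper that takes the value to match. This is only a sketch: `sum_y_where` and its arguments are hypothetical names, not an existing function.

```r
x <- data.frame(
  Y  = c(1, 1, 0, 0, 1, 0),
  X1 = c(1, 0, 1, 0, 1, 1),
  X2 = c(0, 1, 0, 0, 1, 1),
  X3 = c(1, 1, 1, 1, 0, 0)
)

# Hypothetical helper: sum y_col over the rows where each other column equals `value`
sum_y_where <- function(data, value, y_col = "Y") {
  x_cols <- setdiff(names(data), y_col)
  sapply(data[x_cols], function(a) sum(data[[y_col]][a == value]))
}

ones  <- sum_y_where(x, 1)  # sum of Y where Xi == 1
zeros <- sum_y_where(x, 0)  # sum of Y where Xi == 0
print(ones)
print(zeros)
```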
d.b