R: Sum one row array if another row array satisfies conditions, like excel, but in R

Question

Say, for example, I have data on 5 people A:E. These people provide information on what quantity they are willing to buy at what price in four bands. The data are in a wide format.

df = cbind.data.frame(Q1 = c(90,50,20,10,10), Q2 = c(110,0,0,0,0),
                      Q3 = c(60,60,50,20,5), Q4 = c(20,10,0,0,0), 
                      P1 = 2:6, P2 = c(3,6,8,9,10),
                      P3 = c(2,3,5,7,9), P4 = 1:5)
row.names(df) = LETTERS[1:5]

(My actual data set has many observations on individuals over many time periods, and with many more bands).

What I wish to do is to sum for each individual, the quantity they are willing to buy within a particular price category.

Say that I want to sum all the quantities an individual is willing to consume at any price within $0 and $5, and similarly for a category of prices between $5 and $10. Using excel I would use a simple 'sumifs' to do such.

How might I do this in R?

The output I would expect from the data above would be:

  0<P<=5   5<P<=10 
A 280      0 
B 120      0 
C 70       0
D 10       20 
E 0        15

I typically have been using data.table of late, so it would be good if someone knew of a solution using this package. I have also tried doing this by reshaping the data, but it becomes too large to deal with (warning messages etc.) so it needs to remain in this format.

Thanks!

I am not sure the expected output is correct. How did you get 170 for A? The P value column in the first row are all <=5 and >0. In that case, 280 would be the sum. — akrun, Aug 11 '15 at 23:46
Anyway, I have to go now. If the expected output is incorrect `dfQ <- df[grep('^Q', names(df))];dfP <- df[grep('^P', names(df))];Sum0_5 <- rowSums(dfQ*(dfP > 0 & dfP <=5)); Sum5_10 <- rowSums(dfQ*(dfP > 5 & dfP <=10))` — akrun, Aug 11 '15 at 23:54

score 5 · Answer 1 · edited May 23 '17 at 10:29

I reproduced your dataset, which should look like this:

df = cbind.data.frame(Q1 = c(90,50,20,10,10), Q2 = c(110,0,0,0,0),
                      Q3 = c(60,60,50,20,5), Q4 = c(20,10,0,0,0), 
                      P1 = 2:6, P2 = c(3,6,8,9,10),
                      P3 = c(2,3,5,7,9), P4 = 1:5)
row.names(df) = LETTERS[1:5]

please always share your data in an easy-to-use way, see here for ideas.

Now you don't need data.table for this (but there certainly will be a data.table way to do this), I'll show you a base R solution here:

df.q = df[,1:4]
df.p = df[,5:8]

I split your dataframe into prices and demand here. Since they have the same format, you can num sum over the indices where your conditions are true: EDIT: After updating your question this is the base R solution:

sapply(1:5, function(x) sum(cbind(0, df.q[x, df.p[x,] <= 5])))
sapply(1:5, function(x) sum(cbind(0, df.q[x, df.p[x,] > 5 & df.p[x,] <= 10])))

The result is:

> df.q
  Q1  Q2 Q3 Q4
A 90 110 60 20
B 50   0 60 10
C 20   0 50  0
D 10   0 20  0
E 10   0  5  0
> df.p
  P1 P2 P3 P4
A  2  3  2  1
B  3  6  3  2
C  4  8  5  3
D  5  9  7  4
E  6 10  9  5
> sapply(1:5, function(x) sum(cbind(0, df.q[x, df.p[x,] <= 5])))
[1] 280 120  70  10   0
> sapply(1:5, function(x) sum(cbind(0, df.q[x, df.p[x,] > 5 & df.p[x,] <= 10])))
[1]  0  0  0 20 15

akrun · Accepted Answer · 2015-08-12T11:29:06.080

3

We subset the columns beginning with 'Q' and the columns with 'P' separately to create two datasets ('dfQ', 'dfP'). Then, we get a logical matrices ((dfP > 0 & dfP <=5), (dfP > 5 & dfP <=10)), and multiply (*) it with the 'dfQ' dataset. The TRUE/FALSE values in the logical matrix will be coerced to 1/0 integer values so that 'dfQ' elements that correspond to FALSE/0 will be 0 after the multiplication whereas those correspond to TRUE/1 elements in the logical matrix will remain same. Then, we do rowSums to get the expected output.

 dfQ <- df[grep('^Q', names(df))]
 dfP <- df[grep('^P', names(df))]
 Sum0_5 <- rowSums(dfQ*(dfP > 0 & dfP <=5))
 Sum5_10 <- rowSums(dfQ*(dfP > 5 & dfP <=10))
 cbind(Sum0_5, Sum5_10)
 #  Sum0_5 Sum5_10
 #A    280       0
 #B    120       0
 #C     70       0
 #D     10      20
 #E      0      15

edited Aug 12 '15 at 11:29

answered Aug 12 '15 at 11:23

akrun

874,273
37
540
662

1

nice solution, I over-complicated mine when the question was updated to show the expected output. +1 – mts Aug 12 '15 at 14:53
@mts Thanks, I was thinking about leaving it as a comment as this was put on hold yesterday. Today I posted the solution as it will give some options. – akrun Aug 12 '15 at 14:55
1

This is awesome. Thanks akrun. May I ask what the purpose of the "^" is in the grep command? – Bucket Aug 13 '15 at 22:46
@LyssBucks The "^" is a special symbol to denote the beginning of the string. So we are matching 'Q' or 'P' at the beginning of the string instead of those letters at the middle or anywhere in a word. – akrun Aug 14 '15 at 00:24

R: Sum one row array if another row array satisfies conditions, like excel, but in R

2 Answers2