product_info$total_qty <- aggregate(
    col~row,
    which(
        outer(product_info$productId,transact_data$productId,`==`)&
        outer(product_info$beg_date,transact_data$date,`<=`)&
        outer(product_info$end_date,transact_data$date,`>=`),
        arr.ind=T
    ),
    function(x) sum(transact_data$qty[x])
)$col;
product_info;
##   productId old_price new_price   beg_date   end_date total_qty
## 1         A      0.50      0.70 2014-05-01 2014-05-31        25
## 2         B      0.10      0.11 2014-06-01 2014-06-31        20
## 3         A      0.11      0.12 2014-05-01 2014-05-31        25
## 4         C      0.12      0.11 2014-06-01 2014-06-31         9
## 5         C      0.30      0.20 2014-05-01 2014-05-31         2
## 6         B      0.40      0.30 2014-06-01 2014-06-31        20
Explanation
First, a logical matrix is constructed for each of the three match criteria, using outer() to compare every record in product_info with every record in transact_data. These three logical matrices are then logical-ANDed together to form a final logical matrix representing which combinations of records match.
outer(product_info$productId,transact_data$productId,`==`)&
outer(product_info$beg_date,transact_data$date,`<=`)&
outer(product_info$end_date,transact_data$date,`>=`)
##       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
## [1,]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
## [3,]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
## [5,] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
## [6,] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
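To make the matrix construction easier to follow, here is a minimal sketch on made-up data. The data frames p and tx are hypothetical stand-ins for product_info and transact_data, and I am assuming the dates are stored as Date, so the values are not the ones shown above.

```r
# Hypothetical toy data, not the data from the question.
p <- data.frame(
    productId = c("A", "B"),
    beg_date  = as.Date(c("2014-05-01", "2014-06-01")),
    end_date  = as.Date(c("2014-05-31", "2014-06-30")),
    stringsAsFactors = FALSE
)
tx <- data.frame(
    productId = c("A", "B", "A"),
    date      = as.Date(c("2014-05-10", "2014-06-15", "2014-07-01")),
    qty       = c(3, 4, 5),
    stringsAsFactors = FALSE
)

# One row per record in p, one column per record in tx.
# A cell is TRUE only when all three criteria hold for that pair.
m <- outer(p$productId, tx$productId, `==`) &
    outer(p$beg_date, tx$date, `<=`) &
    outer(p$end_date, tx$date, `>=`)
m
```

The third transaction has the right productId for the first record but falls outside its date range, so its column is FALSE everywhere.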
Then, the row and column indexes of the TRUE entries are ascertained via a call to which() with arr.ind=T. Row indexes represent the matching records from product_info (since it was on the left of the outer() calls), and column indexes represent the matching records from transact_data.
which(...,arr.ind=T)
##       row col
##  [1,]   1   1
##  [2,]   3   1
##  [3,]   2   2
##  [4,]   6   2
##  [5,]   1   5
##  [6,]   3   5
##  [7,]   5   7
##  [8,]   2   8
##  [9,]   6   8
## [10,]   4  10
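As a small illustration of what which() does with arr.ind=T, here is a sketch on a hand-written logical matrix (the matrix is made up for this example). Note that the pairs come out in column-major order, which is why the col column above is non-decreasing.

```r
# A hypothetical 2x3 logical matrix; matrix() fills it column by column.
m <- matrix(c(TRUE, FALSE,
              FALSE, TRUE,
              TRUE, TRUE),
            nrow = 2)

# Each row of idx is one (row, col) position where m is TRUE.
idx <- which(m, arr.ind = TRUE)
idx
```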
Since we want to sum the qty values from transact_data for each record in product_info, we can aggregate() the col indexes grouped by row, using a custom aggregation function that indexes transact_data$qty with the col indexes and sum()s them to return a single value for each row. Note that the col column of the result therefore holds the qty sums, not indexes.
aggregate(col~row,...,function(x) sum(transact_data$qty[x]))
##   row col
## 1   1  25
## 2   2  20
## 3   3  25
## 4   4   9
## 5   5   2
## 6   6  20
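The same grouping step can be sketched in isolation. The index pairs and the qty vector below are hypothetical, chosen only to show how the custom function turns a group of col indexes into one sum per row.

```r
# Hypothetical (row, col) pairs and quantities, for illustration only.
idx <- data.frame(row = c(1, 2, 1, 2), col = c(1, 2, 3, 3))
qty <- c(3, 4, 5)

# For each row value, x receives that group's col indexes;
# indexing qty with them and summing gives one value per group.
res <- aggregate(col ~ row, idx, function(x) sum(qty[x]))
res
```

Here row 1 is paired with transactions 1 and 3, so its sum is qty[1] + qty[3], and row 2 is paired with transactions 2 and 3.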
Finally, we can assign the result directly to product_info$total_qty to complete the solution.
product_info$total_qty <- ...$col;
I'm not entirely sure whether aggregate() is guaranteed to always return its result ordered by the grouping column(s). I just asked about this in the question "Does aggregate() guarantee that the result will be ordered by the grouping columns?".
Also, I just realized that direct assignment will fail if any record in product_info has no matching record in transact_data, because the aggregate() result will then have fewer rows than product_info.
If either of those assumptions is violated, the solution can be fixed as follows:
product_info$total_qty <- with(
    aggregate(
        col~row,
        which(
            outer(product_info$productId,transact_data$productId,`==`)&
            outer(product_info$beg_date,transact_data$date,`<=`)&
            outer(product_info$end_date,transact_data$date,`>=`),
            arr.ind=T
        ),
        function(x) sum(transact_data$qty[x])
    ),
    col[match(1:nrow(product_info),row)]
);
product_info;
##   productId old_price new_price   beg_date   end_date total_qty
## 1         A      0.50      0.70 2014-05-01 2014-05-31        25
## 2         B      0.10      0.11 2014-06-01 2014-06-31        20
## 3         A      0.11      0.12 2014-05-01 2014-05-31        25
## 4         C      0.12      0.11 2014-06-01 2014-06-31         9
## 5         C      0.30      0.20 2014-05-01 2014-05-31         2
## 6         B      0.40      0.30 2014-06-01 2014-06-31        20
Now, instead of the final step of dereferencing $col, we must construct a complete vector of length equal to the number of rows in product_info, and use match() to place the qty sums (which are inside col) at their corresponding indexes (inside row), with a little help from with(). Any record in product_info without a match ends up as NA.
product_info$total_qty <- with(...,col[match(1:nrow(product_info),row)]);
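To show what the match() step buys us, here is a minimal sketch with a hypothetical aggregate() result in which the second record had no matching transactions. The names agg, n and filled are made up for this example, and replacing NA with 0 is only an assumption about what you might want for products with no sales.

```r
# Hypothetical aggregate() output: rows 1 and 3 matched, row 2 did not.
agg <- data.frame(row = c(1, 3), col = c(25, 9))
n <- 3  # stand-in for nrow(product_info)

# match() looks up each record index in agg$row; a missing index gives NA,
# so the result always has length n and is in record order.
total_qty <- with(agg, col[match(seq_len(n), row)])
total_qty

# Optional: treat "no matching transactions" as a total of zero.
filled <- ifelse(is.na(total_qty), 0, total_qty)
filled
```

Because the lookup is by index rather than by position, this also does not depend on the order in which aggregate() returns its groups.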