Starting with 2 objects: 1 data frame of order attributes - Order Numbers, Weights and Volumes, and 1 list - combination strings of Order Numbers.
attr <- data.frame(Order.No = c(111,222,333), Weight = c(20,75,50), Volume = c(10,30,25))
combn <- list(111, 222, 333, c(111,222), c(111,333), c(222,333), c(111,222,333))
The objective is to find the total weight and cube for each string of orders, and keep only the combinations that are within both the weight and cube constraints.
I'm currently using the following -
# Lookup weights for each Order.No in the attr table
# Add up total weight for the combination and keep it if it's in the range
wgts <- lapply(combn, function(x) {
temp <- attr$Weight[match(x, attr$Order.No)]
temp <- sum(temp)
temp[temp <= 50 & temp >= 20]
})
> wgts
[[1]]
[1] 20
[[2]]
numeric(0)
[[3]]
[1] 50
[[4]]
numeric(0)
[[5]]
numeric(0)
[[6]]
numeric(0)
[[7]]
numeric(0)
# Lookup volumes for each Order.No in the attr table
# Add up total volume for the combination and keep it if it's in the range
vols <- lapply(combn, function(x) {
temp <- attr$Volume[match(x, attr$Order.No)]
temp <- sum(temp)
temp[temp <= 50 & temp >= 10]
})
> vols
[[1]]
[1] 10
[[2]]
[1] 30
[[3]]
[1] 25
[[4]]
[1] 40
[[5]]
[1] 35
[[6]]
numeric(0)
[[7]]
numeric(0)
Then use mapply to merge the two lists of weights and volumes.
# Find and keep only the rows that have both the weights and volumes within their ranges
which(lapply(mapply(c, wgts, vols), function(x) length(x)) == 2)
# Yields position 1 and 3 which meet the subsetting conditions
> value value
1 3
The code above looks up the individual order weights and cubes, sums them all together, checks to make sure they are within each range limit, merges both lists together and keeps only those that have both the weight and cubes within the acceptable ranges.
My current solution, which successfully completes the task, is terribly slow on production volume and does not scale well with millions of records. With 11 MM order combinations to lookup, this process takes ~40 minutes to run, which is unacceptable.
I'm seeking a more efficient method that will drastically reduce the run-time required to produce the same output.