I have two vectors. For each element of vector A, I would like to know all the elements of vector B that fulfill a certain condition. So, for example, two dataframes containing the vectors:
person <- data.frame(name = c("Albert", "Becca", "Celine", "Dagwood"),
                     tickets = c(20, 24, 16, 17))
prize <- data.frame(type = c("potato", "lollipop", "yo-yo", "stickyhand",
                             "moodring", "figurine", "whistle", "saxophone"),
                    cost = c(6, 11, 13, 17, 21, 23, 25, 30))
For this example, each person in the "person" dataframe has a number of tickets from a carnival game, and each prize in the "prize" dataframe has a cost. But I'm not looking for perfect matches; instead of simply buying a prize, they randomly receive any prize that is within a 5-ticket cost tolerance of what they have.
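To make the matching rule concrete, here is a minimal sketch of the check for a single person (Albert, with 20 tickets). The variable names `albert_tickets`, `tolerance` and `albert_prizes` are just for illustration.

```r
# Same prize table as above, repeated so the snippet runs on its own
prize <- data.frame(type = c("potato", "lollipop", "yo-yo", "stickyhand",
                             "moodring", "figurine", "whistle", "saxophone"),
                    cost = c(6, 11, 13, 17, 21, 23, 25, 30))

albert_tickets <- 20  # illustrative: Albert's ticket count
tolerance <- 5        # illustrative: allowed difference in tickets

# Keep the prizes whose cost differs from the ticket count by at most `tolerance`
albert_prizes <- prize$type[abs(albert_tickets - prize$cost) <= tolerance]
print(albert_prizes)
```

For Albert this keeps the prizes costing between 15 and 25 tickets, which are the four rows listed for him in the desired output.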
The output I'm looking for is a dataframe of all the possible prizes each person could win. It would be something like:
    person   prize
1   Albert   stickyhand
2   Albert   moodring
3   Albert   figurine
4   Albert   whistle
5   Becca    moodring
6   Becca    figurine
... ...      ...
And so on. Right now, I'm doing this with lapply(), but this is really no faster than a for() loop in R.
library(dplyr)

matching_Function <- function(person, prize, tolerance = 5) {
  # For each person, collect the prize types whose cost is within `tolerance` tickets
  matchlist <- lapply(split(person, list(person$name)),
                      function(x) filter(prize, abs(x$tickets - cost) <= tolerance)$type)
  # Flatten the list into a long dataframe: one row per (person, prize) pair
  longlist <- data.frame("person" = rep(names(matchlist),
                                        times = unlist(lapply(matchlist, length))),
                         "prize" = unname(unlist(matchlist)))
  return(longlist)
}

matching_Function(person, prize)
My actual datasets are much larger (in the hundreds of thousands), and my matching conditions are more complicated (checking coordinates from B to see whether they are within a set radius of coordinates from A), so this is taking forever (several hours).
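For reference, the real condition has roughly the following shape. This is only a sketch under stated assumptions: the column names `x` and `y`, the use of planar Euclidean distance, the toy points, and the helper name `within_radius` are all made up for illustration and are not the actual data or code.

```r
# Hypothetical points: A plays the role of "person", B the role of "prize"
A <- data.frame(id = c("a1", "a2"), x = c(0, 10), y = c(0, 10))
B <- data.frame(id = c("b1", "b2", "b3"), x = c(1, 9, 50), y = c(1, 12, 50))

# Hypothetical helper: ids of the rows in B that lie within `radius`
# of the point (ax, ay), using straight-line (Euclidean) distance
within_radius <- function(ax, ay, B, radius) {
  B$id[sqrt((B$x - ax)^2 + (B$y - ay)^2) <= radius]
}

# Matches for the first point of A
first_matches <- within_radius(A$x[1], A$y[1], B, radius = 3)
print(first_matches)
```

As with the ticket example, this check is currently repeated once per row of A, which is where the time goes.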
Are there any smarter ways than for() and lapply() to solve this?