I am working on a project that analyzes individual-level survey data within countries in relation to the outcomes of sports matches between countries, and I am not sure of the most efficient way to produce the merge I need.
I am working on two separate datasets. One contains individual-level data nested within countries. The data might look something like this:
country <- c(rep("Country A", 4), rep("Country B", 6))
date <- c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", rep("2000-01-01", 2), "2000-01-02", rep("2000-01-03", 3))
outcome <- rnorm(10)
individual_data <- cbind.data.frame(country, date, outcome)
rm(country, date, outcome)
The other has country-match level data, which will look something like this:
date <- rep("2000-01-02", 2)
country <- c("Country A", "Country B")
opponent <- c("Country B", "Country A")
match_outcome <- c("L", "W")
match_data <- cbind.data.frame(date, country, opponent, match_outcome)
rm(date, country, opponent, match_outcome)
In this example, there's just one match, played on January 2nd, 2000, where Country A lost to Country B. I would like to perform a fuzzy_join so that, as opposed to the left_join here, match_data matches up with individual_data even if the date isn't exact.
library(dplyr)
# incorrect: with no `by`, left_join matches only on identical country and date
merged <- left_join(individual_data, match_data)
I would like to do this with a window of 3 days on either side of the match, and I would like an indicator of how many days before (negative) or after (positive) the match each observation falls within this window. The final product would look something like this:
country <- c(rep("Country A", 4), rep("Country B", 6))
date <- c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", rep("2000-01-01", 2), "2000-01-02", rep("2000-01-03", 3))
outcome <- rnorm(10)
opponent <- c(rep("Country B", 4), rep("Country A", 6))
match_outcome <- c(rep("L", 4), rep("W", 6))
match_date <- rep("2000-01-02", 10)
difference <- c(-1, 0, 1, 2, -1, -1, 0, rep(1, 3))
desired_output <- cbind.data.frame(country, date, outcome, opponent, match_outcome, match_date, difference)
rm(country, date, outcome, opponent, match_outcome, match_date, difference)
Can anyone help me out? I've been really struggling with how to get this done. Here is what I've tried so far:
library(lubridate)
library(fuzzyjoin)
# window bounds: 3 days before and after each match
match_data$match_date_minus3 <- ymd(match_data$date) - days(3)
match_data$match_date_plus3 <- ymd(match_data$date) + days(3)
test_output <- fuzzy_left_join(individual_data, match_data,
                               by = c("country" = "country",
                                      "match_date_minus3" = "date",
                                      "match_date_plus3" = "date"),
                               match_fun = list("==", ">", "<"))
but I get the following error: Error in which(m) : argument to 'which' is not logical
For reference, in case anyone is familiar with it, I'm trying to replicate the results of Depetris-Chauvin et al. 2018.
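For comparison with the attempt above, here is a minimal sketch of how the windowed join might be written. It rests on a few assumptions that I have not confirmed against the fuzzyjoin source: that the names in by refer to columns of the first table and the values to columns of the second, and that match_fun expects actual functions (for example `==`) rather than strings. The names individual, matches, match_country, window_start and window_end are made up for illustration. The match table's country column is renamed before joining so that the two tables share no column names.

```r
library(dplyr)
library(lubridate)
library(fuzzyjoin)

# Toy data as defined in the question
individual_data <- data.frame(
  country = c(rep("Country A", 4), rep("Country B", 6)),
  date = c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04",
           rep("2000-01-01", 2), "2000-01-02", rep("2000-01-03", 3)),
  outcome = rnorm(10)
)
match_data <- data.frame(
  date = rep("2000-01-02", 2),
  country = c("Country A", "Country B"),
  opponent = c("Country B", "Country A"),
  match_outcome = c("L", "W")
)

# Convert dates to Date class so they can be compared and subtracted
individual <- individual_data %>% mutate(date = ymd(date))

# Build the +/- 3 day window on the match side (hypothetical column names)
matches <- match_data %>%
  mutate(match_date = ymd(date),
         window_start = match_date - days(3),
         window_end = match_date + days(3)) %>%
  select(-date) %>%
  rename(match_country = country)

# Left side of each `by` pair names a column of `individual`,
# right side names a column of `matches`
merged <- fuzzy_left_join(
  individual, matches,
  by = c("country" = "match_country",
         "date" = "window_start",
         "date" = "window_end"),
  match_fun = list(`==`, `>=`, `<=`)
) %>%
  # Days relative to the match: negative before, positive after
  mutate(difference = as.numeric(date - match_date)) %>%
  select(country, date, outcome, opponent, match_outcome, match_date, difference)
```

Because this is a left join, individuals with no match inside the window would be kept with NA in the match columns. With several matches per country, an individual could fall inside more than one window and would then appear once per match.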