Most of this can be accomplished with a few merge statements. I'm using the tidyverse suite of packages to do the work, but you can very easily do this in base R. I'll point out the changes, but the biggest will be the use of temporary variables or nesting instead of pipes. The pipe operator %>% simply calls the next function in the chain with the previous result as its first argument.
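To make the pipe description concrete, here is a minimal sketch comparing the three styles (pipe, nesting, temporary variable). The vector x and the variable names are made up for illustration; magrittr is the package that provides %>% and is loaded automatically by tidyverse.

```r
library(magrittr)  # provides %>%; also attached by library(tidyverse)

x <- c(1, 4, 9)  # hypothetical toy input

# Pipe: the result on the left becomes the first argument on the right
piped <- x %>% sqrt() %>% sum()

# Base R alternative 1: nesting
nested <- sum(sqrt(x))

# Base R alternative 2: temporary variables
step1 <- sqrt(x)
with_temp <- sum(step1)

identical(piped, nested)     # TRUE
identical(piped, with_temp)  # TRUE
```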
library(tidyverse)
# generating your data
locations <- LETTERS[1:3]
n_locations <- length(locations)
# using base R, use the function expand.grid instead of crossing
location_combinations <- crossing(Origin = locations, Dest = locations)
dist_matrix <- matrix(0,nrow = n_locations, ncol = n_locations)
dist_matrix[lower.tri(dist_matrix)] <- c(8, 11, 6)
dist_matrix <- dist_matrix + t(dist_matrix)
# tibble() replaces data_frame(), which is deprecated in current tibble versions
transitions <- tibble(
  Origin = locations,
  Dest = locations[c(2, 3, 2)],
  Time = c("Mon", "Wed", "Fri")
)
# Make "Dest" a vector instead of the rownames to work with it a little more easily.
popularity <- tibble(
  Dest = locations,
  Popularity = as.integer(c(25, 47, 32))
)
# left_join can be replaced with "merge" using base R.
# mutate can be replaced by defining/redefining each variable separately, or using the "within" command.
tmp <- location_combinations %>%
  left_join(transitions, by = c("Origin", "Dest")) %>%
  left_join(popularity, by = "Dest") %>%
  mutate(
    Origin = as_factor(Origin),
    Dest = as_factor(Dest),
    `Went?` = !is.na(Time),
    Time_Dest = paste(Time, Dest, sep = "_"),
    index = (as.numeric(Origin) - 1) * n_locations + as.numeric(Dest),
    Dist = dist_matrix[(as.numeric(Origin) - 1) * length(locations) + as.numeric(Dest)]
  ) %>%
  select(-Time)
tmp
This gives you almost what you want. There are two differences. First, I left Went? as a logical vector instead of 1/0; multiply by 1 to fix this if needed for logistic regression. Second, the Time_Dest column doesn't have a time for an event that didn't happen. In other words, instead of "Mon_A" for A to A, it shows "NA_A". If this is a big problem, I can almost certainly address it with another merge/join, so let me know if you need it and can't figure it out. (Hint: do a second merge with the transitions data frame, but with by = "Origin".)
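As a quick illustration of the 1/0 conversion mentioned above, here is a small sketch on a made-up logical vector (went is a hypothetical name, not a column from tmp):

```r
went <- c(TRUE, FALSE, TRUE)  # hypothetical stand-in for the Went? column

went_numeric <- went * 1           # numeric 1/0
went_integer <- as.integer(went)   # integer 1/0, an equivalent alternative

went_numeric  # 1 0 1
went_integer  # 1 0 1
```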
To see partial work (and better understand pipes), you can run pieces of this code. For example, try
location_combinations %>%
  left_join(transitions, by = c("Origin", "Dest"))
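For anyone who prefers base R, the substitutions named in the comments above (expand.grid for crossing, merge for left_join, within for mutate) could be put together roughly as follows. This is a sketch under my own assumptions rather than a line-for-line port: the name tmp_base is made up, the distance lookup uses matrix indexing with cbind instead of a computed index, and merge returns the rows in a different order than the tidyverse version.

```r
locations <- LETTERS[1:3]
n_locations <- length(locations)

# expand.grid stands in for crossing
location_combinations <- expand.grid(
  Origin = locations, Dest = locations, stringsAsFactors = FALSE
)

dist_matrix <- matrix(0, nrow = n_locations, ncol = n_locations)
dist_matrix[lower.tri(dist_matrix)] <- c(8, 11, 6)
dist_matrix <- dist_matrix + t(dist_matrix)

transitions <- data.frame(
  Origin = locations,
  Dest = locations[c(2, 3, 2)],
  Time = c("Mon", "Wed", "Fri"),
  stringsAsFactors = FALSE
)
popularity <- data.frame(
  Dest = locations,
  Popularity = as.integer(c(25, 47, 32)),
  stringsAsFactors = FALSE
)

# merge(..., all.x = TRUE) plays the role of left_join
tmp_base <- merge(location_combinations, transitions,
                  by = c("Origin", "Dest"), all.x = TRUE)
tmp_base <- merge(tmp_base, popularity, by = "Dest", all.x = TRUE)

# within() plays the role of mutate; statements run in order
tmp_base <- within(tmp_base, {
  Origin <- factor(Origin, levels = locations)
  Dest <- factor(Dest, levels = locations)
  `Went?` <- !is.na(Time)
  Time_Dest <- paste(Time, Dest, sep = "_")
  # look up each (Origin, Dest) pair directly in the distance matrix
  Dist <- dist_matrix[cbind(as.numeric(Origin), as.numeric(Dest))]
})
tmp_base$Time <- NULL  # equivalent of select(-Time)

tmp_base
```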
Alright, so now you (more or less) have the entire data set in one spot. To split it, there are several options.
- You can use split to split it up by Origin. The code looks like
list_of_dfs <- split(tmp, tmp$Origin)
This produces exactly what you asked for, a list of data frames which can be analyzed separately.
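Once you have the list, one way to analyze each piece separately is lapply. The sketch below uses simulated toy data (the toy data frame, its column Went, and the model formula are all assumptions for illustration, since the real tmp has only three rows per origin, which is too few to fit anything meaningful):

```r
set.seed(42)

# Hypothetical toy data: 30 simulated trips per origin
toy <- data.frame(
  Origin = rep(c("A", "B", "C"), each = 30),
  Dist = runif(90, min = 1, max = 12),
  Went = rbinom(90, size = 1, prob = 0.4)
)

list_of_dfs <- split(toy, toy$Origin)

# Fit one logistic regression per origin
fits <- lapply(list_of_dfs, function(df) {
  glm(Went ~ Dist, family = binomial, data = df)
})

# Collect the coefficients into one matrix, one row per origin
coefs <- t(sapply(fits, coef))
coefs
```

Each element of fits is an ordinary glm object, so summary(fits$A) and similar calls work as usual.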
- You can use the group_by function in the dplyr package (part of tidyverse). An example using this approach is at Linear Regression and group by in R. The caveat here is that the do function is/will be deprecated, so this isn't a solution that will work forever. I haven't needed it recently, so I'm not sure what the "new" solution is, but this, in combination with the broom package, can almost certainly help you organize your results. (See https://cran.r-project.org/web/packages/broom/vignettes/broom_and_dplyr.html.)
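One candidate for replacing do is group_modify from dplyr combined with tidy from broom. I'm offering this as a sketch and an assumption on my part, not as what the linked answer uses; the toy data frame and the model formula are again made up for illustration.

```r
library(dplyr)
library(broom)

set.seed(1)

# Hypothetical toy data: 30 simulated trips per origin
toy <- tibble(
  Origin = rep(c("A", "B", "C"), each = 30),
  Dist = runif(90, min = 1, max = 12),
  Went = rbinom(90, size = 1, prob = 0.4)
)

# Fit one logistic regression per group and return tidy coefficient tables
results <- toy %>%
  group_by(Origin) %>%
  group_modify(~ tidy(glm(Went ~ Dist, family = binomial, data = .x))) %>%
  ungroup()

results
```

The result is a single tibble with one row per model term per origin, which is usually easier to filter and plot than a list of model objects.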
Update to include all possible destinations
location_combinations %>%
  left_join(transitions, by = c("Origin", "Dest")) %>%
  left_join(transitions %>% select(Origin, Time), by = "Origin") %>%
  left_join(popularity, by = "Dest") %>%
  mutate(
    Origin = as_factor(Origin),
    Dest = as_factor(Dest),
    `Went?` = !is.na(Time.x),
    Time_Dest = paste(Time.y, Dest, sep = "_"),
    index = (as.numeric(Origin) - 1) * n_locations + as.numeric(Dest),
    Dist = dist_matrix[(as.numeric(Origin) - 1) * length(locations) + as.numeric(Dest)]
  ) %>%
  select(-Time.x, -Time.y, -index)