I want to subset five data frames based on a series of 31 variables. The data frames are stored in a list:
long_data_sets <- list(scale_g1, scale_g2, scale_g3, scale_g4, scale_g5)
All five data frames share the same set of columns, which includes 31 factors named "speeder_225" through "speeder_375":
> str(scale_g1[53:83])
'data.frame': 5522 obs. of 31 variables:
$ speeder_225: Factor w/ 2 levels "Non-Speeder",..: 1 1 1 1 1 1 1 1 1 1 ...
$ speeder_230: Factor w/ 2 levels "Non-Speeder",..: 1 1 1 1 1 1 1 1 1 1 ...
$ speeder_235: Factor w/ 2 levels "Non-Speeder",..: 1 1 1 1 1 1 1 1 1 1 ...
$ speeder_240: Factor w/ 2 levels "Non-Speeder",..: 1 1 1 1 1 1 1 1 1 1 ...
$ speeder_245: Factor w/ 2 levels "Non-Speeder",..: 1 1 1 1 1 1 1 1 1 1 ...
...
I want to subset the data frames based on one of the 31 factor variables at a time, so that I end up with 5*31 new data frames.
I wrote a subsetting function that keeps only the two columns I need going forward ("direction" and "response"):
create_speeder_data <- function(x, y){
  # Keep the rows where factor column y equals "Speeder" and return them visibly
  subset(x, x[[y]] == "Speeder",
         select = c("direction", "response"))
}
This allows me to create one new data frame at a time:
create_speeder_data(scale_g1, "speeder_225")
I tried to apply the function with map2(), passing the list of 5 data frames and the vector of 31 factor names. This did not work, because map2() pairs its two inputs element by element and so needs them to be the same length:
> speeder_var <- names(scale_g1[53:83])
> map2(long_data_sets, speeder_var, create_speeder_data)
Error: `.x` (5) and `.y` (31) are different lengths
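Since map2() walks `.x` and `.y` in parallel, one possible workaround is to build every (data frame, factor) combination first, so that both inputs have the same length. The following is only a minimal sketch under assumptions: `toy_sets`, `toy_vars` and `combos` are made-up names, and the toy data has two data frames and two factors instead of the real 5 and 31.

```r
library(purrr)

# Toy stand-ins for the real data frames (hypothetical values)
toy_sets <- list(
  g1 = data.frame(
    direction   = c("asc", "desc", "asc"),
    response    = c(1, 2, 3),
    speeder_225 = factor(c("Speeder", "Non-Speeder", "Speeder")),
    speeder_230 = factor(c("Non-Speeder", "Non-Speeder", "Speeder"))
  ),
  g2 = data.frame(
    direction   = c("desc", "asc", "asc"),
    response    = c(4, 5, 6),
    speeder_225 = factor(c("Non-Speeder", "Non-Speeder", "Speeder")),
    speeder_230 = factor(c("Speeder", "Speeder", "Speeder"))
  )
)
toy_vars <- c("speeder_225", "speeder_230")

# Same idea as the function above: filter on column y, keep two columns
create_speeder_data <- function(x, y) {
  subset(x, x[[y]] == "Speeder", select = c("direction", "response"))
}

# One row per (data frame, factor) pair: 2 * 2 = 4 rows here
combos <- expand.grid(set = names(toy_sets), var = toy_vars,
                      stringsAsFactors = FALSE)

# .x and .y now have equal length, so map2() can pair them element by element
result <- map2(toy_sets[combos$set], combos$var, create_speeder_data)
names(result) <- paste(combos$set, combos$var, sep = "_")
```

With the real data, `toy_sets` would be replaced by a named version of long_data_sets and `toy_vars` by speeder_var, which should give 5 * 31 = 155 list elements.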
The closest I could get was to drop the y argument, hard-code one of the 31 factors in the function, and map that function over the list of five data frames:
# Create subsetting function for "speeder_225"
create_speeder_225_data <- function(x){
  subset(x, x$speeder_225 == "Speeder",
         select = c("direction", "response"))
}
# Map function to list of data frames
z_speeder_225 <- map(long_data_sets, create_speeder_225_data)
# Change names of new data frames in list
names(long_data_sets) <- c("g1", "g2", "g3", "g4", "g5")
names(z_speeder_225) <- paste0(names(long_data_sets), "speeder_225")
# Get data frames from list
list2env(z_speeder_225, envir = .GlobalEnv)
I would need to repeat this 30 more times to get to my 5*31 data frames. There must be an easier way to do that.
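To illustrate what avoiding the repetition might look like, here is a rough sketch that nests one map() inside another and keeps all results in a single named list instead of separate objects in the global environment. It runs on made-up toy data (two data frames, two factors), and `toy_sets`, `toy_vars`, `nested` and `all_subsets` are hypothetical names, so treat it as an assumption-laden outline rather than a tested solution for the real data.

```r
library(purrr)

# Toy stand-ins for the real data frames (hypothetical values)
toy_sets <- list(
  g1 = data.frame(
    direction   = c("asc", "desc", "asc"),
    response    = c(1, 2, 3),
    speeder_225 = factor(c("Speeder", "Non-Speeder", "Speeder")),
    speeder_230 = factor(c("Non-Speeder", "Non-Speeder", "Speeder"))
  ),
  g2 = data.frame(
    direction   = c("desc", "asc", "asc"),
    response    = c(4, 5, 6),
    speeder_225 = factor(c("Non-Speeder", "Non-Speeder", "Speeder")),
    speeder_230 = factor(c("Speeder", "Speeder", "Speeder"))
  )
)
toy_vars <- c("speeder_225", "speeder_230")

create_speeder_data <- function(x, y) {
  subset(x, x[[y]] == "Speeder", select = c("direction", "response"))
}

# Outer map() walks the data frames, inner map() walks the factor names,
# so every data frame is subset by every factor
nested <- map(toy_sets, function(df) {
  map(setNames(toy_vars, toy_vars), function(v) create_speeder_data(df, v))
})

# Flatten one level; names become "<data frame>.<factor>"
all_subsets <- unlist(nested, recursive = FALSE)
names(all_subsets)
```

Individual results can then be pulled out by name, for example `all_subsets[["g1.speeder_225"]]`, and list2env() would still work on the flattened list if separate objects are really needed.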
Any help is much appreciated!