0

I have 8 datasets (for 8 different years) with data on different countries. I want to extract the data for a given country in all the years. The proposed function filters for that country in every database, and then concatenates. (The reason I don't create a metadabase with every country, is that the databases are huge, and it's very expensive computationally). Is there a simpler way of doing this function? Also, I wanted to add a variable number of countries to filter. Is there a way of allowing the user of the function to filter for 2, 3 or 4 different countries?

Extract_Data <- function(data2014, data2015, data2016, data2017, data2018, data2019, data2020, data2021, var1) {
  data14 <- data2014 %>% dplyr::filter(Country == var1)
  data2015 <- data2015 %>% dplyr::filter(Country == var1)
  data2016 <- data2016 %>% dplyr::filter(Country == var1)
  data2017 <- data2017 %>% dplyr::filter(Country == var1)
  data2018 <- data2018 %>% dplyr::filter(Country == var1)
  data2019 <- data2019 %>% dplyr::filter(Country == var1)
  data2020 <- data2020 %>% dplyr::filter(Country == var1)
  data2021 <- data2021 %>% dplyr::filter(Country == var1)
  data <- bind_rows(data2014,
                    data2015,
                    data2016,
                    data2017,
                    data2018,
                    data2019,
                    data2020,
                    data2021)
  return(data)
}

iturrizaga
  • 15
  • 3
  • 1
    I'd suggest reading my answer at [How to make a list of data frames?](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames). If you have enough memory to load all the data at once (which you've apparently done), it will likely be faster to combine it all into one database and work with that rather than keep them separate and working with each part. – Gregor Thomas Jun 20 '23 at 15:32

1 Answers1

0

It will be easier with some sample data, but here is one way using lapply. You should first create a custom function, then use lapply to map the function to your data frames.

library(dplyr)

# example data
data2014 <- data.frame(year = rep(2014, 6), Country = rep(c("US", "Canada", "Mexico"), 2))
data2015 <- data.frame(year = rep(2015, 8), Country = rep(c("US", "Canada", "Mexico", "Brazil"), 2))
data2016 <- data.frame(year = rep(2016, 10), Country = rep(c("US", "Canada", "Mexico", "Brazil", "Columbia"), 2))

# function
filter_by_country <- function(df, ...){
  df %>% filter(Country %in% c(...))
}

# combine df's in a list
df_list <- list(data2014, data2015, data2016)

# filter by Countries "US" and "Canada"
result <- df_list %>% lapply(filter_by_country, "US", "Canada") %>% do.call(rbind, .)

output

> result
   year Country
1  2014      US
2  2014  Canada
3  2014      US
4  2014  Canada
5  2015      US
6  2015  Canada
7  2015      US
8  2015  Canada
9  2016      US
10 2016  Canada
11 2016      US
12 2016  Canada
nightstand
  • 329
  • 2
  • 11