0

I have 15 CSVs on google drive, and I want to download them and left join them at the same time. Thirteen have the same ID variable (inegi), but two have different ID names (CVEGEO and id_mun). How can I join the other two with different ID names in the columns? Here is the code I currently have:

# Paquetes  ---------------------------------------------------------------
if(!require(pacman)) install.packages("pacman")
pacman::p_load(tidyverse, janitor, googledrive, 
               stringr, purrr)


# Download municipal Files -------------------------------------------------------------------
files_municipal <- drive_ls(as_id("IDOFGOOGLE")) %>% 
  filter(!str_detect(name, "cumulative-deaths|excess")) %>%
  arrange(name)

tmp <- paste(tempdir(), files_municipal$name, sep = "/")

walk2(files_municipal$id, tmp, ~ drive_download(as_id(.x), path = .y, overwrite = TRUE)) 

# Join --------------------------------------------------------------------
tempo <- map(tmp, read_csv) %>% 
  reduce(left_join, by = "inegi")  
JorRu
  • 3
  • 1
  • Rename ID columns for consistency while saving DFs to list. See [Rename only if field exists, otherwise ignore](https://stackoverflow.com/q/68965823/1422451). – Parfait Jul 04 '22 at 20:52
  • Thanks! It worked with rename_with – JorRu Jul 04 '22 at 23:46

1 Answers1

0

I have the solution. I created a function at the beginning:

# Paquetes  ---------------------------------------------------------------
if(!require(pacman)) install.packages("pacman")
pacman::p_load(tidyverse, janitor, googledrive, 
              stringr, purrr)

# Functions  ---------------------------------------------------------------
read_vars_mun <- function(df) {
 
 df <- read_csv(df) %>% 
   rename_with(
     ~ case_when(
       . == "CVEGEO" ~ "inegi",
       . == "id_mun" ~ "inegi",
       TRUE ~ .)) %>% 
   mutate(inegi = as.character(inegi)) 
}

# Download municipal Files -------------------------------------------------------------------
files_municipal <- drive_ls(as_id("IDOFGOOGLE")) %>% 
 filter(!str_detect(name, "cumulative-deaths|excess")) %>%
 arrange(name)

tmp <- paste(tempdir(), files_municipal$name, sep = "/")

walk2(files_municipal$id, tmp, ~ drive_download(as_id(.x), path = .y, overwrite = TRUE)) 

# Join --------------------------------------------------------------------
df <- map(tmp, read_vars_mun) %>% 
 reduce(left_join, by = "inegi")  

# Done. 
JorRu
  • 3
  • 1