
I have 50 Stata-formatted data sets that I would like to read into R and save as RData files. Currently, my code looks like this:

# Package to read Stata data sets into R
library(haven)

# Data set 1: Read Stata data into R
dataset1 <- read_dta("C:/dataset1.dta")

# Save as RData
save(dataset1, file = "C:/RData/dataset1.Rdata")

# Data set 2: Read Stata data into R
dataset2 <- read_dta("C:/dataset2.dta")

# Save as RData
save(dataset2, file = "C:/RData/dataset2.Rdata")

This is clunky and takes up many lines of code. I would like to write a function or a loop that goes through the files efficiently and is easier to understand and debug.

This code gets me almost there (thanks @canyon), except that when I load the data files, every object is named "import_data". The files themselves are named correctly (i.e., dataset1.Rdata, dataset2.Rdata), but when loaded into R, the object in the environment is called "import_data". This is a problem because I can't have more than one of the files loaded in the same environment: each one overwrites the previous one (e.g., loading dataset2.Rdata overwrites the data from dataset1.Rdata). Is there a way to save each object under a name that matches the file name given in the file = argument of save()?
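A minimal reproduction of the problem, as a sketch: two small data frames written to tempdir() stand in for the Stata data (an assumption for illustration). load() restores each object under the name it had when save() was called and invisibly returns that name, so two files that both contain "import_data" collide.

```r
# Two different data sets, both saved under the object name "import_data".
# The small data frames and temporary paths are stand-ins for illustration.
path1 <- file.path(tempdir(), "dataset1.Rdata")
path2 <- file.path(tempdir(), "dataset2.Rdata")

import_data <- data.frame(id = 1:3)
save(import_data, file = path1)
import_data <- data.frame(id = 4:6)
save(import_data, file = path2)
rm(import_data)

# load() returns the names of the objects it restored
loaded1 <- load(path1)   # "import_data"
loaded2 <- load(path2)   # also "import_data", so the first copy is replaced

print(loaded1)
print(import_data$id)    # only dataset2's contents (4 5 6) survive
```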

library(haven)
library(stringr)

# Read C:/datasetX.dta and save it as C:/RData/datasetX.Rdata
your_function <- function(x) {
  import_path <- str_c("C:/dataset", x, ".dta")
  import_data <- read_dta(import_path)
  save_path <- str_c("C:/RData/dataset", x, ".Rdata")
  # The object is always stored under the name "import_data"
  save(import_data, file = save_path)
}

lapply(1:50, your_function)
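One possible fix, sketched here as an assumption rather than taken from the thread: create the object under the desired name with assign() and pass that name to save() through its list argument, which accepts a character vector of object names. The function name convert_one() and its in_dir, out_dir and reader arguments are hypothetical additions so the sketch can be tried without Stata files; by default it calls haven::read_dta().

```r
# Hypothetical rewrite of your_function: the object stored in each .Rdata
# file is named "dataset1", "dataset2", ... to match the file name.
convert_one <- function(x,
                        in_dir = "C:",
                        out_dir = "C:/RData",
                        reader = haven::read_dta) {
  obj_name    <- paste0("dataset", x)
  import_path <- file.path(in_dir, paste0(obj_name, ".dta"))
  save_path   <- file.path(out_dir, paste0(obj_name, ".Rdata"))

  # Bind the data to a variable called e.g. "dataset1" inside this function
  assign(obj_name, reader(import_path))

  # save()'s default envir is the calling frame (this function), so the
  # object is found there and written under its own name
  save(list = obj_name, file = save_path)
  invisible(save_path)
}

# Usage with the original paths (requires haven and the .dta files):
# invisible(lapply(1:50, convert_one))
```

Because `reader` is only evaluated when no replacement is supplied, haven is needed only when the default is actually used.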

I looked at the linked posts that seem to address this, but none of them solves this specific problem.

scottsmith
  • Try looking [here](https://stackoverflow.com/questions/14958516/looping-through-all-files-in-directory-in-r-applying-multiple-commands) or [here](https://www.r-bloggers.com/looping-through-files/) or [here](https://stackoverflow.com/questions/44842367/for-loop-with-file-names-in-r) – MrFlick Dec 01 '17 at 22:02
  • These posts get me almost there, but don't solve the problem of naming the data object the same as the file name. Same with the "Looping through all files in directory in R, ..." suggested answer. It seems there are many posts on how to name the files dynamically, but not on how to dynamically name the data objects. – scottsmith Dec 05 '17 at 17:17
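For files that were already written with the object name "import_data", a load-time workaround (a sketch, not from the thread) is to load each file into its own environment and bind the result to whatever name you choose. The helper name load_as() is hypothetical.

```r
# Hypothetical helper: load an .Rdata file that contains one object and
# return that object, whatever name it was saved under.
load_as <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)   # load() returns the restored names
  if (length(obj_names) != 1) {
    stop("Expected exactly one object in ", path)
  }
  get(obj_names, envir = env, inherits = FALSE)
}

# Usage with the original paths:
# dataset1 <- load_as("C:/RData/dataset1.Rdata")
# dataset2 <- load_as("C:/RData/dataset2.Rdata")
```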

1 Answer

0

Try this example (edited):

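library(stringr)

# Read dataset1.rds, dataset2.rds, ... into a list
read_function <- function(x) {
  import_path <- str_c("dataset", x, ".rds")
  readRDS(file = import_path)
}

df_list <- lapply(1:2, read_function)

# Bind list element x to a global object named "dataset<x>".
# envir = globalenv() is needed: with the default environment, assign()
# creates the object inside the function, where it is discarded
# (inherits = TRUE only helps if the object already exists globally)
assign_function <- function(x) {
  dataset_name <- str_c("dataset", x)
  assign(dataset_name, df_list[[x]], envir = globalenv())
}

invisible(lapply(1:2, assign_function))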

The idea is to read in the data sets, store them in a list, and then assign each element of the list to a named object (dataset1, dataset2, ...) in the global environment.
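A variant of the same idea, offered as a sketch: give the list its names with setNames() and then either keep working with the list or copy every element into the global environment in one call with list2env(). The helper name read_named() and its dir argument are hypothetical; the files are assumed to follow the dataset1.rds, dataset2.rds pattern used above.

```r
# Hypothetical helper: read <name>.rds for each name into a named list
read_named <- function(object_names, dir = ".") {
  paths <- file.path(dir, paste0(object_names, ".rds"))
  setNames(lapply(paths, readRDS), object_names)
}

# Usage, assuming dataset1.rds and dataset2.rds are in the working directory:
# df_list <- read_named(sprintf("dataset%d", 1:2))
# df_list$dataset1                          # work with the list directly, or
# list2env(df_list, envir = globalenv())    # create dataset1, dataset2, ...
```

Keeping the data in one named list is often easier to loop over later than 50 separate objects, while list2env() still gives individual objects when they are needed.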

canyon
  • One follow-up question @canyon. This almost works perfectly. One issue is that all the data sets are labeled as "import_data". This is because `save(import_data, file = save_path)` tells the function to name them all the same. The file names are correct; they just all have the same data set labels. Is there a way to save the files so the data set labels are the same as the file names? Thanks! – scottsmith Dec 04 '17 at 16:37
  • The issue is that since they are all labeled the same, they can't all be loaded in the same environment. If I load "dataset1" and then "dataset2", "dataset2" overwrites "dataset1", even though each data set is correct when loaded alone. – scottsmith Dec 04 '17 at 16:50
  • Sorry I think you'll need to wait for someone else to answer this (for various reasons this specific issue doesn't normally come up for me so don't know how to answer). – canyon Dec 04 '17 at 20:26
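Another option, sketched here as an assumption and not suggested in the thread: save each data set with saveRDS() instead of save(). An .rds file holds a single object without a stored name, so the name is chosen at read time with readRDS(). The function name dta_to_rds() and its in_dir, out_dir and reader arguments are hypothetical; reader defaults to haven::read_dta().

```r
# Hypothetical converter: one .dta file -> one .rds file
dta_to_rds <- function(x, in_dir = "C:", out_dir = "C:/RData",
                       reader = haven::read_dta) {
  stem <- paste0("dataset", x)
  dat <- reader(file.path(in_dir, paste0(stem, ".dta")))
  out_path <- file.path(out_dir, paste0(stem, ".rds"))
  saveRDS(dat, file = out_path)   # no object name is stored in the file
  invisible(out_path)
}

# Usage (requires haven and the .dta files):
# invisible(lapply(1:50, dta_to_rds))
# dataset1 <- readRDS("C:/RData/dataset1.rds")  # the name is chosen here
# dataset2 <- readRDS("C:/RData/dataset2.rds")
```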