I am trying to use read_sas to import several SAS files from one folder into R. The file names follow the pattern "medpar20XX" (XX = 00, 01, ..., 16).
I also need only a specific list of columns from each file. I used a for loop, but only the last SAS data file (the last one in loop order) was imported successfully, and all of the resulting R data frames contain that last file's data.
Below is the code to get the list of files in the folder that match the file name pattern:
patt = "medpar[0-9]{4}[[:punct:]]sas7bdat"
file_list <- list.files(path="E:/Data/Bell_Disasters",pattern = patt)
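As a quick sanity check of the pattern, here is a small sketch that runs it against made-up file names (these are illustrative, not the actual folder contents). It uses `grepl()` to show which names are accepted and the same `regmatches()`/`regexpr()` call used in the loop further down to pull out the year:

```r
# Hypothetical file names, for illustration only
candidates <- c("medpar2000.sas7bdat", "medpar2016.sas7bdat",
                "medpar20.sas7bdat", "notes.txt")
patt <- "medpar[0-9]{4}[[:punct:]]sas7bdat"

# TRUE where the name contains "medpar", four digits,
# one punctuation character, then "sas7bdat"
matches <- grepl(patt, candidates)
matched_files <- candidates[matches]

# Extract the four-digit year from each matching name
years <- regmatches(matched_files, regexpr("[0-9]{4}", matched_files))
print(data.frame(file = matched_files, year = years))
```

The pattern is not anchored, so a name with extra text before "medpar" or after "sas7bdat" would also match; adding `^` and `$` would tighten it if that matters.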
The code to read a single file with path and file name spelled out:
medpar2000 <- read_sas("E:/Data/Bell_Disasters/medpar2000.sas7bdat", cols_only = c("HIC","PRVNUMGRP","SSLSSNF","sadmsndt","sdschrgdt"))
The SAS file was imported successfully.
Below is the for loop to read in the SAS data files. For years 2000 to 2002 the columns needed are the same: cols_only = c("HIC","PRVNUMGRP","SSLSSNF","sadmsndt","sdschrgdt"). For other years the columns are different.
For years 2003 to 2006, cols_only = c('BENE_ID','PRVSTATE','PRVNUM3','PRVDRSRL','SSLSSNF','ADMSNDT','DSCHRGDT').
For years 2007 to 2012, cols_only = c('bene_id', 'MEDPAR_ID', 'PRVDR_NUM', 'SS_LS_SNF_IND_CD', 'ADMSN_DT', 'DSCHRG_DT').
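The year-to-columns rule described above can be sketched as a small lookup function. `cols_for_year()` and the three `cols_*` vectors are hypothetical names introduced here for illustration; the column names themselves are copied from the description. Years after 2012 are not specified, so the sketch returns `NULL` for them rather than guessing:

```r
# Column sets by year range, copied from the description above
cols_2000_2002 <- c("HIC", "PRVNUMGRP", "SSLSSNF", "sadmsndt", "sdschrgdt")
cols_2003_2006 <- c("BENE_ID", "PRVSTATE", "PRVNUM3", "PRVDRSRL",
                    "SSLSSNF", "ADMSNDT", "DSCHRGDT")
cols_2007_2012 <- c("bene_id", "MEDPAR_ID", "PRVDR_NUM",
                    "SS_LS_SNF_IND_CD", "ADMSN_DT", "DSCHRG_DT")

# Hypothetical helper (not part of any package): map a year to its column set
cols_for_year <- function(year) {
  year <- as.integer(year)  # accepts "2004" as well as 2004
  if (year >= 2000 && year <= 2002) {
    cols_2000_2002
  } else if (year >= 2003 && year <= 2006) {
    cols_2003_2006
  } else if (year >= 2007 && year <= 2012) {
    cols_2007_2012
  } else {
    NULL  # column list for this year is not given in the description
  }
}

print(cols_for_year("2004"))
```

Keeping the rule in one function means the reading code only needs a single read_sas call, with the column vector looked up from the year.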
Below is the code I tested on years 2000 to 2002 only. I changed the logic on year to use `if` instead of a `for` loop:
for (i in seq_along(file_list)) {
  # retrieve the year number from the SAS file name
  year <- regmatches(file_list[i], regexpr('[0-9]{4}', file_list[i]))
  if (year %in% c('2000','2001','2002')) {
    # read in the SAS data set
    temp_data <- read_sas(file.path('E:/Data/Bell_Disasters', file_list[i]), cols_only = c("HIC","PRVNUMGRP","SSLSSNF","sadmsndt","sdschrgdt"))
    # store the data set under a name such as medpar2000
    assign(paste('medpar', year, sep = ''), temp_data)
  } else if (year %in% c('2003','2004','2005','2006')) {
    # read in the SAS data set
    temp_data <- read_sas(file.path('E:/Data/Bell_Disasters', file_list[i]), cols_only = c('BENE_ID','PRVSTATE','PRVNUM3','PRVDRSRL','SSLSSNF','ADMSNDT','DSCHRGDT'))
    # store the data set under a name such as medpar2003
    assign(paste('medpar', year, sep = ''), temp_data)
  }
}
The process is extremely slow. When I force it to stop, I see that some files have indeed been imported into R. Is there a way to make this process more efficient?
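For reference, a variant of the loop that collects the results in a named list instead of calling `assign()` could look like the sketch below. `read_medpar_files()` and `fake_reader()` are hypothetical names introduced for illustration, and the reader is passed in as an argument so the bookkeeping can be tried without any SAS files. This only reorganizes how the data frames are stored and is unlikely by itself to change how long each file takes to read:

```r
# Hypothetical helper: read each file with `reader` and name the results
# "medparYYYY" based on the four-digit year in the file name
read_medpar_files <- function(dir, files, reader) {
  years <- regmatches(files, regexpr("[0-9]{4}", files))
  out <- lapply(files, function(f) reader(file.path(dir, f)))
  names(out) <- paste0("medpar", years)
  out
}

# Dry run with a stand-in reader that just records the path it was given
fake_reader <- function(path) data.frame(path = path, stringsAsFactors = FALSE)

res <- read_medpar_files("E:/Data/Bell_Disasters",
                         c("medpar2000.sas7bdat", "medpar2001.sas7bdat"),
                         reader = fake_reader)
print(names(res))
print(res$medpar2001$path)
```

With real data, the stand-in would be replaced by a function that calls read_sas with the appropriate column list for the year, as in the loop above.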