
I am trying to load multiple files into an R environment. I have tried something like the following:

files <- list.files(pattern = ".Rda", recursive = TRUE)

lapply(files,load,.GlobalEnv)

This only loads one data file (incorrectly). The problem is that the files have the same names in every year. For example, "Year1/beer/beer.Rda" has a counterpart "Year2/beer/beer.Rda".

I would like to rename the data on import, so that beer1 and beer2 correspond to the beer data for year 1 and year 2, and so on.

Does anybody have a better method of loading the data? I have more than two years' worth of data.

File names:

 [1] "Year1/beer/beer.Rda"         "Year1/blades/blades.Rda"     "Year1/carbbev/carbbev.Rda"  
 [4] "Year1/cigets/cigets.Rda"     "Year1/coffee/coffee.Rda"     "Year1/coldcer/coldcer.Rda"  
 [7] "Year1/deod/deod.Rda"         "Year1/diapers/diapers.Rda"   "Year1/factiss/factiss.Rda"  
[10] "Year1/fzdinent/fzdinent.Rda" "Year1/fzpizza/fzpizza.Rda"   "Year1/hhclean/hhclean.Rda"  
[13] "Year1/hotdog/hotdog.Rda"     "Year1/laundet/laundet.Rda"   "Year1/margbutr/margbutr.Rda"
[16] "Year1/mayo/mayo.Rda"         "Year1/milk/milk.Rda"         "Year1/mustketc/mustketc.Rda"
[19] "Year1/paptowl/paptowl.Rda"   "Year1/peanbutr/peanbutr.Rda" "Year1/photo/photo.Rda"      
[22] "Year1/razors/razors.Rda"     "Year1/saltsnck/saltsnck.Rda" "Year1/shamp/shamp.Rda"      
[25] "Year1/soup/soup.Rda"         "Year1/spagsauc/spagsauc.Rda" "Year1/sugarsub/sugarsub.Rda"
[28] "Year1/toitisu/toitisu.Rda"   "Year1/toothbr/toothbr.Rda"   "Year1/toothpa/toothpa.Rda"  
[31] "Year1/yogurt/yogurt.Rda"     "Year2/beer/beer.Rda"         "Year2/blades/blades.Rda"    
[34] "Year2/carbbev/carbbev.Rda"   "Year2/cigets/cigets.Rda"     "Year2/coffee/coffee.Rda"    
[37] "Year2/coldcer/coldcer.Rda"   "Year2/deod/deod.Rda"         "Year2/diapers/diapers.Rda"  
[40] "Year2/factiss/factiss.Rda"   "Year2/fzdinent/fzdinent.Rda" "Year2/fzpizza/fzpizza.Rda"  
[43] "Year2/hhclean/hhclean.Rda"   "Year2/hotdog/hotdog.Rda"     "Year2/laundet/laundet.Rda"  
[46] "Year2/margbutr/margbutr.Rda" "Year2/mayo/mayo.Rda"         "Year2/milk/milk.Rda"        
[49] "Year2/mustketc/mustketc.Rda" "Year2/paptowl/paptowl.Rda"   "Year2/peanbutr/peanbutr.Rda"
[52] "Year2/photo/photo.Rda"       "Year2/razors/razors.Rda"     "Year2/saltsnck/saltsnck.Rda"
[55] "Year2/shamp/shamp.Rda"       "Year2/soup/soup.Rda"         "Year2/spagsauc/spagsauc.Rda"
[58] "Year2/sugarsub/sugarsub.Rda" "Year2/toitisu/toitisu.Rda"   "Year2/toothbr/toothbr.Rda"  
[61] "Year2/toothpa/toothpa.Rda"   "Year2/yogurt/yogurt.Rda"
user113156
  • Another problem I am running into: when I run `load("Year1/saltsnck/saltsnck.Rda")` followed by `load("Year2/saltsnck/saltsnck.Rda")`, both files load as `data.Rda`, so the year 2 data overwrites the year 1 data. – user113156 Sep 30 '18 at 18:43

2 Answers


One option might be to load the files in a new environment and then assign them to a custom named object in the parent environment.

This is modified from https://stackoverflow.com/a/5577647/6561924

# first create custom names for objects (e.g. add folder names)
file_names <- gsub("/", "_", files)
file_names <- gsub("\\.Rda", "", file_names)

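# function to load an object into a new environment and re-assign it
load_obj <- function(f, f_name) {
  env <- new.env()
  nm <- load(f, env)[1]  # load into the new environment and capture the first object's name
  assign(f_name, env[[nm]], pos = 1) # pos = 1 is the global environment
}

# load all (invisible() stops the loaded objects from being printed)
invisible(mapply(load_obj, files, file_names))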
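
As a variation on the same idea, the objects can be kept in a named list rather than assigned one by one. The following is only a minimal sketch under stated assumptions: the temporary files, the stored object name `data`, and the names `load_first`, `obj_names` and `loaded` are all hypothetical and chosen for illustration.

```r
# Hypothetical setup: two .Rda files that both store an object called `data`
root <- tempfile("demo_")
paths <- file.path(root, c("Year1", "Year2"), "beer", "beer.Rda")
for (i in seq_along(paths)) {
  dir.create(dirname(paths[i]), recursive = TRUE)
  data <- data.frame(year = i, sales = i * 100)
  save(data, file = paths[i])
}
rm(data)

# Load one file into a throwaway environment and return its first object
load_first <- function(path) {
  env <- new.env()
  nm <- load(path, envir = env)[1]
  env[[nm]]
}

# Build names such as "Year1_beer" from the year and category folders
obj_names <- paste(basename(dirname(dirname(paths))),
                   basename(dirname(paths)), sep = "_")

# One list element per file, so nothing is overwritten
loaded <- setNames(lapply(paths, load_first), obj_names)
names(loaded)
```

If separate objects in the global environment are still wanted, `list2env(loaded, envir = globalenv())` should copy each list element out under its name.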
pbee
  • That worked amazingly! All the files are loaded into the global environment. The data are saved as `Year1_beer_beer`, for example, but `gsub()` will help shorten this to something like `Year1_beer`. – user113156 Sep 30 '18 at 20:30

One solution is to parse the file names and assign them as names to elements in a list of data frames. We'll use some sample data that has monthly sales for beer brands across two years that were saved as CSV files into two subdirectories, year1 and year2.

We will use lapply() to read the files into a list of data frames, and then use the names() function to name each element by prefixing the file name (excluding .csv) with year<x>. so that, for example, year1/beer.csv becomes year1.beer.

fileList <- c("year1/beer.csv","year2/beer.csv")

data <- lapply(fileList,function(x){
     read.csv(x)
})
# generate data set names to be assigned to elements in the list
fileNameTokens <- strsplit(fileList,"/|[.]")

theNames <- unlist(lapply(fileNameTokens,function(x){
     paste0(x[1],".",x[2])
}))
names(data) <- theNames
# print first six rows of file 1 based on named extract
data[["year1.beer"]][1:6,]

...and the output.

> data[["year1.beer"]][1:6,]
  Month      Item Sales
1     1 Budweiser 83047
2     2 Budweiser 38374
3     3 Budweiser 47287
4     4 Budweiser 18417
5     5 Budweiser 23981
6     6 Budweiser 55471
> 

Next, we'll print the first few rows of the second file.

> # print first six rows of file 2 based on named extract
> data[["year2.beer"]][1:6,]
  Month      Item Sales
1     1 Budweiser 23847
2     2 Budweiser 33847
3     3 Budweiser 44400
4     4 Budweiser 35333
5     5 Budweiser 18710
6     6 Budweiser 63108
> 

If one needs to access the data frames directly as named objects, without relying on the list() names, they can be assigned to the global environment within the lapply() function via the assign() function with pos=1, as noted in the other answer.

# alternate form, assigning directly to the global environment

data <- lapply(fileList,function(x){
     # x is the filename, parse into strings to generate data set name
     fileNameTokens <- unlist(strsplit(x,"/|[.]"))
     assign(paste0(fileNameTokens[1],".",fileNameTokens[2]), read.csv(x),pos=1)
})
head(year1.beer)

...and the output.

> head(year1.beer)
  Month      Item Sales
1     1 Budweiser 83047
2     2 Budweiser 38374
3     3 Budweiser 47287
4     4 Budweiser 18417
5     5 Budweiser 23981
6     6 Budweiser 55471
> 

The technique also works with RDS files, that is, files written with saveRDS() and listed in fileList in place of the CSV files, as follows.

data <- lapply(fileList,function(x){
     # x is the filename, parse into strings to generate data set name
     fileNameTokens <- unlist(strsplit(x,"/|[.]"))
     assign(paste0(fileNameTokens[1],".",fileNameTokens[2]), readRDS(x),pos=1)
})
head(year1.beer)

...and the output.

> head(year1.beer)
  Month      Item Sales
1     1 Budweiser 83047
2     2 Budweiser 38374
3     3 Budweiser 47287
4     4 Budweiser 18417
5     5 Budweiser 23981
6     6 Budweiser 55471
>
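
Swapping `read.csv(x)` for `load(x)` in the code above does not give the same result, because `load()` restores objects under their saved names and returns only a character vector of those names, whereas `readRDS()` returns the object itself. The sketch below illustrates the difference and one possible way to convert a file saved with `save()` into an RDS file. The temporary paths and the stored object name `data` are hypothetical.

```r
# Hypothetical file written with save(); the stored object is called `data`
rda_path <- tempfile(fileext = ".Rda")
data <- data.frame(Month = 1:3, Sales = c(10, 20, 30))
save(data, file = rda_path)
rm(data)

# load() returns the *names* of the restored objects, not the objects
env <- new.env()
restored <- load(rda_path, envir = env)
restored  # a character vector of object names

# Fetch the object from the environment and re-save it as RDS
rds_path <- sub("\\.Rda$", ".rds", rda_path)
saveRDS(env[[restored[1]]], rds_path)

# readRDS() returns the object directly, so it can be given any name
year1.beer <- readRDS(rds_path)
head(year1.beer)
```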
Len Greski
  • This looks like it would work great for `.csv` files. I have tried loading the data using this method and it imports a list of 62 elements; however, when I run `data[["year1.beer"]][1:6,]` or `data$year1.beer`, R crashes. I tried modifying the code a little, changing `read.csv(x)` to `load(x)`, but this did not work. – user113156 Sep 30 '18 at 20:37
  • I am in the process of saving all the files as `.csv` so I am going to save this code and use it for the `.csv` imports. – user113156 Sep 30 '18 at 20:38
  • @user113156 - I used csv files because I didn't bother to save them as `.rda`. I'll generate the `.rda` files and confirm the results. – Len Greski Sep 30 '18 at 20:45
  • @user113156 - I was able to make the process work with `saveRDS()` and `readRDS()`, as noted above. – Len Greski Sep 30 '18 at 21:07
  • Thanks I will take a look at it! – user113156 Sep 30 '18 at 21:28
  • @user113156 - Do all the data files have the same columns? If so, another approach would be to add the directory and product category information as columns in each data frame, which would allow you to combine all of the data into a single data frame. Reply if you'd like me to post another answer that takes this approach. – Len Greski Oct 01 '18 at 10:12
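
The single-data-frame approach suggested in the last comment could look roughly like the sketch below. It is not the commenter's own code: the two small data frames stand in for data that has already been read, the `<year>.<category>` naming is assumed, and the column names `year` and `category` are hypothetical.

```r
# Hypothetical stand-ins for data frames that were already read in,
# named "<year>.<category>"
data <- list(
  year1.beer = data.frame(Month = 1:2, Sales = c(100, 200)),
  year2.beer = data.frame(Month = 1:2, Sales = c(300, 400))
)

# Split each name into its year and category parts
parts <- strsplit(names(data), ".", fixed = TRUE)

# Add the two identifying columns to every data frame
tagged <- Map(function(df, p) {
  df$year <- p[1]
  df$category <- p[2]
  df
}, data, parts)

# Stack everything; this only works if all data frames share the same columns
combined <- do.call(rbind, unname(tagged))
rownames(combined) <- NULL
combined
```

With everything in one data frame, a single year or category can then be selected with ordinary subsetting, for example `combined[combined$year == "year2", ]`.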