0

I know there are similar questions to this but I haven't came across ones using the map function from the purrr package. I am having a difficult time trying to read in some excel files(.xlsx) using purrr::map(). I would like each one to be it's own data frame. I tried the approach in this similar question: How can I reading multiple (excel) files into R?.

However, I keep getting this error:

Error: path does not exist: "tab3_DOfinal_HUClevel_assessment.xlsx"

I know for sure I have the right path. Not sure why I am getting this error. I have about 9 excel spreadsheets that I want to read in.

Code I tried:

# load necessary package
library(purrr)

file.list <- list.files(path="2016_Data_Tables",pattern='*.xlsx')
file.list <- setNames(file.list, file.list)

# store all .xlsx files as individual data frames inside of one list
df <- map(file.list, read_xlsx)

The file name pattern goes as follows:

tab3_DOfinal_HUClevel_assessment.xlsx

The only thing that changes is the DOfinal part.

Some sample data:

structure(list(ID = 1, WMA = 15, Number = "02040302020030-01", 
    HUC14 = "HUC02040302020030", Name = "Absecon Creek (AC Reserviors) (gage to SB)", 
    Region = "Atlantic Coast", NumofStations = "2", ListofStations = "01410455, R32", 
    ListofAssessment = "2, 2", HUCTier = "2", swqs = "PL, SE1", 
    TotalNumSamples5yrs = "NA", flgusgsprelim = "NA, 0", auassess = 2, 
    auassesstrout = -999, finalauassess = 2, finalauassesstrout = -999, 
    Changefrom2014 = "No Change-2", Changetroutfrom2014 = "No Change", 
    listHUC14assess5 = "NA", listHUC14assess3 = "NA", listHUC14assess2 = "01410455, R32", 
    His2014 = "Attaining", His2014trout = "-999", Notes = NA_character_, 
    OldStations2014 = "01410455", OldStationsAssess2014 = "2", 
    Error = NA_character_), .Names = c("ID", "WMA", "Number", 
"HUC14", "Name", "Region", "NumofStations", "ListofStations", 
"ListofAssessment", "HUCTier", "swqs", "TotalNumSamples5yrs", 
"flgusgsprelim", "auassess", "auassesstrout", "finalauassess", 
"finalauassesstrout", "Changefrom2014", "Changetroutfrom2014", 
"listHUC14assess5", "listHUC14assess3", "listHUC14assess2", "His2014", 
"His2014trout", "Notes", "OldStations2014", "OldStationsAssess2014", 
"Error"), row.names = c(NA, -1L), class = c("tbl_df", "tbl", 
"data.frame"))


structure(list(WMA = 15, Number = "02040302020030-01", HUC14 = "HUC02040302020030", 
    Name = "Absecon Creek (AC Reserviors) (gage to SB)", Region = "Atlantic Coast", 
    NumofStations = "1", ListofStations = "01410455", ListofAssessment = "2", 
    MaxStaAssessment = "2", MinStaAssessment = "2", TotalNumSamples5yrs = "NA", 
    auassess = "2", ChangeFrom2014 = "No Change-2", liststaassess2 = "01410455", 
    liststaassess3 = "NA", liststaassess5 = "NA", Assessment2014 = "Attaining", 
    Comments = NA_character_), .Names = c("WMA", "Number", "HUC14", 
"Name", "Region", "NumofStations", "ListofStations", "ListofAssessment", 
"MaxStaAssessment", "MinStaAssessment", "TotalNumSamples5yrs", 
"auassess", "ChangeFrom2014", "liststaassess2", "liststaassess3", 
"liststaassess5", "Assessment2014", "Comments"), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame"))

structure(list(WMA = 15, Number = "02040302020030-01", HUC14 = "HUC02040302020030", 
    Name = "Absecon Creek (AC Reserviors) (gage to SB)", Region = "Atlantic Coast", 
    NumofStations = "1", ListofStations = "R32", ListofAssessment = "3", 
    MaxStaAssessment = "3", MinStaAssessment = "3", TotalNumSamples5yrs = "9", 
    auassess = "3", ChangeFrom2014 = "No Change-3", liststaassess2 = "NA", 
    liststaassess3 = "R32", liststaassess5 = "NA", Assessment2014 = "N/A", 
    Comments = NA_character_), .Names = c("WMA", "Number", "HUC14", 
"Name", "Region", "NumofStations", "ListofStations", "ListofAssessment", 
"MaxStaAssessment", "MinStaAssessment", "TotalNumSamples5yrs", 
"auassess", "ChangeFrom2014", "liststaassess2", "liststaassess3", 
"liststaassess5", "Assessment2014", "Comments"), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame"))
Cristian E. Nuno
  • 2,822
  • 2
  • 19
  • 33
NBE
  • 641
  • 2
  • 11
  • 33
  • 1
    Here's one strategy: Try creating a dataframe with a column that contains a path to each of your files. Then use `mutate(newcol = map_df(filename, read_xls)` and `unnest` – Nettle Sep 16 '18 at 01:53
  • You need either `list.files(path="2016_Data_Tables", pattern='*.xlsx', full.names = TRUE)` (Otherwise the file names are relative to the `path` argument, not to the current folder). Or `sapply("2016_Data_Tables", file.path, list.files(path="2016_Data_Tables", pattern='*.xlsx'))` – Aurèle Sep 18 '18 at 16:36

1 Answers1

0

Aurèle makes a really good point regarding your file paths.

I would like each one to be it's own data frame

If this is the goal, then a combination of purrr::iwalk and assign could easily get you there. The process goes as follows:

  1. Get a list of all of the .xlsx files located in 2016_Data_Tables/.
  2. Then use purrr::set_names to name each element in this list with its filename sans the .xlsx extension.
  3. Then use purrr::iwalk to apply the assign function to each element in the list. Specifically, use read_xlsx to read each .xlsx file from disk into a data frame and then assign that data frame as a named object to R's global environment
list.files('data/mpg', pattern = '.xlsx', full.names = T) %>% 
  purrr::set_names(stringr::str_remove(basename(.), '.xlsx$')) %>% 
  purrr::iwalk(function(x, i) assign(i, readxl::read_xlsx(x), .GlobalEnv))
hendrikvanb
  • 459
  • 3
  • 5