I'm attempting to import the most recent .csv from my working directory into R. I'm adamant this method worked previously, but it no longer does.

Each day a .csv file is written to my designated folder, from where I import it into RStudio for manipulation. There are currently 2 files in this folder.

Please see code and description as follows:

1) The following code retrieves the names of all .csv files in the directory:

# find filenames of all .csvs in directory 
filenames <- Sys.glob("*.csv")

> filenames
[1] "February 26, 2018 at 03:59PM myfile.csv" "February 27, 2018 at 04:00PM myfile.csv"

2) The next step is to remove the redundant info from the filename string and keep just the date info:

# remove redundant file info  
newdates <- sub("at.*", "", filenames)

> newdates
[1] "February 26, 2018 " "February 27, 2018 "

3) Then I remove the comma from the date:

# remove comma from date string 
newdates <- gsub('\\$|,', '', newdates)

> newdates
[1] "February 26 2018 " "February 27 2018 "

4) In this step I change the date format:

# change to short date format
betterdate <- as.Date(newdates,format = "%B %d %Y")

> betterdate 
[1] "2018-02-26" "2018-02-27"

5) Then I set max(betterdate) as the latest file:

# takes latest file name as most recent file 
latestfile <- max(betterdate)

> latestfile 
[1] "2018-02-27"

6) And finally I import this file:

# import file with latest date 
rawfile <- read.csv(file=latestfile, header=TRUE, sep=",")

As I say, this inelegant solution previously worked as designed; however, after some weeks I now receive this error message:

Error in read.table(file = file, header = header, sep = sep, quote = quote, : 'file' must be a character string or connection

Is it possible to explain what the issue is and how I might go about this whole endeavour in a better way?

jimiclapton
  • You are passing a Date to `read.table` (check `class(latestfile)`). You need to pass in the file name as a character value that corresponds to that date. I can't see how this ever would have worked. When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Give a `dput()` of `filenames` and put the code together so we can easily copy/paste into R to test it. – MrFlick Feb 27 '18 at 19:02
  • Two things: 1) Somehow your two files, which started with different dates, ended with the same date. 2) `latestfile` is not a file name but the max date; you need to either reconstruct the file name based on the date or do something along the lines of `filenames[which.max(betterdate)]` to get the filename. – emilliman5 Feb 27 '18 at 19:04
  • @emilliman5 Apologies, file dates corrected. Your solution appears to work if I do the following: `x <- read.csv(file=filenames[which.max(betterdate)], header=TRUE, sep=",")` – jimiclapton Feb 27 '18 at 19:23
  • @emilliman5 Completely understand the issue. I would happily accept this as the answer as it is precisely what I needed. Thank you kindly – jimiclapton Feb 27 '18 at 19:29
  • @MrFlick I hear you. Upon discovering the error when running the code this afternoon I was also baffled as to how it ever would've worked, but it did. Perhaps I'd since inadvertently modified the code and omitted a step. For now I'll incorporate `filenames[which.max(betterdate)]` as suggested by @emilliman5. Thanks both for your time. Much appreciated. – jimiclapton Feb 27 '18 at 19:33
  • There is no need to slice and dice the filename. This statement works for betterdate: `as.Date(newdates, "%B %d, %Y")`. All of the text starting with "at" is ignored. – Dave2e Feb 27 '18 at 20:59
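
The points raised in these comments can be checked in a few lines. This is only a minimal sketch: the `filenames` vector is typed in by hand rather than read from disk, the second file is assumed to carry the February 27 date, `latestdate` is a name introduced here for illustration, and an English locale is assumed so that `%B` recognises "February".

```r
# Example file names, typed by hand (assumed to match the folder in the question).
filenames <- c("February 26, 2018 at 03:59PM myfile.csv",
               "February 27, 2018 at 04:00PM myfile.csv")

# as.Date() ignores any characters left over once the format is satisfied,
# so the trailing "at 03:59PM myfile.csv" does not need to be stripped first.
betterdate <- as.Date(filenames, format = "%B %d, %Y")

# max() on a Date vector returns a Date, which read.csv() cannot open.
latestdate <- max(betterdate)
class(latestdate)

# which.max() returns a position, which indexes back into the character vector.
latestfile <- filenames[which.max(betterdate)]
class(latestfile)
```

Here `latestdate` has class "Date", while `latestfile` is a character string and is therefore something `read.csv()` can accept as its `file` argument.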

2 Answers

If you can trust the creation time tracked by the operating system:

data_files <- file.info(Sys.glob("*.csv"))
row.names(data_files)[which.max(data_files[["ctime"]])]
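
A self-contained sketch of the same idea, for experimenting outside the real folder. The scratch directory, the file names `old.csv` and `new.csv`, and the timestamps are all made up for illustration. It also swaps `ctime` for `mtime` (last modification time), because on Unix-alikes `ctime` records the last status change rather than the creation time; which field is appropriate depends on how the files arrive in the folder.

```r
# Hypothetical scratch directory with two small CSV files.
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
old_file <- file.path(demo_dir, "old.csv")
new_file <- file.path(demo_dir, "new.csv")
write.csv(data.frame(x = 1), old_file, row.names = FALSE)
write.csv(data.frame(x = 2), new_file, row.names = FALSE)

# Force distinct modification times so the example is deterministic.
Sys.setFileTime(old_file, as.POSIXct("2018-02-26 15:59:00", tz = "UTC"))
Sys.setFileTime(new_file, as.POSIXct("2018-02-27 16:00:00", tz = "UTC"))

# file.info() returns one row per file; the row names are the paths.
data_files <- file.info(list.files(demo_dir, pattern = "\\.csv$",
                                   full.names = TRUE))
latest_path <- row.names(data_files)[which.max(data_files[["mtime"]])]

# The result is a character path, so it can go straight into read.csv().
rawfile <- read.csv(latest_path, header = TRUE)
```
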
Nathan Werth

You can use which.max to get the index of the most recent date and use that to retrieve the filename from the filenames vector:

rawfile <- read.csv(file=filenames[which.max(betterdate)], header=TRUE, sep=",")
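
One possible way to package this, sketched under a few assumptions: the helper name `read_latest_csv` is hypothetical, the file names are assumed to follow the "February 26, 2018 at 03:59PM ..." pattern from the question, and an English locale is assumed for `%B` and `%p`. It parses the time of day as well as the date, since `which.max` returns the first maximum and would otherwise pick the earlier of two files written on the same day.

```r
# Hypothetical helper: read the CSV whose name carries the latest timestamp.
read_latest_csv <- function(dir = ".") {
  filenames <- list.files(dir, pattern = "\\.csv$")
  if (length(filenames) == 0) stop("no .csv files found in ", dir)

  # Parse the date and the time of day from the start of each name;
  # the remainder of the file name is ignored by the parser.
  stamps <- as.POSIXct(filenames, format = "%B %d, %Y at %I:%M%p", tz = "UTC")
  if (all(is.na(stamps))) stop("no file name matched the expected date format")

  # which.max() skips NA values and returns the position of the latest stamp.
  latest <- filenames[which.max(stamps)]
  read.csv(file.path(dir, latest), header = TRUE)
}
```

Calling `rawfile <- read_latest_csv()` would then read the newest file from the working directory, with names that do not match the pattern being skipped.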
emilliman5