How to import most recent csv file into RStudio

Question

I'm attempting to import the most recent .csv from my working directory into R. I'm adamant this method was working previously, but it no longer appears to work.

Each day a .csv file is outputted to my designated folder, from where I import it into RStudio for manipulation. There are 2 files in this folder currently.

Please see code and description as follows:

1) The following code retrieves the names of all .csv files in the directory.

# find filenames of all .csvs in directory 
filenames <- Sys.glob("*.csv")

> filenames
[1] "February 26, 2018 at 03:59PM myfile.csv" "February 26, 2018 at 04:00PM myfile.csv"

2) Next step is to remove redundant info from filename string and just keep date info:

# remove redundant file info  
newdates <- sub("at.*", "", filenames)

> newdates
[1] "February 26, 2018 " "February 27, 2018 "

3) Then I remove the comma from the date

# remove comma from date string 
newdates <- gsub('\\$|,', '', newdates)

> newdates
[1] "February 26 2018 " "February 27 2018 "

4) In this step I change the date format

# change to short date format
betterdate <- as.Date(newdates,format = "%B %d %Y")

> betterdate 
[1] "2018-02-26" "2018-02-27"

5) Then I set max(betterdate) as the latest file

# take the latest date as the most recent file
latestfile <- max(betterdate)

> latestfile 
[1] "2018-02-27"

6) And finally I import this file

# import file with latest date 
rawfile <- read.csv(file=latestfile, header=TRUE, sep=",")

As I say, this inelegant solution was previously working as designed; however, after some weeks I now receive this error message:

Error in read.table(file = file, header = header, sep = sep, quote = quote, : 'file' must be a character string or connection

Is it possible to explain what the issue is and how I might go about this whole endeavour in a better way?

You are passing a Date to `read.table` (check `class(latestfile)`). You need to pass in the file name as a character value that corresponds to that date. I can't see how this ever would have worked. When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Give a `dput()` of `filenames` and put the code together so we can easily copy/paste into R to test it. — MrFlick, Feb 27 '18 at 19:02
Two things: 1) Somehow your two files, which started with different dates, ended with the same date. 2) `latestfile` is not a file name but the max date; you need to either reconstruct the file name based on the date or do something along the lines of `filenames[which.max(betterdate)]` to get the filename — emilliman5, Feb 27 '18 at 19:04
@emilliman5 apologies, file dates corrected. Your solution appears to work if I do the following: `x <- read.csv(file=filenames[which.max(betterdate)], header=TRUE, sep=",")` — jimiclapton, Feb 27 '18 at 19:23
@emilliman5 Completely understand the issue. I would happily accept this as the answer as it is precisely what I needed. Thank you kindly — jimiclapton, Feb 27 '18 at 19:29
@MrFlick I hear you. Upon discovering the error when running the code this afternoon I was also baffled as to how it ever would've worked, but it did. Perhaps I'd since inadvertently modified the code and omitted a step. For now I'll incorporate `filenames[which.max(betterdate)]` as suggested by @emilliman5. Thanks both for your time. Much appreciated. — jimiclapton, Feb 27 '18 at 19:33
There is no need to slice and dice the filename. This statement works for betterdate: `as.Date(newdates, "%B %d, %Y")`. All of the text starting with "at" is ignored. — Dave2e, Feb 27 '18 at 20:59
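The diagnosis in the comments above can be checked with a short sketch. The two file names are hypothetical and only mirror the pattern in the question, and the `Sys.setlocale()` call assumes English month names are wanted so that `%B` matches.

```r
# Assumption: month names are English, so use the C locale for %B matching.
Sys.setlocale("LC_TIME", "C")

# Hypothetical file names following the pattern shown in the question.
filenames <- c("February 26, 2018 at 03:59PM myfile.csv",
               "February 27, 2018 at 04:00PM myfile.csv")

# as.Date() ignores any trailing characters once the format is consumed,
# so the names can be parsed without sub()/gsub().
betterdate <- as.Date(filenames, format = "%B %d, %Y")

# max() returns a Date, which read.csv() cannot use as a file argument.
latestdate <- max(betterdate)
class(latestdate)

# Indexing the original vector returns the matching character file name.
latestfile <- filenames[which.max(betterdate)]
class(latestfile)
```

Here `class(latestdate)` is `"Date"` while `class(latestfile)` is `"character"`, which is the difference the error message is complaining about.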

score 4 · Answer 1 · answered Feb 27 '18 at 21:09

If you can trust the creation time tracked by the operating system:

data_files <- file.info(Sys.glob("*.csv"))
row.names(data_files)[which.max(data_files[["ctime"]])]
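One caveat worth hedging: according to the `file.info()` documentation, `ctime` is the last status change time on Unix-alikes and the creation time only on Windows. A variant using the modification time (`mtime`) may therefore be more portable. The sketch below is an illustration under that assumption, not the answerer's code; the directory and file names are made up, and `Sys.setFileTime()` is used only to make the demo deterministic.

```r
# Work in a throwaway directory so the sketch does not touch real files.
demo_dir <- tempfile("csv_demo_")
dir.create(demo_dir)
old_file <- file.path(demo_dir, "old.csv")
new_file <- file.path(demo_dir, "new.csv")
write.csv(data.frame(x = 1), old_file, row.names = FALSE)
write.csv(data.frame(x = 2), new_file, row.names = FALSE)

# Force distinct modification times so the outcome is predictable.
Sys.setFileTime(old_file, as.POSIXct("2018-02-26 15:59:00", tz = "UTC"))
Sys.setFileTime(new_file, as.POSIXct("2018-02-27 16:00:00", tz = "UTC"))

# file.info() uses the paths as row names; pick the row with the latest mtime.
csv_info <- file.info(list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE))
newest <- row.names(csv_info)[which.max(csv_info[["mtime"]])]

rawfile <- read.csv(newest)
```

Either timestamp reflects what the file system recorded, not the date in the file name, so copying or re-saving a file can change which one counts as "latest".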

Nathan Werth

score 2 · Accepted Answer · answered Feb 27 '18 at 20:47

You can use `which.max` to get the index of the most recent date and use that index to retrieve the file name from the `filenames` vector:

rawfile <- read.csv(file=filenames[which.max(betterdate)], header=TRUE, sep=",")
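For daily use, the same idea can be wrapped in a small function. This is a minimal sketch, not part of the original answer: `read_latest_csv` and its arguments are hypothetical names, and it assumes every relevant file name begins with a date in the `"%B %d, %Y"` form used in the question.

```r
# Hypothetical helper: read the CSV whose file name carries the latest date.
read_latest_csv <- function(dir = ".", date_format = "%B %d, %Y") {
  paths <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  if (length(paths) == 0) stop("no .csv files found in ", dir)

  # Parse the date at the start of each name; trailing text is ignored.
  file_dates <- as.Date(basename(paths), format = date_format)
  if (all(is.na(file_dates))) stop("no file name matched ", date_format)

  # which.max() discards NA, so unparseable names are never selected.
  read.csv(paths[which.max(file_dates)], header = TRUE, sep = ",")
}

# Usage with two throwaway files named like those in the question.
Sys.setlocale("LC_TIME", "C")  # assumption: English month names
demo_dir <- tempfile("latest_csv_")
dir.create(demo_dir)
write.csv(data.frame(day = 26),
          file.path(demo_dir, "February 26, 2018 at 03:59PM myfile.csv"),
          row.names = FALSE)
write.csv(data.frame(day = 27),
          file.path(demo_dir, "February 27, 2018 at 04:00PM myfile.csv"),
          row.names = FALSE)
rawfile <- read_latest_csv(demo_dir)
```

Only the date is compared here, so if two files share a date, `which.max` returns the first of them. Distinguishing files by time of day would require parsing the time as well, for example with `as.POSIXct` and a longer format string.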

emilliman5
