
Building on this question (Retrieve modified DateTime of a file from an FTP Server), it's clear how to get the date-modified value. However, the full date is not returned, even though it's visible on the FTP site.

The following shows how to get the date-modified values for files at ftp://ftp.FreeBSD.org/pub/FreeBSD/

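library(curl)

# Read the raw directory listing, one line per entry
con <- curl("ftp://ftp.FreeBSD.org/pub/FreeBSD/")
dat <- readLines(con)
close(con)

# Drop directories (lines starting with "d")
no_dirs <- grep("^d", dat, value=TRUE, invert=TRUE)
# Strip the first 43 characters, i.e. the leading columns before the date
date_and_name <- sub("^[[:alnum:][:punct:][:blank:]]{43}", "", no_dirs)
# Strip the trailing file name, leaving only the date
dates <- sub('\\s[[:alpha:][:punct:][:alpha:]]+$', '', date_and_name)
dates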
## [1]  "May 07  2015" "Apr 22 15:15" "Apr 22 10:00"

Some dates are in month/day/year format, others are in month/day/hour/minute format.

Looking at the FTP site, all dates are in month/day/year hour/minutes/seconds format.


I assume this has something to do with the Unix listing format (explained in "FTP details command doesn't seem to return the year the file was modified, is there a way around this?"). It would be nice to get the full date.
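As a stopgap, the two formats can at least be normalised to a calendar date. This is only a sketch under an assumption: it relies on the usual `ls` convention that entries showing a time instead of a year were modified within roughly the last six months, so the year can be inferred from today's date. `parse_ls_date` and its `today` argument are made-up names for illustration, and the hour/minute part is dropped.

```r
# Hypothetical helper: turn "May 07  2015" / "Apr 22 15:15" style strings into Dates.
# Assumes entries with a time (HH:MM) instead of a year are recent (ls convention).
parse_ls_date <- function(x, today = Sys.Date()) {
  # Collapse repeated spaces, then split into month, day and year-or-time
  x <- trimws(gsub("\\s+", " ", x))
  parts <- strsplit(x, " ", fixed = TRUE)
  months <- match(vapply(parts, `[`, "", 1), month.abb)
  days <- as.integer(vapply(parts, `[`, "", 2))
  last <- vapply(parts, `[`, "", 3)

  # Entries with a colon carry a time, so the year has to be inferred
  has_time <- grepl(":", last, fixed = TRUE)
  this_year <- as.integer(format(today, "%Y"))
  years <- ifelse(has_time, this_year, suppressWarnings(as.integer(last)))
  out <- as.Date(sprintf("%04d-%02d-%02d", years, months, days), format = "%Y-%m-%d")

  # An inferred date in the future must belong to the previous year
  # (one day of slack to allow for time zone differences)
  future <- has_time & !is.na(out) & out > today + 1
  out[future] <- as.Date(
    sprintf("%04d-%02d-%02d", years[future] - 1L, months[future], days[future]),
    format = "%Y-%m-%d"
  )
  out
}

parse_ls_date(c("May 07  2015", "Apr 22 15:15"), today = as.Date("2020-04-23"))
```

Using `month.abb` rather than `%b` avoids depending on the locale for the month names.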

fawda123

1 Answer


If you use download.file, you get an HTML representation of the directory, which you can parse with the xml2 package.

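read_ftp <- function(url)
{
  # Download the directory listing to a temporary file
  tmp <- tempfile()
  download.file(url, tmp, quiet = TRUE)
  # Parse the listing as HTML, then delete the temporary file
  html <- xml2::read_html(readChar(tmp, 1e6))
  file.remove(tmp)
  # Split the page text into lines and keep those containing an mm/dd/yyyy date
  lines <- strsplit(xml2::xml_text(html), "[\n\r]+")[[1]]
  lines <- grep("(\\d{2}/){2}\\d{4}", lines, value = TRUE)
  # Read the whitespace-separated columns into a data frame
  result <- read.table(text = lines, stringsAsFactors = FALSE)
  setNames(result, c("Date", "Time", "Size", "File"))
}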

This allows you to do the following:

read_ftp("ftp://ftp.FreeBSD.org/pub/FreeBSD/")
#>         Date    Time      Size        File
#> 1 05/07/2015 12:00AM     4,259  README.TXT
#> 2 04/22/2020 08:00PM        35   TIMESTAMP
#> 3 04/22/2020 08:00PM Directory development
#> 4 04/22/2020 10:00AM     2,325   dir.sizes
#> 5 11/12/2017 12:00AM Directory         doc
#> 6 11/12/2017 12:00AM Directory       ports
#> 7 04/22/2020 08:00PM Directory    releases
#> 8 11/09/2018 12:00AM Directory   snapshots

Created on 2020-04-22 by the reprex package (v0.3.0)
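The columns come back as plain strings, so a follow-up step is needed before the dates can be compared or sorted. A minimal sketch, assuming a data frame shaped like the output above: `tidy_listing` is a hypothetical helper, the listing does not say which time zone it uses (UTC here is a guess), and `%p` depends on the locale recognising AM/PM.

```r
# Hypothetical helper: convert the string columns of a read_ftp()-style data frame.
tidy_listing <- function(df) {
  # Combine date and 12-hour time; the time zone is an assumption
  df$Modified <- as.POSIXct(
    paste(df$Date, df$Time),
    format = "%m/%d/%Y %I:%M%p",
    tz = "UTC"
  )
  # Directories carry the word "Directory" in place of a size
  df$IsDir <- df$Size == "Directory"
  # Remove thousands separators; directory rows become NA
  df$Bytes <- suppressWarnings(as.numeric(gsub(",", "", df$Size)))
  df[c("File", "Modified", "IsDir", "Bytes")]
}

# Two rows copied from the output above
listing <- data.frame(
  Date = c("05/07/2015", "04/22/2020"),
  Time = c("12:00AM", "08:00PM"),
  Size = c("4,259", "Directory"),
  File = c("README.TXT", "development"),
  stringsAsFactors = FALSE
)
tidy_listing(listing)
```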

Allan Cameron
  • @fawda123 sorry - Please see my updated answer which should now give the correct result. – Allan Cameron Apr 22 '20 at 20:18
  • Just found out that this does not work in Unix environments. Normally this would not be a problem, but I'm running tests outside of Windows that are failing. – fawda123 Apr 23 '20 at 18:08
  • @fawda123 I can't test on a unix set-up. What contents do you get in the file saved by `download.file`? Or do you get an error from the server? – Allan Cameron Apr 23 '20 at 18:40
  • Content is downloaded okay, but the date formats are the same as in my original post. – fawda123 Apr 23 '20 at 18:49
  • @fawda123 yes, the response parsed by this function only seems to be delivered if winInet is used. I can try to dig through the curl docs to see if the server can be persuaded to return that response via libcurl. In the meantime, can you check what sort of response you get with `method = "wget"` in `download.file` - I think your unix system will have this available. – Allan Cameron Apr 23 '20 at 19:47
  • Okay, thanks, `method = "wget"` returns the full info on Linux. The code you gave above doesn't parse the date correctly, but it's just a matter of messing with the regex calls. – fawda123 Apr 23 '20 at 20:38
  • @fawda123 that's good to know. I think a libcurl solution would be the most portable option if you are using multiple set-ups. I'll try to improve the answer to reflect this if I can. – Allan Cameron Apr 23 '20 at 20:44
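One possible shape for the libcurl route mentioned in the last comment is to skip the directory listing and ask for each file's timestamp individually. This is an untested sketch, not the answer's method: `ftp_mtime` is a hypothetical helper, it assumes the server supports the FTP MDTM command, and it needs one request per file. It uses the `nobody` and `filetime` libcurl options and reads the `modified` field of the response returned by `curl_fetch_memory`.

```r
# Hypothetical helper: request a single file's modification time via libcurl.
# Assumes the curl package is installed and the server answers MDTM.
ftp_mtime <- function(url) {
  # nobody = TRUE skips the download; filetime = TRUE asks for the remote timestamp
  h <- curl::new_handle(nobody = TRUE, filetime = TRUE)
  res <- curl::curl_fetch_memory(url, handle = h)
  # POSIXct timestamp, or NA if the server did not report one
  res$modified
}

# Example (requires network access), left commented out:
# ftp_mtime("ftp://ftp.FreeBSD.org/pub/FreeBSD/README.TXT")
```

Because this goes through libcurl rather than WinInet or wget, it should behave the same way on Windows and Unix, though that has not been verified here.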