1

I have a CSV with many rows and columns. Column D is for date of download, e.g. May 1 2014, and Column E is for date of publication, e.g. Jan 3 2014.

How can I first open and read all the dates in my CSV, then subtract the date in Column E from Column D, and write the number of days in a new column, say Column F, in a new or the same CSV?

Desired output:

May 1 2014    Jan 3 2014    90 (or whatever the number of days is)

I've done a search and I've learned how to convert the strings to dates, but I don't know how to subtract the data and print the result to a separate column.

Anyone who can help?

Community
  • 1
  • 1
Isak
  • 535
  • 3
  • 6
  • 17
  • 1
    `as.Date("May 1 2014", format="%b %d %Y") - as.Date("Jan 3 2014", format="%b %d %Y")` gives `Time difference of 118 days` when printed and the value is `## [1] 118` numerically – hrbrmstr Aug 31 '15 at 18:54

2 Answers2

3

hrbrmstr's comment showed up as I was writing effectively the same answer in long form.

DFrame <- data.frame(date_download = "May 1 2014",
                     date_publish = "Jan 3 2014",
                     stringsAsFactors=FALSE)

DFrame$date_download <- as.POSIXct(DFrame$date_download,
                                   format = "%b %d %Y")
DFrame$date_publish <- as.POSIXct(DFrame$date_publish,
                                  format = "%b %d %Y")

DFrame$diff <- difftime(DFrame$date_download, 
                        DFrame$date_publish, 
                        units="days")

DFrame

write.csv(DFrame, [filename], ...)

You can either use the - operator, or the difftime function to get the same result.

See ?write.csv for additional options to control what is written to your file.

Benjamin
  • 16,897
  • 6
  • 45
  • 65
  • Thanks, this is exactly what I'm looking for, but how do I write this to a column in the same or in a new CSV? – Isak Aug 31 '15 at 18:58
  • You'll have to do it with a new csv using `write.csv`. – Benjamin Aug 31 '15 at 18:58
  • Is there any chance you can include this in the code you pasted above? I don't know how to use write.csv in R. :/ – Isak Aug 31 '15 at 19:00
  • 2
    `write.csv` is fairly straight forward. I added the basics, but I'll encourage you to read the documentation to work out exactly how you want to file to be written. – Benjamin Aug 31 '15 at 19:02
  • Thanks a bunch! Could you have a look at the updated question, I clarified that I also need to open and read the dates from a CSV. – Isak Aug 31 '15 at 19:02
  • 1
    Interestingly enough, `as.Date` is _significantly_ more efficient than `as.POSIXct` (tho that prbly isn't a need of the OP) _but_ the `difftime` for `POSIXct` objects is marginally faster than for `Date` objects. – hrbrmstr Aug 31 '15 at 19:06
  • 1
    so you're saying my decision process on whether to use `Date` or `POSIX` classes should be dictated by more than just "POSIX is cool?" – Benjamin Aug 31 '15 at 19:12
1

THIS IS NOT AN ANSWER but a follow-up from my comment on benchmarking, since this may be a consideration in a larger-scale data situation.

library(microbenchmark)
library(ggplot2)

autoplot(microbenchmark(as_date=as.Date("May 1 2014", format="%b %d %Y"),
                        as_posixct=as.POSIXct("May 1 2014", format="%b %d %Y"), 
                        as_posixlt=strptime("May 1 2014", format="%b %d %Y"), 
                        times=100000))

enter image description here

d1_d <- as.Date("May 1 2014", format="%b %d %Y")
d2_d <- as.Date("Jan 3 2014", format="%b %d %Y")

d1_p1 <- as.POSIXct("May 1 2014", format="%b %d %Y")
d2_p1 <- as.POSIXct("Jan 3 2014", format="%b %d %Y")

d1_p2 <- strptime("May 1 2014", format="%b %d %Y")
d2_p2 <- strptime("Jan 3 2014", format="%b %d %Y")

autoplot(microbenchmark(date_difftime=difftime(d2_d, d1_d),
                        date_subtract=d2_d-d1_d,
                        posixct_difftime=difftime(d2_p1, d1_p1), 
                        posixct_subtract=d2_p1-d1_p1,
                        posixlt_difftime=difftime(d2_p2, d1_p2), 
                        posixlt_subtract=d2_p2-d1_p2,
                        times=10000))

enter image description here

It would seem Date objects may be an overall best choice (if working strictly with dates and not timestamps) and difftime wins against simple subtraction.

hrbrmstr
  • 77,368
  • 11
  • 139
  • 205