In R, dates (proper Date-class) are number-like, so one can safely make continuous-number comparisons such as date1 < date2 and date2 >= date3, etc. However, if you accidentally compare a %Y-%m-%d-string with another similarly formatted string, it will still work, because strings are compared lexicographically. This means that when comparing the strings "2020-01-01" and "2019-01-01", it will first compare "2" and "2", which is a tie; same with the "0"s; then it will see that "2" > "1", and therefore "2019-01-01" comes before the other.
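A minimal sketch of the two comparisons, using made-up variable names (d_2019, d_2020) purely for illustration. It is only meant to show that the Date comparison and the string comparison agree when the strings are 0-padded %Y-%m-%d.

```r
# Proper Date objects: stored internally as days since 1970-01-01
d_2019 <- as.Date("2019-01-01")
d_2020 <- as.Date("2020-01-01")

d_2019 < d_2020
# [1] TRUE
as.numeric(d_2020 - d_2019)   # number-like: the difference is a count of days
# [1] 365

# The same values as 0-padded strings: compared character by character
"2019-01-01" < "2020-01-01"
# [1] TRUE
```

The string version gives the right answer only by virtue of the format, not because R knows these are dates.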
This still works, even as strings, because the most significant components are the years, and as long as they come first in the string, the relative ordering (>, sort, order) still works. This continues to work as long as the month and day are 0-padded integers. It does not work if they are not 0-padded: "2021-2-1" > "2021-11-1" is reported as TRUE. This is because it gets to the month portion and compares the "2" with the first "1" of "11", and never sees that the next digit makes the month 11, which is greater than 2.
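To make the non-padded failure concrete, here is a small sketch (the names loose and parsed are mine). It assumes the strings are otherwise well-formed year-month-day values, which as.Date can parse even without the leading zeros.

```r
loose <- c("2021-2-1", "2021-11-1")   # months and days are not 0-padded

loose[1] > loose[2]                   # string comparison puts February "after" November
# [1] TRUE

parsed <- as.Date(loose, format = "%Y-%m-%d")
parsed[1] > parsed[2]                 # Date comparison gets it right
# [1] FALSE

format(parsed)                        # the default format is 0-padded again
# [1] "2021-02-01" "2021-11-01"
```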
The moment one starts bringing in month names, this goes the same type of wrong, since month names are not ordered lexicographically (I don't know that this is an absolute truth in every language, but it is certainly true in English and perhaps many/most western languages ... I'm not polyglot enough to speak for others). This means that "2020-Apr-01" < "2020-Jan-01" will again be TRUE, unfortunately.
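A sketch of the month-name problem and one way around it. The labels vector is invented for the example, and I'm assuming fixed-width English abbreviations so that substr() and the built-in month.abb can be used. The shorter as.Date(labels, format = "%Y-%b-%d") should also work, but %b depends on your locale, so I avoid it here.

```r
labels <- c("2020-Jan-01", "2020-Apr-01", "2020-Feb-01")

sort(labels)                          # alphabetical, not chronological
# [1] "2020-Apr-01" "2020-Feb-01" "2020-Jan-01"

# Map the English abbreviation to a month number, rebuild a 0-padded
# %Y-%m-%d string, and convert that to a proper Date
month_num <- match(substr(labels, 6, 8), month.abb)
iso <- sprintf("%s-%02d-%s", substr(labels, 1, 4), month_num, substr(labels, 10, 11))
dates <- as.Date(iso)

labels[order(dates)]                  # chronological
# [1] "2020-Jan-01" "2020-Feb-01" "2020-Apr-01"
```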
Combine the month-name problem above with the fact that, in general, R will always print a Date-class object as "%Y-%m-%d"; there is no (trivial) way to get it to print a Date-class object as your "%b %d %Y" without either (a) converting it to a string and losing proper ordering; or (b) super-classing it so that it presents like you want on the console, but is still a number underneath.
As for (a), this is a common thing to do for reports and labeling in plots, and I'm perfectly fine with that. I am not trying to convince the world that it should always see a date as %Y-%m-%d. What I am saying is that it is much easier to keep it as a proper Date-class object until you actually render it, and then format it at the last second. For this, do all of your filtering and ordering first and then print(format(..)), such as this. I recommend this method.
dates <- seq(as.Date("2020-02-02"), as.Date("2020-02-06"), by = "day")
dates <- rev(dates[ dates > as.Date("2020-02-03") ])
print(format(dates, format = "%b %d %Y"))
# [1] "Feb 06 2020" "Feb 05 2020" "Feb 04 2020"
Again, above is the technique I recommend.
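For contrast, a rough sketch of what goes wrong if the order of operations is flipped. The dates are made up, and the labels in the comments assume an English locale (format's %b is locale-dependent).

```r
dates <- as.Date(c("2020-01-10", "2019-12-15", "2020-02-05"))

# Recommended: sort while it is still a Date, format at the very end
sorted <- sort(dates)
labels <- format(sorted, format = "%b %d %Y")
print(labels)    # chronological, e.g. "Dec 15 2019", "Jan 10 2020", "Feb 05 2020"

# Not recommended: format first, then sort the resulting strings
wrong <- sort(format(dates, format = "%b %d %Y"))
print(wrong)     # alphabetical by month name, so Feb lands before Jan
```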
As for (b), yes, you can do it, but this approach is fragile: some functions that want Date-class objects may not recognize that these are close enough to keep working as such, or they may strip the new class we assign, at which point printing reverts to the "%Y-%m-%d" format. You can use this, which requires that you change the class (see the ## important! line) of every Date object whose formatting you want to personalize. I recommend against doing this.
format.myDATE <- function(x, ...) { # fashioned after format.Date
  xx <- format.Date(x, format = "%b %d %Y")
  names(xx) <- names(x)
  xx
}
print.myDATE <- function(x, max = NULL, ...) { # fashioned after print.Date
  if (is.null(max))
    max <- getOption("max.print", 9999L)
  if (max < length(x)) {
    print(format.myDATE(x[seq_len(max)]), ...)
    cat(" [ reached 'max' / getOption(\"max.print\") -- omitted",
        length(x) - max, "entries ]\n")
  } else if (length(x))
    print(format.myDATE(x), ...)
  else cat(class(x)[1L], "of length 0\n")
  invisible(x)
}
dates <- seq(as.Date("2020-02-02"), as.Date("2020-02-06"), by = "day")
class(dates) <- c("myDATE", class(dates)) ## important!
dates <- rev(dates[ dates > as.Date("2020-02-03") ])
print(dates) ## no need for format!
# [1] "Feb 06 2020" "Feb 05 2020" "Feb 04 2020"
### and number-like operations still tend to work
diff(dates)
# Time differences in days
# [1] -1 -1
Again, I recommend against doing this for data that you are working with. Many packages that pretty-print tables and plots and such may choose to override our preference for formatting, so there is no guarantee that this is honored across the board. This is why I suggest "accepting" the R way while working with it, regardless of your locale, and formatting it for your aesthetic preferences immediately before printing/rendering.
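To illustrate the fragility, here is a small sketch of how easily the extra class gets dropped. It only needs the class vector, not the print method, and it reflects how base R's Date methods behave as far as I know; other versions or packages may differ.

```r
d <- as.Date(c("2020-02-04", "2020-02-05"))
class(d) <- c("myDATE", class(d))   # same trick as the "important" line above

class(d[1])      # subsetting keeps the custom class
# [1] "myDATE" "Date"

class(c(d, d))   # concatenating rebuilds a plain Date
# [1] "Date"

class(d + 1)     # so does date arithmetic
# [1] "Date"
```

Once the class is gone, printing silently falls back to "%Y-%m-%d", which is exactly the kind of surprise that makes me prefer format() at render time.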