
I would like to format my date variable as %d %b %Y (e.g. 05 May 2020). However, once it has been formatted, it becomes a character variable and can no longer be sorted from the earliest date to the latest, because it sorts alphabetically instead (e.g. 05 May 2020 is sorted before 26 Apr 2020).

Data:

df <- structure(list(Date = structure(c(1588204800, 1587945600, 1588464000, 1588032000,  
                                        1588291200, 1588377600, 1588118400), class = c("POSIXct", 
                                                                                       "POSIXt"), tzone = "UTC")), class = "data.frame", row.names = c(NA, -7L))
# > df
# Date
# 1 2020-04-30
# 2 2020-04-27
# 3 2020-05-03
# 4 2020-04-28
# 5 2020-05-01
# 6 2020-05-02
# 7 2020-04-29

Here is what sorting a formatted date variable looks like:

library(dplyr)

df %>% 
  mutate(Date = format(Date, "%d %b %Y")) %>% 
  arrange(Date)
#          Date
# 1 01 May 2020
# 2 02 May 2020
# 3 03 May 2020
# 4 27 Apr 2020
# 5 28 Apr 2020
# 6 29 Apr 2020
# 7 30 Apr 2020

So, this is what I have done, which works, but I would like to know if this is really correct or if there are alternatives to solve this.

df %>% 
  mutate(Date = factor(Date, labels = format(sort(unique(Date)), "%d %b %Y"), ordered = TRUE)) %>% 
  arrange(Date)
#          Date
# 1 27 Apr 2020
# 2 28 Apr 2020
# 3 29 Apr 2020
# 4 30 Apr 2020
# 5 01 May 2020
# 6 02 May 2020
# 7 03 May 2020
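
One way to check whether the levels and labels in the factor approach above are mapped correctly is to compare the factor against a direct `format()` of the same dates. This is only a sketch on a shortened copy of the data; the names `dates`, `f`, `mapping_ok` and `order_ok` are made up for illustration.

```r
# Three of the dates from the question, stored as POSIXct in UTC
dates <- as.POSIXct(c("2020-04-30", "2020-04-27", "2020-05-03"), tz = "UTC")

# Same construction as in the question: levels default to the sorted unique
# dates, and the labels are the formatted versions in that same order
f <- factor(dates,
            labels = format(sort(unique(dates)), "%d %b %Y"),
            ordered = TRUE)

# Each element should carry the label of its own date ...
mapping_ok <- identical(as.character(f), format(dates, "%d %b %Y"))

# ... and ordering by the factor should match ordering by the raw dates
order_ok <- identical(order(f), order(dates))

mapping_ok
order_ok
```

If both checks are TRUE, the labels line up with the dates they were derived from.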

Edit: The reason I want to format and then arrange the dates is so that I have direct access to more readable date formats when building a dashboard for my users.

When it comes to ggplot(), even if you arrange first and then mutate with format, the faceted plots are always laid out in sorted character order. Example below:

library(ggplot2)

df %>% 
  arrange(Date) %>% 
  mutate(n = 1:n(),
         Date = format(Date, "%d %b %Y")) %>% 
  ggplot() +
  geom_bar(aes(x = n)) +
  facet_wrap(~Date)
HNSKD
  • The correct way would be to keep them as a Date variable and not use `format` if you want to use them later in your analysis. Once you use `format` it becomes character and there is not much you can do with it. Another option is to have two columns, one with `format` and another with the date. – Ronak Shah May 05 '20 at 04:40
  • @RonakShah The reason why I format it as such is because I would like to use it directly in the tables and plots when building my dashboard. I couldn't find a way to see whether the levels and labels are correctly mapped in my question above. – HNSKD May 05 '20 at 04:42
  • Why not choose the second option then? Keep two columns, one with the date for processing and another one with the format. – Ronak Shah May 05 '20 at 04:49
  • As an aside, date axes in base R and the ggplot2 package allow you to specify the date formats during the plotting. E.g. - https://stackoverflow.com/questions/24481176/r-x-axis-date-labels-using-plot and https://stackoverflow.com/questions/11748384/formatting-dates-on-x-axis-in-ggplot2 – thelatemail May 05 '20 at 04:56
  • @HNSKD I have provided a simple solution below, if it answers your question please make sure to accept the answer, if not please clarify what you want to achieve – rg255 May 05 '20 at 05:01
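
The two-column option suggested in the comments could look roughly like the sketch below. The column name `Date_label` and the result name `df_two` are made up for illustration, and the data frame is rebuilt from date strings rather than copied from the question's `structure()` call.

```r
library(dplyr)

# Same seven dates as in the question, stored as POSIXct in UTC
df <- data.frame(
  Date = as.POSIXct(c("2020-04-30", "2020-04-27", "2020-05-03", "2020-04-28",
                      "2020-05-01", "2020-05-02", "2020-04-29"), tz = "UTC")
)

# Keep `Date` for sorting and filtering; `Date_label` is for display only
df_two <- df %>%
  mutate(Date_label = format(Date, "%d %b %Y")) %>%
  arrange(Date)

df_two
```

Sorting and filtering keep using `Date`, while tables can show `Date_label`. Note that `%b` produces locale-dependent month abbreviations.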

3 Answers


My original solution is below, but the better solution is so simple it hurts a little that I didn't spot it immediately - do your arrange() before your mutate() - at that point it is a date-type variable so will sort the way you want it to:

df %>% 
  arrange(Date) %>% 
  mutate(Date = format(Date, "%d %b %Y"))

Giving:

         Date
1 27 Apr 2020
2 28 Apr 2020
3 29 Apr 2020
4 30 Apr 2020
5 01 May 2020
6 02 May 2020
7 03 May 2020

Alternatively, you could add an as.Date(..., format = "%d %b %Y") to your arrange():

df %>% 
  mutate(Date = format(Date, "%d %b %Y")) %>%
  arrange(as.Date(Date, format = "%d %b %Y"))
rg255

If you want to use dates in plots, the main idea is to set the factor levels in the order in which you want the data to be shown. Arrange the dates first, then assign factor levels based on the order in which the formatted dates occur.

library(dplyr)
library(ggplot2)

df %>% 
  arrange(Date) %>% 
  mutate(n = row_number(),
         Date = format(Date, "%d %b %Y"), 
         Date = factor(Date, levels = unique(Date))) %>% 
  ggplot() + geom_bar(aes(x = n)) + facet_wrap(~Date)
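
The same idea can be wrapped in a small helper so that the data frame does not have to be arranged first. This is only a sketch: `format_date_factor` is a hypothetical name, not a function from dplyr or ggplot2.

```r
# Hypothetical helper: format dates as text, but keep chronological order
# by taking the factor levels from the sorted dates
format_date_factor <- function(x, fmt = "%d %b %Y") {
  lvls <- unique(format(sort(x), fmt))
  factor(format(x, fmt), levels = lvls)
}

# Unsorted example dates (POSIXct, UTC), as in the question
dates <- as.POSIXct(c("2020-05-01", "2020-04-27", "2020-04-30"), tz = "UTC")
f <- format_date_factor(dates)

levels(f)  # levels follow the dates, not the alphabet
sort(f)    # sorting the factor therefore sorts chronologically
```

Because the result is a factor, both `arrange()` and `facet_wrap()` follow the level order rather than alphabetical order.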

Ronak Shah

Personally, I prefer the tidyverse solution for dates - lubridate. Here:

library(lubridate)

df %>% 
  mutate(Date = ymd(Date)) %>% 
  arrange(Date)

In short, you can parse your dates by combining d for day, m for month and y for year. You can add time, too. For example,

ymd_hms("20150102 12:23:01")

As the example shows, we do not have to worry about the separator. If you have access, this is a nice paper on that package. Otherwise, there are many tutorials out there on lubridate.
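
As a small illustration of the point about separators, the sketch below parses the same day written in three ways and checks that the results agree. The input strings and variable names are invented for this example.

```r
library(lubridate)

# The same calendar day with different component orders and separators
a <- ymd("2020-05-05")
b <- ymd("20200505")
d <- dmy("05/05/2020")

# All three are Date objects for the same day
same_day <- a == b && b == d
same_day

# With a time component, ymd_hms() returns a date-time (UTC by default)
t1 <- ymd_hms("20150102 12:23:01")
t1
```

Parsing this way yields Date or POSIXct values, so the result still sorts chronologically; formatting to text with `format()` would remain a separate, final step.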