Background
When working with dates and datetimes in R, the lubridate
package is your friend. From what I have seen, the YYYY-MM-DD is often preferred over DD-MM-YYYY (or worse MM-DD-YYYY) and is often the default format for datetimes. It makes sense, as this way the datetime components are arranged based on hierachy, and multiple values can be sorted based on numerical values only, which can be handy if working with data without actual date/datetime formatting (eg. in character format, or if working in other programs).
So, I think the reason why the date-formatted value looks different between MS Excel and R is because once R detects/is told it is a date format, it likely defaults to YYYY-MM-DD appearance. Excel however, in my experience is notorious for misleading appearances of dates and times, sometimes unexplainably. Even though Excel displays the date as DD-MM-YYYY (or whatever you set), it may not actually be stored in this format. This may become obvious if you look at the .csv version in a text editor, not Excel which tries to be smart and change the value's appearance, without telling you. The bottom line is: if you've got a problem and Excel is involved, don't trust Excel.
Importing dates and datetimes in R
Now, onto your question.
- So, we have decided that YYYY-MM-DD is the preferred way to display dates and datetimes, and often the default in R and
lubridate
.
- If you have date/datetime data in R, it is best to make sure R knows this and stores the values in datetime, rather than character format. This is especially useful if you want to manipulate, modify or visualise the data.
First I load the lubridate
package, then make some sample values written out in YYYY-MM-DD HH:MM:SS and DD-MM-YYYY HH:MM:SS formats.
library(lubridate)
tibble(
date1 = "2019-01-01 22:05:00",
date2 = "02-01-2019 16:45:00"
) %>%
{. ->> my_dates}
# # A tibble: 1 x 2
# date1 date2
# <chr> <chr>
# 2019-01-01 22:05:00 02-01-2019 16:45:00
Note, both datetimes are in character format, as seen by the <chr>
under the column names.
To convert these into datetime format, we use the lubridate
functions of ymd_hms()
and friends, or as_datetime()
. As seen on ?as_datetime
and ?ymd_hms
, these functions take a datetime in character or numeric format, and convert them to datetime format - they are not for modifying the way an existing datetime-formatted datetime is displayed (much the same as the important difference between setting the CRS and re-projecting spatial data).
ymd_hms()
takes a character-format datetime, which has to be in the year-month-day hour-minute-second format, and converts it to datetime format. If the datetime character is not in that format, you must use a different function which matches the character's format, such as dmy_hms()
, mdy_hm()
etc. So, let's use the appropriate functions to convert each of these characters to datetime format:
my_dates %>%
mutate(
date1 = ymd_hms(date1),
date2 = dmy_hms(date2)
) %>%
{. ->> my_dates_dttm}
my_dates_dttm
# # A tibble: 1 x 2
# date1 date2
# <dttm> <dttm>
# 2019-01-01 22:05:00 2019-01-02 16:45:00
Alternatively, we can use as_datetime()
and specify the exact format the datetime is in using symbols and notation outlined here. In this example, %d
means the day number, %m
means the month number, %Y
means the year number (capital Y
means 4-digit version, lowercase is 2-digit), %H
is the hour number, %M
is the minute number (note capitalisaiton - different to month number), and %S
is the seconds.
my_dates %>%
mutate(
date1 = as_datetime(date1, format = '%Y-%m-%d %H:%M:%S'),
date2 = as_datetime(date2, format = '%d-%m-%Y %H:%M:%S')
) %>%
{. ->> my_dates_dttm}
# # A tibble: 1 x 2
# date1 date2
# <dttm> <dttm>
# 2019-01-01 22:05:00 2019-01-02 16:45:00
These functions cannot be used to try and convert the way datetimes are displayed; if you're lucky you will get NA
, and if you're unlucky you will get incorrectly formatted data. This is the source of common errors such as All formats failed to parse. No formats found.
This often occurs when you use dmy()
instead of ymd()
, or specified the format =
part of as_datetime()
wrong; because you have told R that a year value is a day value, eg the number 2019
can't be a day (max 31) or month (max 12). For example, if we incorrectly use ymd_hms()
instead of dmy_hms()
on date2
we get an error message, and the datetime-formatted value is NA
:
my_dates %>%
mutate(
date1 = ymd_hms(date1),
date2 = ymd_hms(date2)
)
# # A tibble: 1 x 2
# date1 date2
# <dttm> <dttm>
# 2019-01-01 22:05:00 NA
# Warning message:
# Problem with `mutate()` column `date2`.
# i `date2 = ymd_hms(date2)`.
# i All formats failed to parse. No formats found.
Or worse, you confuse day and month and R doesn't realise because both numbers are less than 12 and are acceptable values for month or day. There is no warning or error message, and R is happy. This will cause massive headaches when manipulating datetimes or calculating time differences etc.
my_dates %>%
mutate(
date1 = ymd_hms(date1),
date2 = mdy_hms(date2)
)
# # A tibble: 1 x 2
# date1 date2
# <dttm> <dttm>
# 2019-01-01 22:05:00 2019-02-01 16:45:00
Alright, enough doom and gloom and back to our sample data before and after formatting.
my_dates
# # A tibble: 1 x 2
# date1 date2
# <chr> <chr>
# 2019-01-01 22:05:00 02-01-2019 16:45:00
my_dates_dttm
# # A tibble: 1 x 2
# date1 date2
# <dttm> <dttm>
# 2019-01-01 22:05:00 2019-01-02 16:45:00
So, there are 2 differences between my_dates
and my_dates_dttm
.
- Both columns are now in
<dttm>
format, instead of <chr>
(written under column names)
- Because both columns are now in R's datetime format, they have defaulted to the YYYY-MM-DD HH:MM:SS display style. I would recommend leaving them like this, and continuing on with your calculations/manipulations.
Displaying dates and datetimes in a different format
However, if you would like to display datetimes in a different format, you can do so using the format()
function. You must specify how you would like them printed using the same notation we used in as_datetime()
above.
Please note, this returns the datetimes in character format, so this isn't much good for any calculation or manipulation, rather it should just be used as a label.
my_dates_dttm %>%
mutate(
date1_dmy = format(date1, format = '%d-%m-%Y %H:%M:%S')
)
# # A tibble: 1 x 3
# date1 date2 date1_dmy
# <dttm> <dttm> <chr>
# 2019-01-01 22:05:00 2019-01-02 16:45:00 01-01-2019 22:05:00
Because format()
returns a character, your chosen format does not need to be a standard datetime format. For example, here we just jumble different components of the datetime:
my_dates_dttm %>%
mutate(
date1_jumble = format(date1, format = '%S/%b-%H:%y-%d.%M')
) %>%
{. ->> my_dates_jumbled}
my_dates_jumbled
# # A tibble: 1 x 3
# date1 date2 date1_jumble
# <dttm> <dttm> <chr>
# 2019-01-01 22:05:00 2019-01-02 16:45:00 00/Jan-22:19-01.05
However, this does not preclude us from converting back to datetime format - if we know the exact format it is printed in.
my_dates_jumbled %>%
mutate(
date_1_unjumble = as_datetime(date1_jumble, format = '%S/%b-%H:%y-%d.%M')
)
# # A tibble: 1 x 4
# date1 date2 date1_jumble date_1_unjumble
# <dttm> <dttm> <chr> <dttm>
# 2019-01-01 22:05:00 2019-01-02 16:45:00 00/Jan-22:19-01.05 2019-01-01 22:05:00
Further info
The R for Data Science book is a great resource for learning R, and I would strongly recommend reading and working through the examples in Chapter 16 - Dates and times if you will be using this type of data.