-1

Working on two datasets where the date columns is in character and has the date format dd-mm-yyyy hh:mm and the other one has the date format yyyy-mm-dd hh:mm:ss. I am trying to use the as.date() for changing the second one to the first format of dd-mm-yyyy hh:mm

But those columns are turning into N/A rows.

Eg:

m09_2020$started_at <- as_datetime(m09_2020$started_at, format = "%d-%m-%Y %H:%M") 

also i dont understand why the date format is getting changed after being read to R. The date format in the data set in the excel is dd-mm-yyyy hh:mm:ss but after getting read to r its changing to yyyy-mm-dd hh:mm:ss for those months. Thats weird. what should i do?

Sarthak Dev
  • 13
  • 1
  • 7
  • 4
    If I had to guess some of the dates are malformed in the source csv files, but hard to know since you didn't provide a minimal dataset for us to test. – petrucci4prez Aug 14 '21 at 17:08
  • This will be difficult to test or suggest fixes to without it being more *reproducible*. Here are a few good links providing suggestions for how to do that: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. Please read those and then come back and [edit] your question. Thanks! – r2evans Aug 14 '21 at 17:58
  • As far as I can tell none of the rows would return TRUE for `which(total_trips$started_at == "2020-10-01")`. I do not see any conversion to a true Date value for any column and all to the values appear to be character and not actual R datetimes. Voting to close. – IRTFM Aug 14 '21 at 20:44
  • @IRTFM it is showing values when i use it for the month of aug,june and july. Thats what im trying to say that apart from those three months im not seeing any other months data using the above mentioned code – Sarthak Dev Aug 15 '21 at 05:58
  • @r2evans alright thanks ill look into it and do the needful, thanks! – Sarthak Dev Aug 15 '21 at 05:59
  • @r2evans i have provided the data set link you can use "202102 and 202006" and combine and check if the month of january is coming or not – Sarthak Dev Aug 15 '21 at 06:07
  • @r2evans I understand that the above comment might not be able to answer my issue specifically, so i uploaded my code in git. Can you please check it out and let me know where is it going wrong? [link](https://github.com/Davesarthak787/Cyclistic-Bike-Share/blob/main/Cyclistic%20Case%20Study.Rmd) – Sarthak Dev Aug 15 '21 at 17:47
  • It might be helpful to see the output from `sessionInfo()` (or `sessioninfo::session_info()`). – r2evans Aug 15 '21 at 18:21
  • @r2evans what do you mean? Should i attach the output here too? – Sarthak Dev Aug 16 '21 at 02:44
  • Yes. It might be helpful *for us* to see the output from `sessionInfo()` (or `sessioninfo::session_info()`). There may be other things in play here, such as `factor`s or something else. If you have an old-enough version of R, for instance, then the default could be `read.csv(.., stringsAsFactors=TRUE)` which might contribute to the problem. I don't know. I haven't had time to go through your entire (not minimal) code, so I'm trying to find small things. – r2evans Aug 16 '21 at 12:12
  • @r2evans hi uploaded the session info in git, feel free to check it out! https://github.com/Davesarthak787/Cyclistic-Bike-Share/blob/main/Session%20info.txt – Sarthak Dev Aug 17 '21 at 03:39
  • @r2evans Okay! i finally got it what is happening! So months July,August 2020 and August 2021 have the date format dd-mm-yyyy hh:mm and the others have the date format yyyy-mm-dd hh:mm:ss when im looking them up individually using the head(). Eg head(m10_2021) & head(m06_2020) so i have to convert the remaining months date format to that of the former one. So how should i go about that? – Sarthak Dev Aug 17 '21 at 05:06
  • @r2evans i am trying to use the as.date() for changing those dataframes values to the ones of the other ones but those columns are turning into N/A rows. m09_2020$started_at <- as_datetime(m09_2020$started_at, format = "%d-%m-%Y %H:%M") what should i do? – Sarthak Dev Aug 17 '21 at 05:20
  • @r2evans also i dont understand why the date format is getting changed after being read to R. The date format in the data set in the excel is dd-mm-yyyy hh:mm:ss but after getting read to r its changing to yyyy-mm-dd hh:mm:ss for those months. Thats weird – Sarthak Dev Aug 17 '21 at 14:16
  • R's format for the `Date` class is `yyyy-mm-dd`, period. If you want it to look differently, then (in general) your only recourse is to format it as a string ... but then it is no longer a `Date`, it's a string that looks like a date. Similarly, `POSIXct` and timestamps, there is one format for display. In general, the default formats are most-significant first, so year, month, day, hour, minute, second, regardless of the locale. (The locale comes into play when parsing strings into `POSIXt`/`Date`, that's about it.) – r2evans Aug 17 '21 at 15:52

2 Answers2

0

I would use ymd_hms and ymd_hm:

library(lubridate)
date1 <- "2019-01-01 22:05:00"
date2 <- "2019-01-01 22:05"
ymd_hms(date1)==ymd_hm(date2)
[1] TRUE

ymd_hms(date1)
[1] "2019-01-01 22:05:00 UTC"

As you can see I just provided a minimal example that people can reproduce on their side. Please do so in your next questions.

Cheers

gaut
  • 5,771
  • 1
  • 14
  • 45
  • Hey thanks for the suggestion but the issue is i want to change a date format of yyyy-mm-dd hh:mm:ss to dd-mm-yyyy hh:mm what you have shown keeps the date in the same format. Also while trying to dmy_hm(date) im getting an error Warning: All formats failed to parse. No formats found So any other thing i can do? – Sarthak Dev Aug 17 '21 at 14:13
  • Please give me an example of the exact dates you have – gaut Aug 17 '21 at 21:51
0

Background

When working with dates and datetimes in R, the lubridate package is your friend. From what I have seen, the YYYY-MM-DD is often preferred over DD-MM-YYYY (or worse MM-DD-YYYY) and is often the default format for datetimes. It makes sense, as this way the datetime components are arranged based on hierachy, and multiple values can be sorted based on numerical values only, which can be handy if working with data without actual date/datetime formatting (eg. in character format, or if working in other programs).

So, I think the reason why the date-formatted value looks different between MS Excel and R is because once R detects/is told it is a date format, it likely defaults to YYYY-MM-DD appearance. Excel however, in my experience is notorious for misleading appearances of dates and times, sometimes unexplainably. Even though Excel displays the date as DD-MM-YYYY (or whatever you set), it may not actually be stored in this format. This may become obvious if you look at the .csv version in a text editor, not Excel which tries to be smart and change the value's appearance, without telling you. The bottom line is: if you've got a problem and Excel is involved, don't trust Excel.

Importing dates and datetimes in R

Now, onto your question.

  • So, we have decided that YYYY-MM-DD is the preferred way to display dates and datetimes, and often the default in R and lubridate.
  • If you have date/datetime data in R, it is best to make sure R knows this and stores the values in datetime, rather than character format. This is especially useful if you want to manipulate, modify or visualise the data.

First I load the lubridate package, then make some sample values written out in YYYY-MM-DD HH:MM:SS and DD-MM-YYYY HH:MM:SS formats.

library(lubridate)

tibble(
  date1 = "2019-01-01 22:05:00", 
  date2 = "02-01-2019 16:45:00"
) %>% 
  {. ->> my_dates}

# # A tibble: 1 x 2
# date1               date2              
# <chr>               <chr>              
# 2019-01-01 22:05:00 02-01-2019 16:45:00

Note, both datetimes are in character format, as seen by the <chr> under the column names.

To convert these into datetime format, we use the lubridate functions of ymd_hms() and friends, or as_datetime(). As seen on ?as_datetime and ?ymd_hms, these functions take a datetime in character or numeric format, and convert them to datetime format - they are not for modifying the way an existing datetime-formatted datetime is displayed (much the same as the important difference between setting the CRS and re-projecting spatial data).

ymd_hms() takes a character-format datetime, which has to be in the year-month-day hour-minute-second format, and converts it to datetime format. If the datetime character is not in that format, you must use a different function which matches the character's format, such as dmy_hms(), mdy_hm() etc. So, let's use the appropriate functions to convert each of these characters to datetime format:

my_dates %>% 
  mutate(
    date1 = ymd_hms(date1), 
    date2 = dmy_hms(date2)
  ) %>% 
  {. ->> my_dates_dttm}

my_dates_dttm

# # A tibble: 1 x 2
# date1               date2              
# <dttm>              <dttm>             
# 2019-01-01 22:05:00 2019-01-02 16:45:00

Alternatively, we can use as_datetime() and specify the exact format the datetime is in using symbols and notation outlined here. In this example, %d means the day number, %m means the month number, %Y means the year number (capital Y means 4-digit version, lowercase is 2-digit), %H is the hour number, %M is the minute number (note capitalisaiton - different to month number), and %S is the seconds.

my_dates %>% 
  mutate(
    date1 = as_datetime(date1, format = '%Y-%m-%d %H:%M:%S'),
    date2 = as_datetime(date2, format = '%d-%m-%Y %H:%M:%S')
  ) %>% 
  {. ->> my_dates_dttm}

# # A tibble: 1 x 2
# date1               date2              
# <dttm>              <dttm>             
# 2019-01-01 22:05:00 2019-01-02 16:45:00

These functions cannot be used to try and convert the way datetimes are displayed; if you're lucky you will get NA, and if you're unlucky you will get incorrectly formatted data. This is the source of common errors such as All formats failed to parse. No formats found. This often occurs when you use dmy() instead of ymd(), or specified the format = part of as_datetime() wrong; because you have told R that a year value is a day value, eg the number 2019 can't be a day (max 31) or month (max 12). For example, if we incorrectly use ymd_hms() instead of dmy_hms() on date2 we get an error message, and the datetime-formatted value is NA:

my_dates %>% 
  mutate(
    date1 = ymd_hms(date1),
    date2 = ymd_hms(date2)
  )

# # A tibble: 1 x 2
# date1               date2 
# <dttm>              <dttm>
# 2019-01-01 22:05:00 NA    
# Warning message:
# Problem with `mutate()` column `date2`.
# i `date2 = ymd_hms(date2)`.
# i All formats failed to parse. No formats found.

Or worse, you confuse day and month and R doesn't realise because both numbers are less than 12 and are acceptable values for month or day. There is no warning or error message, and R is happy. This will cause massive headaches when manipulating datetimes or calculating time differences etc.

my_dates %>% 
  mutate(
    date1 = ymd_hms(date1),
    date2 = mdy_hms(date2)
  )

# # A tibble: 1 x 2
# date1               date2              
# <dttm>              <dttm>             
# 2019-01-01 22:05:00 2019-02-01 16:45:00

Alright, enough doom and gloom and back to our sample data before and after formatting.

my_dates
# # A tibble: 1 x 2
# date1               date2              
# <chr>               <chr>              
# 2019-01-01 22:05:00 02-01-2019 16:45:00

my_dates_dttm
# # A tibble: 1 x 2
# date1               date2              
# <dttm>              <dttm>             
# 2019-01-01 22:05:00 2019-01-02 16:45:00

So, there are 2 differences between my_dates and my_dates_dttm.

  1. Both columns are now in <dttm> format, instead of <chr> (written under column names)
  2. Because both columns are now in R's datetime format, they have defaulted to the YYYY-MM-DD HH:MM:SS display style. I would recommend leaving them like this, and continuing on with your calculations/manipulations.

Displaying dates and datetimes in a different format

However, if you would like to display datetimes in a different format, you can do so using the format() function. You must specify how you would like them printed using the same notation we used in as_datetime() above.

Please note, this returns the datetimes in character format, so this isn't much good for any calculation or manipulation, rather it should just be used as a label.

my_dates_dttm %>% 
  mutate(
    date1_dmy = format(date1, format = '%d-%m-%Y %H:%M:%S')
  )

# # A tibble: 1 x 3
# date1               date2               date1_dmy          
# <dttm>              <dttm>              <chr>              
# 2019-01-01 22:05:00 2019-01-02 16:45:00 01-01-2019 22:05:00

Because format() returns a character, your chosen format does not need to be a standard datetime format. For example, here we just jumble different components of the datetime:

my_dates_dttm %>% 
  mutate(
    date1_jumble = format(date1, format = '%S/%b-%H:%y-%d.%M')
  ) %>% 
  {. ->> my_dates_jumbled}

my_dates_jumbled

# # A tibble: 1 x 3
# date1               date2               date1_jumble      
# <dttm>              <dttm>              <chr>             
# 2019-01-01 22:05:00 2019-01-02 16:45:00 00/Jan-22:19-01.05

However, this does not preclude us from converting back to datetime format - if we know the exact format it is printed in.

my_dates_jumbled %>% 
  mutate(
    date_1_unjumble = as_datetime(date1_jumble, format = '%S/%b-%H:%y-%d.%M')
  )

# # A tibble: 1 x 4
# date1               date2               date1_jumble       date_1_unjumble    
# <dttm>              <dttm>              <chr>              <dttm>             
# 2019-01-01 22:05:00 2019-01-02 16:45:00 00/Jan-22:19-01.05 2019-01-01 22:05:00

Further info

The R for Data Science book is a great resource for learning R, and I would strongly recommend reading and working through the examples in Chapter 16 - Dates and times if you will be using this type of data.

hugh-allan
  • 1,170
  • 5
  • 16