-1

This isn't duplicate. I already referred these posts 1,2,3

I have a dataframe like below:

t_df <- data.frame("SN" = c("AabcC123","ABC123","ABC123","MNO098","MNO098","MNO098"), 
                      "code" = c("ABC1111","DEF222","GHI133","","MNO1123","MNO567"), 
                      "d_time" = c("21/12/2012 12:12","30/06/2019 11:11","20/02/2027 10:10","22/11/2327 09:09:11","5/2/1927 08:08:12",""))
t_df <- data.frame(lapply(t_df, as.character), stringsAsFactors=FALSE)

While I read the csv file, all my dates are of character type.

So when I try to sort the dataframe, it only works based on first characters.

I would like to convert this to Datetime format.

Though I tried the below options based on other posts that I linked above, nothing helped. It returns NA

as_datetime(t_df$d_time, "%d/%m/%Y %H:%M:%S")  

as.POSIXct(t_df$d_time,format="%d/%m/%Y %H:%M:%S")

dmy_hms(t_df$d_time)

Can you help me convert the datatype to datetime?

The Great
  • 7,215
  • 7
  • 40
  • 128

2 Answers2

2

An idea is to count the colons (:) and if there is only 1, then add seconds as :00, i.e

library(stringr) #for str_count()

t_df$d_time[t_df$d_time != ''] <- ifelse(str_count(t_df$d_time[t_df$d_time != ''], ':') < 2, 
                                               paste0(t_df$d_time[t_df$d_time != ''], ':00'), 
                                                               t_df$d_time[t_df$d_time != ''])

as.POSIXct(t_df$d_time, format = "%d/%m/%Y %H:%M:%S")
#[1] "2012-12-21 12:12:00 EET" "2019-06-30 11:11:00 +03" "2027-02-20 10:10:00 +03" "2327-11-22 09:09:11 +03" "1927-02-05 08:08:12 EET" NA
Sotos
  • 51,121
  • 6
  • 32
  • 66
  • 1
    Thank you. UPVOTED for the effort and time. marked as answer – The Great Nov 01 '19 at 10:05
  • If I just stick to `H:M` irrespective of what is present in my data, I will just get zeroes for missing component. Is that how this works? – The Great Nov 01 '19 at 10:07
  • If you define the format with just hours:minutes, then If the date has seconds you will get NA. You must have 1 uniform format to be able to convert. Actually, I think there are packages with time functions that accept multiple formats – Sotos Nov 01 '19 at 10:10
  • One quick question, when I use like this, `as.POSIXct("2220-04-27 00:10",format="%Y-%m-%d %H:%M") and `as.Date("2220-04-27 00:10",format = "%Y-%m-%d")`, it works fine here. But in my real dataframe with million records, all `dates` have a difference of `1` day. For example, `date` value is 2220-04-26` and datetime is `2220-04-27 00:10`. This is happening for whole dataframe. can you let me know what can be the issue? – The Great Nov 01 '19 at 10:44
  • Does not sound right. I am not sure why but there are `time zone` and `origin` arguments in `as.POSIXct()` which might help you tackle that problem – Sotos Nov 01 '19 at 10:48
1

You have to adjust the format to the format in your data which is %d/%m/%Y. For example in the first row you have 21/12/2012 which is day/month/year. In R the short for day is %d the short for month is %m and the short for a four digit year is %Y. The slashes / in between refer to the slashes in your string.

Therefore

t_df <- data.frame("SN" = c("AabcC123","ABC123","ABC123","MNO098","MNO098","MNO098"), 
                   "code" = c("ABC1111","DEF222","GHI133","","MNO1123","MNO567"), 
                   "d_time" = c("21/12/2012","30/06/2019","20/02/2027","22/11/2327","5/2/1927",""),
                   stringsAsFactors = FALSE)

t_df$d_time <- as.POSIXct(t_df$d_time, format = "%d/%m/%Y")

Note that most of the time when creating a data.frame it is better to use the option stringsAsFactors = FALSE as I did.

Cettt
  • 11,460
  • 7
  • 35
  • 58