1

I have the following data:

df <- data.frame(dt.str = c("X2019.12.31.23.59.5", "X2020.01.31.23.59.59", "X2020.02.29.23.59.59", "X30.04.2020.23.59", "X30.04.2020.23.59", "X30.06.2020.23.59"), value = c(4, 3.42,2.96,7.26,3.35,2.986))

I would like to convert dt.str to date.

df <- df %>% mutate (dt.str.2 = as.Date(str_extract(dt.str, "[^X]+$"), format= "%Y.%m.%d.%H.%M"))

However, this solution does not work (understandably) on the last 3 cases.

GrBa
  • 381
  • 1
  • 9

4 Answers4

2

A solution based on package lubridate:

library(lubridate)
df$dt.str.2 <- format(parse_date_time(sub("^X(.{10}).*$","\\1",df$dt.str), 
                                      orders = c("%Y.%m.%d", "%d.%m.%Y")), "%Y.%m.%d")

df
                dt.str value   dt.str.2
1  X2019.12.31.23.59.5 4.000 2019.12.31
2 X2020.01.31.23.59.59 3.420 2020.01.31
3 X2020.02.29.23.59.59 2.960 2020.02.29
4    X30.04.2020.23.59 7.260 2020.04.30
5    X30.04.2020.23.59 3.350 2020.04.30
6    X30.06.2020.23.59 2.986 2020.06.30

In dplyr:

library(lubridate)
library(dplyr)
df <- df %>% 
  mutate (dt.str.2 = format(parse_date_time(sub("^X(.{10}).*$","\\1",dt.str), 
                                            orders = c("%Y.%m.%d", "%d.%m.%Y")), "%Y.%m.%d"))
Chris Ruehlemann
  • 20,321
  • 4
  • 12
  • 34
2

You can use parse_date_time from lubridate after removing "X" from start of the string.

library(lubridate)
as.Date(parse_date_time(sub('^X', '', df$dt.str), c('YmdHMS', 'dmyHM')))
#[1] "2019-12-31" "2020-01-31" "2020-02-29" "2020-04-30" "2020-04-30" "2020-06-30"
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
1

You can try if_else:

df = df %>% mutate (dt.str.2 = if_else(
nchar(str_extract(string =df$dt.str,pattern = '(?<=X)\\d*(?=.)'))==4,
true = as.Date(str_extract(dt.str, "[^X]+$"), format= "%Y.%m.%d.%H.%M.%S"),
false =as.Date(str_extract(dt.str, "[^X]+$"), format= "%d.%m.%Y.%H.%M")))
                dt.str value   dt.str.2
1  X2019.12.31.23.59.5 4.000 2019-12-31
2 X2020.01.31.23.59.59 3.420 2020-01-31
3 X2020.02.29.23.59.59 2.960 2020-02-29
4    X30.04.2020.23.59 7.260 2020-04-30
5    X30.04.2020.23.59 3.350 2020-04-30
6    X30.06.2020.23.59 2.986 2020-06-30

If you have more than two date patterns in the data, you can keep on appending if_else() calls in the false parameter.

tushaR
  • 3,083
  • 1
  • 20
  • 33
0

We can use anytime from anytime

library(lubridate)
library(anytime)
library(dplyr)
library(stringr)
addFormats("%d.%m.%Y.%H.%M")
df %>% 
    mutate(dt.str.2 = as_date(anytime(str_remove(dt.str, "^X"))))
#                dt.str value   dt.str.2
#1  X2019.12.31.23.59.5 4.000 2019-12-31
#2 X2020.01.31.23.59.59 3.420 2020-01-31
#3 X2020.02.29.23.59.59 2.960 2020-02-29
#4    X30.04.2020.23.59 7.260 2020-04-30
#5    X30.04.2020.23.59 3.350 2020-04-30
#6    X30.06.2020.23.59 2.986 2020-06-30
akrun
  • 874,273
  • 37
  • 540
  • 662