Fomart date with date_parser

Question

I'm using R to extract data from multiple pdf files. After extracting this data, I need to format a field of type date. Dates came from pdf files in various formats


data = c( "1/10/2022 2:36:00 pm",
  "1/11/2022 12:47:00 pm",
  "1/12/2022 9:47:00 am",
  "1/13/2022 9:21:00 am",
  "1/14/2022 12:59:00 pm",
  "1/10/2022 2:39:00 pm",
  "1/11/2022 12:46:00 pm",
  "1/12/2022 9:48:00 am",
  "1/13/2022 9:22:00 am",
  "1/14/2022 1:00:00 pm",
  "1/10/2022 2:40:00 pm",
  "1/11/2022 12:45:00 pm",
  "1/12/2022 9:49:00 am",
  "1/13/2022 9:23:00 am",
  "1/14/2022 1:01:00 pm",
  "1/10/2022 2:42:00 pm",
  "1/11/2022 12:44:00 pm",
  "1/12/2022 9:50:00 am",
  "1/13/2022 9:24:00 am",
  "1/14/2022 1:02:00 pm",
  "1/10/2022 2:44:00 pm",
  "1/11/2022 12:43:00 pm",
  "1/12/2022 9:51:00 am",
  "1/13/2022 9:25:00 am",
  "1/14/2022 1:03:00 pm",
  "10/01/2022 14:36:00",
  "11/01/2022 12:47:00",
  "12/01/2022 09:47:00",
  "13/01/2022 09:21:00",
  "14/01/2022 12:59:00",
  "10/01/2022 14:39:00",
  "11/01/2022 12:46:00",
  "12/01/2022 09:48:00",
  "13/01/2022 09:22:00",
  "14/01/2022 13:00:00",
  "10/01/2022 14:40:00",
  "11/01/2022 12:45:00",
  "12/01/2022 09:49:00",
  "13/01/2022 09:23:00",
  "14/01/2022 13:01:00",
  "10/01/2022 14:42:00")
df <- data.frame(data)

I tried as follows


date_parser <- function(d) {
  if (endsWith(d, "m")) {
    as.POSIXct(d, format = "%m/%d/%Y %I:%M %p")
  } else {
    as.POSIXct(d, format = "%d/%m/%Y %H:%M")
  }
}

df = df%>%
  rowwise %>%
  mutate(data = date_parser(data))

I get this error message

Error in mutate:In argument: data_hora_leitura_coc08 = date_parser(data_hora_leitura_coc08).In row 1732.Caused by error in `if (endsWith(d, "m")) missing value where TRUE/FALSE needed

Also, some dates come out as "NA" How to solve it?

score 0 · Answer 1 · answered Aug 02 '23 at 19:27

0

Your sample data has seconds, so you need to add %S to each of your formats.

date_parser <- function(d) {
  if (endsWith(d, "m")) {
    as.POSIXct(d, format = "%m/%d/%Y %I:%M:%S %p")
  } else {
    as.POSIXct(d, format = "%d/%m/%Y %H:%M:%S")
  }
}

df %>%
  rowwise() %>%
  mutate(data = date_parser(data))
# # A tibble: 41 × 1
# # Rowwise: 
#    data               
#    <dttm>             
#  1 2022-01-10 14:36:00
#  2 2022-01-11 12:47:00
#  3 2022-01-12 09:47:00
#  4 2022-01-13 09:21:00
#  5 2022-01-14 12:59:00
#  6 2022-01-10 14:39:00
#  7 2022-01-11 12:46:00
#  8 2022-01-12 09:48:00
#  9 2022-01-13 09:22:00
# 10 2022-01-14 13:00:00
# # ℹ 31 more rows
# # ℹ Use `print(n = ...)` to see more rows

answered Aug 02 '23 at 19:27

r2evans

141,215
6
77
149

Some dates are still coming out in NA format. How do I share all date records with you? There are more than 6,000 lines – Francisco Ribeiro Aug 03 '23 at 10:56
Find the ones that are `NA` in the output and share some samples of the originals of those. (You may want to do `df %>% rowwise() %>% mutate(data2 = date_parser(data)) %>% filter(is.na(data2))` and take some samples from that. You may also want to look at https://stackoverflow.com/a/52319606/3358272 (dates), https://stackoverflow.com/a/70304571/3358272 (POSIXct) for ways to handle multiple "candidate formats" of your timestamps. – r2evans Aug 03 '23 at 14:00

score 0 · Answer 2 · answered Aug 02 '23 at 19:28

0

lubridate does exactly what you want. Try

df %>% 
   mutate(parsed_data = if_else(endsWith(data, "m"), mdy_hms(data), dmy_hms(data)))

answered Aug 02 '23 at 19:28

konvas

14,126
2
40
46

Andre Wildberg · Answer 3 · 2023-08-03T12:37:50.090

0

With parse_date_time

library(lubridate)

df %>% mutate(new = parse_date_time(data, 
  orders=c("mdY IMS p", "dmY HMS", "mdY IM p", "dmY HM")))
                    data                 new
1   1/10/2022 2:36:00 pm 2022-01-10 14:36:00
2  1/11/2022 12:47:00 pm 2022-01-11 12:47:00
3   1/12/2022 9:47:00 am 2022-01-12 09:47:00
4   1/13/2022 9:21:00 am 2022-01-13 09:21:00
5  1/14/2022 12:59:00 pm 2022-01-14 12:59:00
6    10/01/2022 14:36:00 2022-01-10 14:36:00
7    11/01/2022 12:47:00 2022-11-01 12:47:00
8    12/01/2022 09:47:00 2022-12-01 09:47:00
9    13/01/2022 09:21:00 2022-01-13 09:21:00
10   14/01/2022 12:59:00 2022-01-14 12:59:00
11    12/2/2022 10:16 am 2022-12-02 10:16:00
12   12/11/2022 12:28 pm 2022-12-11 12:28:00
13      07/12/2022 10:49 2022-07-12 10:49:00
14      2/1/2023 5:10 pm 2023-02-01 17:10:00
15      28/01/2023 10:39 2023-01-28 10:39:00
16      2/1/2023 9:58 am 2023-02-01 09:58:00
17      06/12/2022 09:20 2022-06-12 09:20:00
18      07/12/2022 10:08 2022-07-12 10:08:00
19      08/12/2022 07:48 2022-08-12 07:48:00

Data

df <- structure(list(data = c("1/10/2022 2:36:00 pm", "1/11/2022 12:47:00 pm", 
"1/12/2022 9:47:00 am", "1/13/2022 9:21:00 am", "1/14/2022 12:59:00 pm", 
"10/01/2022 14:36:00", "11/01/2022 12:47:00", "12/01/2022 09:47:00", 
"13/01/2022 09:21:00", "14/01/2022 12:59:00", "12/2/2022 10:16 am", 
"12/11/2022 12:28 pm", "07/12/2022 10:49", "2/1/2023 5:10 pm", 
"28/01/2023 10:39", "2/1/2023 9:58 am", "06/12/2022 09:20", "07/12/2022 10:08", 
"08/12/2022 07:48")), class = "data.frame", row.names = c(NA, 
-19L))

edited Aug 03 '23 at 12:37

answered Aug 02 '23 at 19:30

Andre Wildberg

12,344
3
12
29

Warning message: There was 1 warning in `mutate()`. ℹ In argument: `new = parse_date_time(...)`. Caused by warning: ! 2979 failed to parse. – Francisco Ribeiro Aug 03 '23 at 11:00
@FranciscoRibeiro ... means you have dates other than the ones you showed us. Try to find rows that deviate from the pattern. May even be that you have rows that can't be interpreted as a proper date. – Andre Wildberg Aug 03 '23 at 11:16
Some dates are still coming out in NA format. How do I share all date records with you? There are more than 6,000 lines – Francisco Ribeiro Aug 03 '23 at 11:27
@FranciscoRibeiro You can try to search for differing patterns, e.g. `df$data[!grepl(".*/.*/.* .*:.*:", df$data)]` and post those with `dput(head(df$data[!grepl(".*/.*/.* .*:.*:", df$data)], 10))` – Andre Wildberg Aug 03 '23 at 11:37
"12/2/2022 10:16 am", "12/11/2022 12:28 pm", "07/12/2022 10:49", "2/1/2023 5:10 pm", "28/01/2023 10:39", "2/1/2023 9:58 am" , "06/12/2022 09:20", "07/12/2022 10:08", "08/12/2022 07:48 – Francisco Ribeiro Aug 03 '23 at 11:50
@FranciscoRibeiro These are without seconds, should work now with edit. – Andre Wildberg Aug 03 '23 at 12:10
Separated in a dataframe, the times that contain am | pm. 10/1/2022 2:36:00 pm, 11/1/2022 12:47:00 pm, 12/1/2022 9:47:00 am This am|pm pattern in some cases I get NA return – Francisco Ribeiro Aug 03 '23 at 12:20
@FranciscoRibeiro These am/pm don't differ from the ones already present and should not pose a problem. – Andre Wildberg Aug 03 '23 at 12:48

Fomart date with date_parser

3 Answers3

Data