0

I am working with a large list of dataframes that use inconsistent date formats. I would like to conditionally mutate across the list so that any dataframe that contains a string will use one date format, and those that do not contain the string use another format. In other words, I want to distinguish between dataframes launched in year 2019 (which use mdy) and those launched in all others years (which use dmy).

The following code will conditionally mutate rows within a dataframe, but I am unsure how to conditionally mutate across the entire column.

dataframes %>% map(~.x %>% 
    mutate(date_time = if_else(str_detect(date_time, "/19 "), 
                               mdy_hms(date_time), dmy_hms(date_time)))

Thank you!

edit

Data and code example. There are dataframes that contain a mixture of years.

library(tidyverse)
library(lubridate)

dataframes <- list(
  tibble(date_time = c("07/06/19 01:00:00 PM", "07/06/20 01:00:00 PM"), num = 1:2), # July 6th
  tibble(date_time = c("06/07/20 01:00:00 PM", "06/07/21 01:00:00 PM"), num = 1:2)  # July 6th 
)

dataframes %>% 
  map(~.x %>% 
        mutate(date_time = if_else(str_detect(date_time, "/19 "), 
                                   mdy_hms(date_time), dmy_hms(date_time)),
               date = date(date_time),
               month = month(date_time),
               doy = yday(date_time)))
                   

[[1]]
# A tibble: 2 × 5
  date_time             num date       month   doy
  <dttm>              <int> <date>     <dbl> <dbl>
1 2019-07-06 13:00:00     1 2019-07-06     7   187
2 2020-06-07 13:00:00     2 2020-06-07     6   159

[[2]]
# A tibble: 2 × 5
  date_time             num date       month   doy
  <dttm>              <int> <date>     <dbl> <dbl>
1 2020-07-06 13:00:00     1 2020-07-06     7   188
2 2021-07-06 13:00:00     2 2021-07-06     7   187
J Fitz
  • 13
  • 2
  • Can you please share a snippet of the data to create a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? Also can you clarify if there are any `data.frame`s that contain a mixture of data from 2019 and any other year or are they all each from a single calendar year? – Dan Adams Jul 20 '22 at 00:22
  • I updated my question to include a data example and clarified the data structure. – J Fitz Jul 20 '22 at 15:06
  • Your output seems to match what you asked for. The first table has July 6, 2019 and June 7, 2020 right? What is your desired output? What do you mean by "mutate across the entire column"? – Dan Adams Jul 20 '22 at 17:25
  • Each date in the list is July 6th across each year. The output is reading the 2nd date in the 1st table as June 7th (dmy) rather than July 6th (mdy). As a work around, I've made a new TRUE/FALSE column in each dataframe if date_time contains the string "/19 " and mutated the date format based on the T/F column. – J Fitz Jul 20 '22 at 17:58
  • If they are all supposed to be July 6th, then `"07/06/20"` is also in `mdy` format. However you're only telling it to parse the `/19` dates as `mdy` so that one is getting parsed as `dmy` and thus appears as June 7th. I still think your code is doing what you asked it to. – Dan Adams Jul 21 '22 at 00:15
  • Oh... do you mean if the **first** date is in `/19` then **all** the dates are in `mdy` format? – Dan Adams Jul 21 '22 at 00:18

1 Answers1

0

If you are trying to determine the format of the date column for the whole data.frame based on the presence of any date from 2019, then a small tweak of your code should work.

Instead of evaluating each record for the presence of /19 , you set the condition of the if_else() to be any(str_detect(...)) which returns TRUE if any of the values are TRUE. However the result of any() is always of length 1 so you then need to rep() the result to match the length of the whole data.frame using dplyr::n().

library(tidyverse)
library(lubridate)

dataframes <- list(
  tibble(date_time = c("07/06/19 01:00:00 PM", "07/06/20 01:00:00 PM"), num = 1:2), # July 6th
  tibble(date_time = c("06/07/20 01:00:00 PM", "06/07/21 01:00:00 PM"), num = 1:2)  # July 6th 
)

dataframes %>%
  map( ~ .x %>%
         mutate(
           date_time = if_else(str_detect(date_time, "/19 ") %>% 
                                 any() %>% 
                                 rep(n()), 
                               mdy_hms(date_time), 
                               dmy_hms(date_time)),
           date = date(date_time),
           month = month(date_time),
           doy = yday(date_time)
         ))
#> [[1]]
#> # A tibble: 2 × 5
#>   date_time             num date       month   doy
#>   <dttm>              <int> <date>     <dbl> <dbl>
#> 1 2019-07-06 13:00:00     1 2019-07-06     7   187
#> 2 2020-07-06 13:00:00     2 2020-07-06     7   188
#> 
#> [[2]]
#> # A tibble: 2 × 5
#>   date_time             num date       month   doy
#>   <dttm>              <int> <date>     <dbl> <dbl>
#> 1 2020-07-06 13:00:00     1 2020-07-06     7   188
#> 2 2021-07-06 13:00:00     2 2021-07-06     7   187

Created on 2022-07-20 by the reprex package (v2.0.1)

Dan Adams
  • 4,971
  • 9
  • 28