
I need to quickly extract dates from character vectors. I have 2 main issues:

  • Various date formats (European and American, alphanumeric and numeric...)
  • Multiple dates in each vector.

My vectors look something like this:

c("11/09/2016 Invoice Number . Date P.O. # Amount Discount Paid Amount 2017/015 10/28/2016 CC6/ $50,000.00 $0.00 $50,000-00 2017/016 10/28/2016 CC67 $50,000.00 $0.00 $50,000-00 2017-017 10/28/2016 CC67 $50,000.00 . $0.00 $50,000.00 TOTALS: $150,000.00 $0.00 $150,000.00     ")

I have tried using parse_date and strptime without success. I do not know regex syntax and do not really have time to dig into it.

Thank you warmly for your help.

camille
Emma Maury
    Please post the code that you've tried, along with expected inputs and outputs. You're much more likely to get help by showing you've attempted to solve this yourself. – miken32 Jan 06 '20 at 17:41
    Also please format your example data for easy replication, see the dput function and the reprex package – Bruno Jan 06 '20 at 17:43
    I don't want this to seem rude, but this is a question where you're almost certainly going to get regex-based answers. Saying you don't have time to dig into it comes off as wanting someone to write your code for you. Seeing what you've tried already would help explain the logic you're using. I get that regex is intimidating but there are some good resources for learning; I like [regex101](https://regex101.com/) a lot – camille Jan 06 '20 at 18:27
    Yes, stringr has an amazing cheat sheet as well: http://edrub.in/CheatSheets/cheatSheetStringr.pdf – Bruno Jan 06 '20 at 18:46

2 Answers


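We can use str_extract_all to extract every date matching the pattern: two digits, /, two digits, /, and then four digits. Note that this only catches dates shaped like dd/dd/dddd; single-digit days or months and other separators are not matched.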

library(stringr)
# [[1]] keeps the matches for the first (and only) string in v1
str_extract_all(v1, "\\d{2}/\\d{2}/\\d{4}")[[1]]

data

v1 <-  c("11/09/2016 Invoice Number . Date P.O. # Amount Discount Paid Amount 2017/015 10/28/2016 CC6/ $50,000.00 $0.00 $50,000-00 2017/016 10/28/2016 CC67 $50,000.00 $0.00 $50,000-00 2017-017 10/28/2016 CC67 $50,000.00 . $0.00 $50,000.00 TOTALS: $150,000.00 $0.00 $150,000.00 ")
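The pattern above only matches two-digit day and month fields separated by /. As a hedged sketch for the mixed formats mentioned in the question, the base-R helper below (extract_numeric_dates() is a hypothetical name, not part of any package) also accepts one-digit fields and - or . separators, and it returns the matches for every element of the input vector rather than only the first. It assumes the year always comes last with four digits; year-first dates such as 2016/10/28 and month-name dates such as "9 Nov 2016" are not covered.

```r
# Base-R sketch: extract numeric dates from every element of a character
# vector. extract_numeric_dates() is a hypothetical helper, not a package function.
extract_numeric_dates <- function(x) {
  # 1-2 digits, a separator (/ . or -), 1-2 digits, the SAME separator
  # (the \1 back-reference), then a 4-digit year. The \b word boundaries
  # stop a match from starting or ending inside a longer run of digits/letters.
  pattern <- "\\b\\d{1,2}([/.-])\\d{1,2}\\1\\d{4}\\b"
  # gregexpr() finds all matches per element; regmatches() returns them
  # as a list with one character vector per input string.
  regmatches(x, gregexpr(pattern, x, perl = TRUE))
}

v <- c(
  "11/09/2016 Invoice 2017/015 10/28/2016 CC6/ $50,000.00 $0.00 $50,000-00",
  "Paid 9-11-2016 and 28.10.2016, ref 2017-017"
)
dates <- extract_numeric_dates(v)
dates[[1]]  # "11/09/2016" "10/28/2016"
dates[[2]]  # "9-11-2016"  "28.10.2016"
```

Requiring the same separator twice means strings like 2017/015 or $50,000-00 from the invoice text are skipped, at the cost of also skipping genuinely mixed forms such as 10/28-2016.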
akrun

If you need R Date objects, you will have to decide whether American or European dates take priority, because a string such as 11/09/2016 is valid under both conventions.

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date


v1 <-  c("11/09/2016 Invoice Number . Date P.O. # Amount Discount Paid Amount 2017/015 10/28/2016 CC6/ $50,000.00 $0.00 $50,000-00 2017/016 10/28/2016 CC67 $50,000.00 $0.00 $50,000-00 2017-017 10/28/2016 CC67 $50,000.00 . $0.00 $50,000.00 TOTALS: $150,000.00 $0.00 $150,000.00")

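# Pull every dd/dd/dddd-shaped string out of the (single) text
str_extract_all(v1, "\\d{2}/\\d{2}/\\d{4}")[[1]] %>% 
  tibble(value = .) %>% 
  mutate(american_date = value %>% mdy,  # month/day/year; NA if invalid
         european_date = value %>% dmy,  # day/month/year; NA if invalid
         # prefer the American reading, fall back to European when it fails
         stronger_american = coalesce(american_date, european_date),
         # prefer the European reading, fall back to American when it fails
         stronger_european = coalesce(european_date, american_date))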
#> Warning: 3 failed to parse.
#> # A tibble: 4 x 5
#>   value      american_date european_date stronger_american stronger_european
#>   <chr>      <date>        <date>        <date>            <date>           
#> 1 11/09/2016 2016-11-09    2016-09-11    2016-11-09        2016-09-11       
#> 2 10/28/2016 2016-10-28    NA            2016-10-28        2016-10-28       
#> 3 10/28/2016 2016-10-28    NA            2016-10-28        2016-10-28       
#> 4 10/28/2016 2016-10-28    NA            2016-10-28        2016-10-28

Created on 2020-01-06 by the reprex package (v0.3.0)
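If you would rather not load the tidyverse, the same "prefer one convention, fall back to the other" idea can be sketched in base R with as.Date() and explicit strptime formats. parse_preferring() is a hypothetical helper written for illustration; it assumes four-digit years and a / separator. Like the lubridate version, it cannot tell which convention is correct when both parse (11/09/2016), so the prefer argument decides.

```r
# Base-R sketch: parse slash-separated dates, preferring one convention and
# falling back to the other only where the preferred one fails.
# parse_preferring() is a hypothetical helper, not a package function.
parse_preferring <- function(x, prefer = c("american", "european")) {
  prefer <- match.arg(prefer)
  us <- as.Date(x, format = "%m/%d/%Y")  # month/day/year; NA if invalid
  eu <- as.Date(x, format = "%d/%m/%Y")  # day/month/year; NA if invalid
  out      <- if (prefer == "american") us else eu
  fallback <- if (prefer == "american") eu else us
  failed <- is.na(out)
  out[failed] <- fallback[failed]  # fill only the entries that failed
  out
}

x <- c("11/09/2016", "10/28/2016", "28/10/2016")
parse_preferring(x, "american")  # "2016-11-09" "2016-10-28" "2016-10-28"
parse_preferring(x, "european")  # "2016-09-11" "2016-10-28" "2016-10-28"
```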

akrun
Bruno