1

I have a column of dates, number+date, and NA in the following format, I'd like to only keep the date part, deleting the other number in R probably using sub or gsub? Happy to accept an answer that helps me out :)

df <- data.frame(a=c(1:11), datecol=c("11 June 2018", NA, NA, "400 10 June 2017",NA,"5 05 June 2018", NA, NA, NA, NA, "25 15 May 2016"))

df.desired <- data.frame(a=c(1:11), datecol=c("11 June 2018", NA, NA, "10 June 2017",NA,"05 June 2018", NA, NA, NA, NA, "15 May 2016"))
Neal Barsch
  • 2,810
  • 2
  • 13
  • 39

2 Answers2

4

We can use sub to match the pattern of 1 or 2 digits (\\d{1,2}), followed by space, word (\\w+) for month, space and the last 4 digits representing 'year', capture as a group, and use the backreference for the captured group in the replacement

sub(".*\\s+(\\d{1,2}.*\\w+\\s+\\d{4}$)", "\\1", df$datecol)
#[1] "11 June 2018" NA             NA             "10 June 2017" NA            
#[6] "05 June 2018" NA             NA             NA             NA            
#[11] "15 May 2016" 
akrun
  • 874,273
  • 37
  • 540
  • 662
2

You can also use, stringr package:

stringr::str_extract(df$datecol,"\\d{1,2}\\s+[a-zA-Z]+\\s+\\d{4}")

Output:

> stringr::str_extract(df$datecol,"\\d{1,2}\\s+[a-zA-Z]+\\s+\\d{4}")
 [1] "11 June 2018" NA             NA             "10 June 2017"
 [5] NA             "05 June 2018" NA             NA            
 [9] NA             NA             "15 May 2016" 
PKumar
  • 10,971
  • 6
  • 37
  • 52