regex keep only last number before word R (reproducible)

Question

I have a column of dates, number+date, and NA in the following format, I'd like to only keep the date part, deleting the other number in R probably using sub or gsub? Happy to accept an answer that helps me out :)

df <- data.frame(a=c(1:11), datecol=c("11 June 2018", NA, NA, "400 10 June 2017",NA,"5 05 June 2018", NA, NA, NA, NA, "25 15 May 2016"))

df.desired <- data.frame(a=c(1:11), datecol=c("11 June 2018", NA, NA, "10 June 2017",NA,"05 June 2018", NA, NA, NA, NA, "15 May 2016"))

akrun · Accepted Answer · 2018-06-12T05:04:05.267

4

We can use sub to match the pattern of 1 or 2 digits (\\d{1,2}), followed by space, word (\\w+) for month, space and the last 4 digits representing 'year', capture as a group, and use the backreference for the captured group in the replacement

sub(".*\\s+(\\d{1,2}.*\\w+\\s+\\d{4}$)", "\\1", df$datecol)
#[1] "11 June 2018" NA             NA             "10 June 2017" NA            
#[6] "05 June 2018" NA             NA             NA             NA            
#[11] "15 May 2016"

edited Jun 12 '18 at 05:04

answered Jun 12 '18 at 05:01

akrun

874,273
37
540
662

2

Awesome as usual @akrun, thanks! – Neal Barsch Jun 12 '18 at 05:03

score 2 · Answer 2 · answered Jun 12 '18 at 05:05

You can also use, stringr package:

stringr::str_extract(df$datecol,"\\d{1,2}\\s+[a-zA-Z]+\\s+\\d{4}")

Output:

> stringr::str_extract(df$datecol,"\\d{1,2}\\s+[a-zA-Z]+\\s+\\d{4}")
 [1] "11 June 2018" NA             NA             "10 June 2017"
 [5] NA             "05 June 2018" NA             NA            
 [9] NA             NA             "15 May 2016"

regex keep only last number before word R (reproducible)

2 Answers2