I'm programming in R and I have a dataset like this:

Date 
"mrt 2015"
"2012-06-22"
"2012 in Munchen"
"1998?"
"02-2012"
"02-01-1990"
..

How do I extract the four consecutive digits (the year) from each value (2015, 2012, 2012, 1998, ..)?

Ziezo
  • Possible duplicate of [how to extract the first 2 Characters in a string by a function in R?](http://stackoverflow.com/questions/38750535/how-to-extract-the-first-2-characters-in-a-string-by-a-function-in-r) – Sotos May 22 '17 at 09:37
  • I didn't see I posted the example dataset with structure.. The real dataset doesn't have any structure, I will edit my post now – Ziezo May 22 '17 at 09:46

2 Answers

You just need to capture the group of 4 digits anywhere in your string:

sub(".*(\\d{4}).*", "\\1", your_strings)
#[1] "2015" "2012" "2012" "1998" "2012" "1990"

Explanation: .* means any character, 0 or more times; the part you want to capture goes between parentheses (here 4 digits: \\d{4}); then again any character, 0 or more times (.*).
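As a quick check of the pattern above, here is a minimal sketch that runs it on the sample values from the question. The names `dates` and `safe_year` are hypothetical, introduced only for illustration. One caveat worth knowing: `sub()` returns the input unchanged when the pattern does not match, so a guard with `grepl()` is one possible way to get NA instead.

```r
# Sample values copied from the question; `dates` is a hypothetical name.
dates <- c("mrt 2015", "2012-06-22", "2012 in Munchen",
           "1998?", "02-2012", "02-01-1990")

# Replace the whole string by the captured group of 4 digits.
years <- sub(".*(\\d{4}).*", "\\1", dates)
print(years)
#[1] "2015" "2012" "2012" "1998" "2012" "1990"

# When nothing matches, sub() hands back the original string.
no_match <- sub(".*(\\d{4}).*", "\\1", "mr5")
print(no_match)
#[1] "mr5"

# Hypothetical helper: return NA for elements without 4 consecutive digits.
safe_year <- function(x) {
  ifelse(grepl("\\d{4}", x),
         sub(".*(\\d{4}).*", "\\1", x),
         NA_character_)
}
print(safe_year(c("mrt 2015", "mr5")))
#[1] "2015" NA
```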

Cath

We can use str_extract to get the 4-digit number if it occurs at the beginning of the string; otherwise it returns NA:

library(stringr)
as.integer(str_extract(df1$Date, "^\\d{4}"))
#[1] 2015 2012 2012 1998

Update

Based on the OP's edited dataset, if the 4-digit number can occur anywhere in the string, we remove the ^ (which anchors the match to the beginning of the string) and use only the pattern \\d{4}, i.e. a 4-digit number:

as.integer(str_extract(df1$Date, "\\d{4}"))
#[1] 2015 2012 2012 1998 2012 1990

Note that this is very specific, i.e. if an element doesn't contain the pattern, it returns NA:

as.integer(str_extract(c('mrt 2015', 'mr5', '201-01', '02-01-1990', '2012'), '\\d{4}'))
#[1] 2015   NA   NA 1990 2012
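A related point, offered as a sketch and not as part of the original answer: \\d{4} also matches the first four digits of a longer run of digits. If only runs of exactly four digits should count, one possible approach is Perl-style lookarounds with `perl = TRUE` in base R. The vector `x` and the value "id 123456" are made up for illustration.

```r
# Hypothetical input: the middle element has a 6-digit run, not a year.
x <- c("mrt 2015", "id 123456", "02-01-1990")

# Plain pattern: also matches inside the 6-digit run.
loose <- regexpr("\\d{4}", x)

# Lookarounds: the 4 digits must not be preceded or followed by a digit.
strict <- regexpr("(?<!\\d)\\d{4}(?!\\d)", x, perl = TRUE)

# regexpr() returns -1 for elements without a match.
loose_hit  <- as.vector(loose)  != -1
strict_hit <- as.vector(strict) != -1
print(loose_hit)
#[1] TRUE TRUE TRUE
print(strict_hit)
#[1]  TRUE FALSE  TRUE
```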

Or, as a base R option, use regmatches/regexpr:

as.integer(regmatches(df1$Date, regexpr("\\d{4}", df1$Date)))
#[1] 2015 2012 2012 1998 2012 1990
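One thing to keep in mind with the base R option: with `regexpr()` match data, `regmatches()` returns only the matched substrings and drops elements that have no match, so the result can be shorter than the input. Below is a minimal sketch of a length-preserving variant; the helper name `extract_year` is hypothetical and the input reuses the values from the NA example above.

```r
# Hypothetical helper: first run of 4 digits per element, NA where absent.
extract_year <- function(x) {
  m <- regexpr("\\d{4}", x)
  hit <- !is.na(m) & m != -1          # elements that actually matched
  out <- rep(NA_integer_, length(x))  # start with all NA
  out[hit] <- as.integer(regmatches(x, m))
  out
}

x <- c("mrt 2015", "mr5", "201-01", "02-01-1990", "2012")

# regmatches() alone silently drops the two non-matching elements.
print(length(regmatches(x, regexpr("\\d{4}", x))))
#[1] 3

# The helper keeps one result per input element.
print(extract_year(x))
#[1] 2015   NA   NA 1990 2012
```

Keeping the output the same length as the input matters when the result is assigned back to a data frame column such as df1$Date.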
akrun
  • Oh I didn't see I posted the example dataset with structure.. The real dataset doesn't have any structure, I will edit my post now – Ziezo May 22 '17 at 09:44
  • @Ziezo If that is the case, remove the `^`. I updated the post. – akrun May 22 '17 at 10:33