I have never used R, but I have a lot of experience with regular expressions.
The idiomatic way would be to use matching.
In R that should be regmatches:
Use regmatches to get the actual substrings matched by the regular
expression. As the first argument, pass the same input that you passed
to regexpr or gregexpr. As the second argument, pass the vector
returned by regexpr or gregexpr. If you pass the vector from regexpr
then regmatches returns a character vector with all the strings that
were matched. This vector may be shorter than the input vector if no
match was found in some of the elements. If you pass the vector from
gregexpr then regmatches returns a vector with the same number of
elements as the input vector. Each element is a character vector with
all the matches of the corresponding element in the input vector, or
NULL if an element had no matches.
> x <- c("abc", "def", "cba a", "aa")
> m <- regexpr("a+", x, perl=TRUE)
> regmatches(x, m)
[1] "a" "a" "aa"
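The example above only covers regexpr. A minimal sketch of the gregexpr case described in the quoted text, reusing the same x (the names m_all and all_matches are mine). As far as I know, in current R an element with no matches comes back as an empty character vector rather than NULL:

```r
x <- c("abc", "def", "cba a", "aa")

# gregexpr finds every match in each element, not only the first one
m_all <- gregexpr("a+", x, perl = TRUE)
all_matches <- regmatches(x, m_all)

# The result is a list with one entry per element of x
length(all_matches)   # same length as x
all_matches[[3]]      # both matches found in "cba a"
all_matches[[2]]      # empty character vector, since "def" has no match
```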
In your case it should be:
m <- regexpr("\\d{4}", year1, perl=TRUE)
regmatches(year1, m)
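A small sketch of how this might look end to end. The contents of year1 are made up here, since I don't know the real data. Note that inside an R string literal the backslash has to be doubled:

```r
# Hypothetical input; the real year1 vector is an assumption
year1 <- c("sales_2004_q1", "sales_1999_q4", "no_year_here")

# "\\d" in the R string is the regex \d
m <- regexpr("\\d{4}", year1, perl = TRUE)

# Elements without a match are dropped from the result
years <- regmatches(year1, m)
```

Because unmatched elements are dropped, years can be shorter than year1. If you need the positions to line up, check which entries of m are -1.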
If the same string can contain another run of 4 digits, you can require the surrounding underscores with lookarounds, which check for them without including them in the match. Probably like this:
"(?<=_)\\d{4}(?=_)"
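One thing to keep in mind: a non-capturing group is still part of the overall match, so regmatches would return the underscores along with the digits, whereas lookbehind and lookahead only assert them. A sketch comparing the two on a made-up string (the variable names are mine):

```r
# Hypothetical input with a second run of four digits in the same string
s <- "id1234_2004_final"

# Non-capturing groups still consume the underscores
m1 <- regexpr("(?:_)\\d{4}(?:_)", s, perl = TRUE)
with_underscores <- regmatches(s, m1)

# Lookbehind and lookahead require the underscores but leave them out of the match
m2 <- regexpr("(?<=_)\\d{4}(?=_)", s, perl = TRUE)
digits_only <- regmatches(s, m2)
```

Lookarounds need perl=TRUE, which the calls above already pass.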
Sorry, I have no chance to test all of this in R.