Using your examples in the original question and comments:
library('taxize')
x <- c(gnr_resolve("Anguina tritici", data_source_ids=11, canonical=FALSE)$matched_name,
gnr_resolve("Contracaecum ogcocephali", canonical=FALSE)$matched_name)
# [1] "Anguina tritici (Steinbuch, 1799)" "Contracaecum ogcocephali"
# [3] "Contracaecum ogcocephali" "Contracaecum ogcocephali"
# [5] "Contracaecum ogcocephali" "Contracaecum ogcocephali"
# [7] "Contracaecum ogcocephali" "Contracaecum ogcocephali"
# [9] "Contracaecum ogcocephali" "Contracaecum ogcocephali"
# [11] "Contracaecum ogcocephali Olsen 1952" "Contracaecum ogcocephali Olsen 1952"
It looks like you can use a regex to extract the pattern "last name, followed by an optional comma, followed by a four-digit year":
gsub('(\\w+,?\\s+\\d{4})|.', '\\1', x)
# [1] "Steinbuch, 1799" "" "" "" ""
# [6] "" "" "" "" ""
# [11] "Olsen 1952" "Olsen 1952"
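where (\\w+,?\\s+\\d{4})|. says: save to the first capture group (\\1) a word character one or more times (\\w+), followed by an optional comma (,?), followed by white space one or more times (\\s+), followed by exactly four digits (\\d{4}). The alternative after the |, a single ., matches any other character; because it sits outside the capture group, \\1 is empty for those matches and the character is deleted. That is why names without an author and year come back as "".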
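If you would rather get NA than "" for names that carry no author and year, a possible variant is to locate the match with regexpr() and pull it out with regmatches(), both from base R. This is only a sketch of an alternative, not a replacement for the gsub() call above: the vector x is a hard-coded sample of the output shown earlier (so nothing is fetched from the name resolver), and the variable names pattern, authority_gsub and authority are made up for illustration.

```r
# Hard-coded sample of the matched names shown above (no network call needed)
x <- c("Anguina tritici (Steinbuch, 1799)",
       "Contracaecum ogcocephali",
       "Contracaecum ogcocephali Olsen 1952")

# Same core pattern as above: word, optional comma, white space, four digits
pattern <- "\\w+,?\\s+\\d{4}"

# The gsub() approach: keep the captured group, delete every other character
authority_gsub <- gsub(paste0("(", pattern, ")|."), "\\1", x)

# Variant: regexpr() returns -1 where nothing matches,
# and regmatches() returns only the matched substrings
m <- regexpr(pattern, x)
authority <- rep(NA_character_, length(x))
authority[m != -1] <- regmatches(x, m)

print(authority_gsub)
print(authority)
```

Either way, \\w+ only picks up the single word directly before the year, so an authority made of several words would be cut down to its last word.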