I'm using gnr_resolve in taxize (v. 0.7.0) to find the taxonomic authority (author and date) for a list of species. By setting canonical=FALSE I can get the record including the author and date, but is there a way to return just the taxonomic authority?

gnr_resolve("Anguina tritici", data_source_ids=11, canonical=FALSE)
   submitted_name                      matched_name      data_source_title score
1 Anguina tritici Anguina tritici (Steinbuch, 1799) GBIF Backbone Taxonomy 0.988

So in this case I would only want (Steinbuch, 1799).

Jessica Beyer
  • the return value has an object `$matched_name` which is just a character string `"Anguina tritici (Steinbuch, 1799)"`, can't you just [sub out the first part up to `(`](http://stackoverflow.com/questions/14790253/character-extraction-from-string) or [extract between the parentheses](http://stackoverflow.com/questions/8613237/extract-info-inside-all-parenthesis-in-r). additionally `gnr_resolve("Anguina tritici", data_source_ids=11, canonical=FALSE, fields = 'all')$matched_name` is completely different, that might be a bug – rawr Mar 21 '16 at 21:52
  • Not all of the records have parentheses around the author and date. For example `gnr_resolve("Contracaecum ogcocephali",canonical=FALSE)$matched_name` returns `Contracaecum ogcocephali Olsen 1952` @rawr – Jessica Beyer Mar 21 '16 at 22:32
  • guessing the pattern from those two examples, `x <- c('Anguina tritici (Steinbuch, 1799)', 'Contracaecum ogcocephali Olsen 1952'); gsub('(\\w+,?\\s+\\d{4})|.', '\\1', x)` returns the last name and year without the other stuff – rawr Mar 21 '16 at 22:43
  • @rawr That works perfectly. Can you post as an answer so I can accept it? – Jessica Beyer Mar 21 '16 at 23:06

1 Answer

Using your examples in the original question and comments:

library('taxize')
x <- c(gnr_resolve("Anguina tritici", data_source_ids=11, canonical=FALSE)$matched_name,
       gnr_resolve("Contracaecum ogcocephali", canonical=FALSE)$matched_name)

# [1] "Anguina tritici (Steinbuch, 1799)"   "Contracaecum ogcocephali"           
# [3] "Contracaecum ogcocephali"            "Contracaecum ogcocephali"           
# [5] "Contracaecum ogcocephali"            "Contracaecum ogcocephali"           
# [7] "Contracaecum ogcocephali"            "Contracaecum ogcocephali"           
# [9] "Contracaecum ogcocephali"            "Contracaecum ogcocephali"           
# [11] "Contracaecum ogcocephali Olsen 1952" "Contracaecum ogcocephali Olsen 1952"

It looks like you can use a regex to extract the pattern "last name, followed by an optional comma, followed by a 4-digit year":

gsub('(\\w+,?\\s+\\d{4})|.', '\\1', x)

# [1] "Steinbuch, 1799" ""                ""                ""                ""               
# [6] ""                ""                ""                ""                ""               
# [11] "Olsen 1952"      "Olsen 1952"   

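where `(\\w+,?\\s+\\d{4})|.` says: save to the first capture group (`\\1`) one or more word characters, `\\w+`, followed by an optional comma, `,?`, followed by one or more whitespace characters, `\\s+`, followed by exactly four digits, `\\d{4}`. Any character that is not part of such a match is consumed by the alternative `.`, which leaves the capture group empty, so replacing every match with `\\1` keeps only the name and year. Names returned without an authority therefore come back as empty strings.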
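
If you would rather get `NA` than an empty string when no authority is found, the same pattern can be used with `regexpr()` and `regmatches()`. This is only a minimal sketch: `extract_authority` is a hypothetical helper, not part of taxize, and the input strings are copied from the output above so that no call to `gnr_resolve` is needed.

```r
# Example strings copied from the gnr_resolve output shown earlier.
x <- c("Anguina tritici (Steinbuch, 1799)",
       "Contracaecum ogcocephali",
       "Contracaecum ogcocephali Olsen 1952")

# Hypothetical helper (not a taxize function): returns the "name, year"
# part of each string, or NA when the pattern is not found.
extract_authority <- function(x) {
  pattern <- "\\w+,?\\s+\\d{4}"          # word, optional comma, space, 4 digits
  m <- regexpr(pattern, x)               # position of first match, -1 if none
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)       # regmatches() returns matched elements only
  out
}

authority <- extract_authority(x)
authority
# [1] "Steinbuch, 1799" NA                "Olsen 1952"
```

Like the `gsub` version, this only guesses the pattern from the two examples. An authority with several authors, such as a hypothetical "Smith & Jones, 1900", would be cut down to the last name and the year.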

rawr