Extract last word in a string after comma if there are multiple words else the first word

Question

I have data where the words are as follows:

 location<- c("xyz, sss, New Zealand", "USA", "Pris,France")
 id<- c(1,2,3)
 df<-data.frame(location,id)

I would like to extract the country name from the data. The tricky part is that if I extract just the last word, I get only one record (France).

library(stringr)
df$country<- word(df$location,-1)

Any ideas on how to extract the country from this data? The desired output is:

 id  location                      country
  1   xyz, sss, New Zealand        New Zealand
  2   USA                          USA
  3   Pris,France                  France
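To make the problem concrete, here is a minimal base-R sketch of what "take the last whitespace-separated word" does to the sample data. It assumes the split is on single spaces only; `last_token` is a name made up for this illustration, not part of stringr.

```r
location <- c("xyz, sss, New Zealand", "USA", "Pris,France")

# Split on single spaces and keep the final piece of each string.
tokens <- strsplit(location, " ", fixed = TRUE)
last_token <- vapply(tokens, function(p) p[length(p)], character(1))

# "New Zealand" loses its first word, and "Pris,France" has no space,
# so it comes back whole. Only "USA" is what we want.
print(last_token)
```

So the separator that matters here is the comma, not the space.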

akrun · Accepted Answer · 2015-06-30T21:58:57.860

12

You can try sub

 df$country <- sub('.*,\\s*', '', df$location)
 df$country
 #[1] "New Zealand" "USA"         "France"

Or

 library(stringr)
 str_extract(df$location, '\\b[^,]+$')
 #[1] "New Zealand" "USA"         "France"

edited Jun 30 '15 at 21:58

answered Jun 30 '15 at 21:28


7

`explanation [sub]:` in each element of `df$location`, match any character `.` repeated any number of times `*` up to the last comma, plus any whitespace `\\s*` that follows it, and replace the match with nothing `''`. A string without a comma does not match and is returned unchanged. `explanation [str_extract]:` from each element of `df$location`, starting at a word boundary `\\b`, extract one or more `+` characters that are not a comma `[^,]`, running to the end of the string `$` (so basically, everything after the last comma, without the leading spaces). – Richard Jul 30 '17 at 06:20
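The two patterns explained in the comment above can be checked in base R alone. This is only a sketch: it swaps `str_extract` for `regexpr()` plus `regmatches()` with `perl = TRUE`, on the assumption that the pattern behaves the same way in PCRE for this data. The variable names are made up for the illustration.

```r
location <- c("xyz, sss, New Zealand", "USA", "Pris,France")

# Greedy '.*' runs to the LAST comma; '\\s*' then consumes any spaces after it.
after_last_comma <- sub(".*,\\s*", "", location)

# With no comma there is nothing to match, so sub() returns the input unchanged.
no_comma <- sub(".*,\\s*", "", "USA")

# Base-R counterpart of the str_extract() call: the run of non-comma
# characters that starts at a word boundary and reaches the end of the string.
m <- regexpr("\\b[^,]+$", location, perl = TRUE)
trailing_run <- regmatches(location, m)

print(after_last_comma)
print(trailing_run)
```

Both approaches return the same three values here. Note that `regmatches()` silently drops elements with no match, so the `sub()` form is the safer one to assign back into a data frame column.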

score 1 · Answer 2 · answered Jan 16 '18 at 15:55

stringi solution:

library(stringi)
location<- c("xyz, sss, New Zealand", "USA", "Pris,France")
stri_trim(stri_match_first_regex(location, "(^|,)([^,]*?)$")[,3])
## [1] "New Zealand" "USA"         "France"

stri_trim removes any whitespace before or after the country name.
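The same "capture, then trim" idea can be sketched without stringi by splitting on commas instead of using a regex. This is a swapped-in base-R technique, not the answer's method; `last_piece` is a hypothetical helper name, and the sketch assumes no input string is empty.

```r
location <- c("xyz, sss, New Zealand", "USA", "Pris,France")

# Hypothetical helper: split on literal commas and keep the last piece.
last_piece <- function(x) {
  pieces <- strsplit(x, ",", fixed = TRUE)
  vapply(pieces, function(p) p[length(p)], character(1))
}

# The piece after ", " still carries its leading space...
untrimmed <- last_piece(location)

# ...which trimws() removes, playing the role stri_trim plays above.
country <- trimws(untrimmed)

print(country)
```

Comparing `untrimmed` with `country` shows why the trimming step is needed: the first element is " New Zealand" before trimming.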
