I'm trying to get all the "banana + word" occurrences in a given object, but str_extract returns only the first occurrence in each string. My code:

library(stringr)  # provides str_extract(), word() and the %>% pipe

all_terms <- c("banana word2 word3 word4 banana split word2 word3 word4",
               "x y z",
               "banana ice cream")

banana_terms <- all_terms %>% 
  str_extract("banana.+") %>% 
  word(1, 2)


banana_terms
Out: [1] "banana word2" NA             "banana ice"  

What I wanted:

Out: [1] "banana word2" "banana split" "banana ice"
jvqp

3 Answers

Use str_extract_all, which returns every match rather than only the first, with \\w+ to capture banana together with the word that follows it.

all_terms %>% 
  str_extract_all("banana.\\w+") %>% 
  unlist()

# [1] "banana word2" "banana split" "banana ice"

Without unlist, you get a list:

str_extract_all(all_terms, "banana.\\w+")

[[1]]
[1] "banana word2" "banana split"

[[2]]
character(0)

[[3]]
[1] "banana ice"
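
If it matters which input string each match came from, the list form can be flattened into a data frame instead of a bare vector. This is only a sketch; the names `matches`, `banana_df`, `source` and `term` are introduced here for illustration and are not part of the original answer.

```r
library(stringr)

all_terms <- c("banana word2 word3 word4 banana split word2 word3 word4",
               "x y z",
               "banana ice cream")

matches <- str_extract_all(all_terms, "banana.\\w+")

# lengths() gives the number of matches per input string (0 for "x y z"),
# so rep() repeats each input index once per match it produced.
banana_df <- data.frame(
  source = rep(seq_along(matches), lengths(matches)),
  term   = unlist(matches),
  stringsAsFactors = FALSE
)

banana_df
```

Inputs without any match simply contribute no rows.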
Maël
  • It works! Can you explain what \\w is? Does it mean "word"? I could not find it in my cheat sheet (https://spannbaueradam.shinyapps.io/r_regex_tester/) – jvqp Apr 19 '22 at 15:38
  • Basically yes, look [here](https://stackoverflow.com/questions/11874234/difference-between-w-and-b-regular-expression-meta-characters) maybe for more info. – Maël Apr 19 '22 at 15:41
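
To make the \\w question concrete: \\w matches a single "word character" (a letter, a digit or an underscore), so \\w+ stops at punctuation such as a hyphen. The inputs below are made up for illustration, and the \\s+\\S+ pattern is borrowed from the base R answer for comparison.

```r
library(stringr)

# \\w+ matches runs of letters, digits and underscores; the hyphen ends a run
word_runs <- str_extract_all("ice-cream split_9", "\\w+")[[1]]
word_runs
# [1] "ice"     "cream"   "split_9"

# With \\w+ the match stops at the hyphen
short_match <- str_extract("banana ice-cream", "banana.\\w+")
short_match
# [1] "banana ice"

# \\S+ matches any run of non-whitespace, so the hyphenated word is kept whole
long_match <- str_extract("banana ice-cream", "banana\\s+\\S+")
long_match
# [1] "banana ice-cream"
```

Which pattern is preferable depends on whether punctuation should count as part of the following word.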
In base R, we can use regmatches with gregexpr:

unlist(regmatches(all_terms, gregexpr("banana\\s+\\S+", all_terms)))
[1] "banana word2" "banana split" "banana ice"  
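
As with str_extract_all, dropping unlist keeps one list element per input string, which makes it easy to count matches per element. A small sketch, with the names `per_string` and `n_matches` introduced here for illustration:

```r
all_terms <- c("banana word2 word3 word4 banana split word2 word3 word4",
               "x y z",
               "banana ice cream")

# One list element per input; strings without a match yield character(0)
per_string <- regmatches(all_terms, gregexpr("banana\\s+\\S+", all_terms))

# Number of "banana + word" matches found in each input string
n_matches <- lengths(per_string)
n_matches
# [1] 2 0 1
```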
akrun
If you want to use str_extract, you need to make sure each "banana word" is its own element in a vector.

str_split splits each string at every space that is followed by "banana" (the lookahead pattern " (?=banana)"), so each "banana ..." piece becomes a separate element. Then apply the regex (banana.\\w+) provided by @Maël with str_extract.

Finally, remove the NA values from the resulting vector.

library(stringr)

all_banana <- str_extract(str_split(all_terms, " (?=banana)", simplify = TRUE), "banana.\\w+")
all_banana <- all_banana[!is.na(all_banana)]

all_banana
[1] "banana word2" "banana ice"   "banana split"
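
Note that "banana ice" appears before "banana split" in this output: with simplify = TRUE, str_split returns a matrix with one row per input string, and R reads matrices column by column. If the original order matters, one possible tweak (a sketch, not part of the answer above; `pieces` and `ordered` are names introduced here) is to transpose the matrix before extracting:

```r
library(stringr)

all_terms <- c("banana word2 word3 word4 banana split word2 word3 word4",
               "x y z",
               "banana ice cream")

# One row per input string, one column per piece (padded with "")
pieces <- str_split(all_terms, " (?=banana)", simplify = TRUE)

# t() puts each input string in its own column, so as.vector() lists
# all pieces of string 1, then string 2, and so on
ordered <- str_extract(as.vector(t(pieces)), "banana.\\w+")
ordered <- ordered[!is.na(ordered)]

ordered
# [1] "banana word2" "banana split" "banana ice"
```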
benson23