
I would like to extract the first occurrence of specific words in a column. I have a column like this:

product
Jasjus Mangga & Diabet Sweetener
Krimer Thai Tea 20s & Jasjus Madu 
[FREE TUMBLER] Susu Platinum 
Buy 2 Get 1 Free Krimer Thai Tea

I want to extract the first occurrence of any of several words, such as 'Jasjus', 'Krimer', 'Diabet' and 'Susu', so that I can create one more column containing those words:

product                              brand
Jasjus Mangga & Diabet Sweetener     Jasjus
Krimer Thai Tea 20s & Jasjus Madu    Krimer
[FREE TUMBLER] Susu Platinum         Susu
Buy 2 Get 1 Free Krimer Thai Tea     Krimer

I know how to extract words located next to special characters like '/' and '&', but I couldn't find a way to extract the first word that occurs. Thanks in advance.

  • The question can be solved like this but it would still be kind to provide [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) data –  Jul 31 '19 at 08:16

2 Answers


We could use str_extract after pasting the words together into a single pattern separated by "|". str_extract returns only the first match in each string:

words <- paste0(c('Jasjus', 'Krimer', 'Diabet' ,'Susu'), collapse = "|")
df$brand <- stringr::str_extract(df$product, words)

df
#                            product  brand
#1  Jasjus Mangga & Diabet Sweetener Jasjus
#2 Krimer Thai Tea 20s & Jasjus Madu Krimer
#3      [FREE TUMBLER] Susu Platinum   Susu
#4  Buy 2 Get 1 Free Krimer Thai Tea Krimer
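A point that may be worth spelling out: with an alternation pattern, the match that starts earliest in the string is returned, regardless of the order of the words inside the pattern. The sketch below illustrates this and also wraps the alternation in word boundaries so that a brand name is not picked up inside a longer word. It assumes stringr is installed; the vector names and the "Teh Botol 350ml" row are hypothetical, added only to show that a row without any listed word yields NA.

```r
library(stringr)

# Two rows from the question plus one hypothetical row with no listed brand
products <- c(
  "Jasjus Mangga & Diabet Sweetener",
  "Krimer Thai Tea 20s & Jasjus Madu",
  "Teh Botol 350ml"
)

brands <- c("Jasjus", "Krimer", "Diabet", "Susu")

# \\b...\\b restricts matches to whole words; (?: ) groups the alternation
pattern <- paste0("\\b(?:", paste(brands, collapse = "|"), ")\\b")

# str_extract returns the first match per string, or NA if nothing matches
brand <- str_extract(products, pattern)
brand
# [1] "Jasjus" "Krimer" NA
```

In the second row 'Krimer' is returned even though 'Jasjus' is listed first in the pattern, because 'Krimer' appears earlier in the string.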

data

df <- structure(list(product = structure(c(3L, 4L, 1L, 2L), 
.Label = c("[FREE TUMBLER] Susu Platinum", 
"Buy 2 Get 1 Free Krimer Thai Tea", "Jasjus Mangga & Diabet Sweetener", 
"Krimer Thai Tea 20s & Jasjus Madu"), class = "factor")), row.names = 
c(NA, -4L), class = "data.frame")
Ronak Shah

We can use base R methods, regexpr to locate the first match and regmatches to extract it:

words <- paste(c('Jasjus', 'Krimer', 'Diabet', 'Susu'), collapse = "|")
df$brand <- regmatches(df$product, regexpr(words, df$product))
df$brand
#[1] "Jasjus" "Krimer" "Susu"   "Krimer"
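One caveat with this approach: regmatches() combined with regexpr() drops elements that have no match, so the result can be shorter than the input and the column assignment would then fail. A minimal sketch of one way to keep the lengths aligned is shown below; the variable names and the "Teh Botol 350ml" row are hypothetical, used only to demonstrate the no-match case.

```r
# Three rows from the question plus one hypothetical row with no listed brand
products <- c(
  "Jasjus Mangga & Diabet Sweetener",
  "Krimer Thai Tea 20s & Jasjus Madu",
  "[FREE TUMBLER] Susu Platinum",
  "Teh Botol 350ml"
)

pattern <- paste(c("Jasjus", "Krimer", "Diabet", "Susu"), collapse = "|")

# regexpr gives the position of the first match per element, or -1 if none
m <- regexpr(pattern, products)

# Start with NA everywhere, then fill in only the elements that matched
brand <- rep(NA_character_, length(products))
brand[m != -1] <- regmatches(products, m)
brand
# [1] "Jasjus" "Krimer" "Susu"   NA
```

Pre-filling with NA keeps the result the same length as the input, so it can be assigned to a data frame column safely.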

data

df <- structure(list(product = structure(c(3L, 4L, 1L, 2L), 
.Label = c("[FREE TUMBLER] Susu Platinum", 
"Buy 2 Get 1 Free Krimer Thai Tea", "Jasjus Mangga & Diabet Sweetener", 
"Krimer Thai Tea 20s & Jasjus Madu"), class = "factor")), row.names = 
c(NA, -4L), class = "data.frame")
akrun