
I am trying to create a dummy variable for "product" without also matching compound words that contain it. With the code below, "productdesign" is also counted as 1 in tags$product. What is the syntax to match strictly the word "product"?

library(NLP)
library(tm)
library(tidytext)
library(tidyverse)
library(topicmodels)
library(dplyr)
library(stringr)
library(purrr)
library(tidyr)
#sample dataset
tags <- c("product productdesign electronicdevice", "productdesign electronicdevice")
web <- c("hardware", "sunglasses")
tags <- tibble(tags, web)  # tibble() replaces the deprecated data_frame()
tags <- mutate(tags, product = ifelse(grepl("product", tags), 1, 0))
Brandon Minnick
Kreitz Gigs
  • Take a look at https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610. Can you reduce your sample code to just what is needed for this problem (that is, drop the libraries that aren't required) and show a sample of the output you expect? – Amanda Feb 21 '18 at 23:31

1 Answer


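You can use \\b to mark a word boundary, so that "\\bproduct\\b" matches product only as a whole word. Alternatively, \\sproduct\\s requires whitespace before and after the word, but note that it will not match when product is the first or last word of the string, as it is in the first row here.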

> tags <- mutate(tags, product = ifelse(grepl("\\bproduct\\b", tags), 1, 0))
> tags
# A tibble: 2 x 3
                                    tags        web product
                                   <chr>      <chr>   <dbl>
1 product productdesign electronicdevice   hardware       1
2         productdesign electronicdevice sunglasses       0
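To see how the patterns differ, here is a minimal base R sketch that compares them side by side. The third string is a hypothetical row added only for illustration (it puts "product" at the end), and the pattern "(^|\\s)product(\\s|$)" is a suggested variant, not part of the original answer.

```r
# Base R only; the first two strings mirror the sample tags from the question.
tags <- c("product productdesign electronicdevice",
          "productdesign electronicdevice",
          "electronicdevice product")  # hypothetical row: "product" as the last word

# Word boundary on both sides: matches "product" as a whole word anywhere.
word_boundary <- as.integer(grepl("\\bproduct\\b", tags))

# Whitespace on both sides: misses "product" at the start or end of the string.
both_spaces <- as.integer(grepl("\\sproduct\\s", tags))

# Whitespace or a string edge on each side (assumed variant). Unlike \\b, this
# does not treat punctuation such as "-" as a separator.
space_or_edge <- as.integer(grepl("(^|\\s)product(\\s|$)", tags))

print(data.frame(tags, word_boundary, both_spaces, space_or_edge))
```

If the tags are always separated by spaces and never contain punctuation, \\b and the anchored variant should give the same result. The difference only shows up for strings like "product-design", where \\b would still report a match.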
Onyambu