I have an example project and need to search for strings using the stringr package. To eliminate differences in capitalization, I started with str_to_lower(example$remarks), which made the remarks all lower case. The remarks column describes residential properties.
I need to search for the word "shop". However, the word "shopping" is also in the remarks column and I don't want that word.
Some observations: a) have only the word "shop"; b) have only the word "shopping"; c) have neither the word "shop" nor "shopping"; d) have BOTH the words "shop" and "shopping".
When using str_detect(), I want it to give me a TRUE for detecting the word "shop", but I DO NOT want it to give me a TRUE for detecting the string "shop" within the word "shopping". Currently, if I run str_detect(example$remarks, "shop"), I get a TRUE for both the words "shop" and "shopping". Effectively, I ONLY want a TRUE for the 4-character string "shop"; if the characters "shop" appear but are followed by any other characters, as in shop(ping), I want the code to skip that match and not return TRUE for it.
Also, if the remarks contain BOTH the words "shop" and "shopping", I would like the result to be TRUE only for detecting "shop" but not "shopping".
Ultimately, I'm hoping one line of code using str_detect() can give me the result of:
- If the remarks observation has only the word "shop" = TRUE
- If the remarks observation has only the word "shopping" = FALSE
- If the remarks observation has neither the word "shop" nor "shopping" = FALSE
- If the remarks observation has both the words "shop" AND "shopping" = TRUE for detecting ONLY the 4-character string "shop"; it does NOT output a TRUE because of the word "shopping".
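One way the four cases might be expressed is with regex word boundaries. The sketch below is an assumption about what would fit, not a confirmed solution: it uses the pattern "\\bshop\\b" and checks it against four remarks taken from the sample data in this question. The names remarks and has_shop are introduced here only for illustration.

```r
library(stringr)

# Four remarks from the sample data, one per case described in the question
remarks <- c(
  "huge shop on property!",                          # only "shop"
  "close to shopping.",                              # only "shopping"
  "fixer! needs work.",                              # neither word
  "close to shopping and massive shop on property."  # both words
)

# \\b marks a word boundary, so "shop" must stand alone as a whole word;
# the "shop" inside "shopping" is followed by a letter and does not match
has_shop <- str_detect(remarks, "\\bshop\\b")
has_shop
# TRUE FALSE FALSE TRUE
```

Note that this pattern would also skip other words that merely contain the letters, such as "shops" or "workshop", which may or may not be what is wanted.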
I need all of the observations to remain in the dataset and cannot exclude them, because I need to create a new column, which I have labeled shop_YN, that gives a "Yes" for observations containing the 4-character string "shop" as its own word. Once I have the correct str_detect() code, I plan to wrap the results in mutate() and if_else() as follows (except I don't know what code to use inside str_detect() to get the results I need):
shop_YN <- example %>% mutate(shop_YN = if_else(str_detect(example$remarks, ), "Yes", "No"))
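As a hedged sketch of how that line could be completed, the example below fills the missing pattern with a word-boundary regex, "\\bshop\\b". That pattern is an assumption, not something given in the question. To stay self-contained, the tibble is rebuilt from the first five rows of the sample data in this question, the result is assigned back to example rather than to a separate object, and the column is referenced as remarks inside mutate() instead of example$remarks.

```r
library(dplyr)
library(stringr)

# First five rows of the sample data from this question
example <- tibble(
  price = c(195000, 213000, 215000, 240000, 241000),
  remarks = c(
    "large home with a 1200 sf shop. great location close to shopping.",
    "updated home close to shopping & schools.",
    "nice location. 2br home with updating.",
    "huge shop on property!",
    "close to shopping."
  )
)

# Keep every row; flag "Yes" only where "shop" appears as a whole word
example <- example %>%
  mutate(shop_YN = if_else(str_detect(remarks, "\\bshop\\b"), "Yes", "No"))

example$shop_YN
# "Yes" "No" "No" "Yes" "No"
```

The first row contains both words and is flagged "Yes" because of the standalone "shop"; rows mentioning only "shopping" stay "No".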
Here is a sample of the data using dput():
structure(list(price = c(195000, 213000, 215000, 240000, 241000,
250000, 255000, 256500, 260000, 263500, 265000, 277000, 280000,
280000, 150000), remarks = c("large home with a 1200 sf shop. great location close to shopping.",
"updated home close to shopping & schools.", "nice location. 2br home with updating.",
"huge shop on property!", "close to shopping.", "updated, clean, great location, garage.",
"close to shopping and massive shop on property.", "updated home near shopping, schools, restaurants.",
"large home with updated interior.", "close to schools, updated, stick-built shop 1500sf.",
"home and shop.", "near schools, shopping, restaurants. partially updated home.",
"located close to shopping. high quality home with shop in backyard.",
"brick 2-story. lots of shopping near by. detached garage and large shop in backyard.",
"fixer! needs work.")), row.names = c(NA, -15L), class = c("tbl_df",
"tbl", "data.frame"))