I am trying to re-code a large set of text data into either a text or numeric value.
My data set includes names of coffee shops. I would like to re-code these coffee shops into either "corporation" or "small business". The problem is that the names are spelled inconsistently (e.g., "starbucks" vs. "starbcks" vs. "starbucks coffee"). I would like to write code that scans the "store" column for the substring "star" and re-codes every match as "corporation".
Example data:
library(data.table)
customers <- data.table(customer_id = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
                        store = c("starbcks", "peets", "coffee bean", "drnk", "starbucks", "coffee ben", "coffee bean", "coffee bean", "drnk", "starbucks coffee"))
I would like to recode the "store" column into a new "type" column, which I would then convert to a factor and re-code into a numeric value, along these lines:
customers <- data.table(customer_id = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
                        store = c("starbcks coffee", "portfolios", "coffee bean", "sharkhead", "starbucks", "coffee ben", "cuppa cuppa", "coffee bean", "drnk", "starbucks coffee"),
                        type = c("corporation", "small business", "corporation", "small business", "corporation", "corporation", "small business", "corporation", "corporation", "corporation"),
                        rc_type = c(1, 2, 1, 2, 1, 1, 2, 1, 1, 1))
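To make the idea concrete, here is a rough sketch of the kind of substring-based recoding I have in mind, run on the first example table. It uses only base R's `grepl()` and data.table's `:=`. The pattern list `corporate_patterns` is hypothetical: "star" comes from the description above, while "coffee be" is my own assumption for catching "coffee bean" and "coffee ben". Anything that matches no pattern is assumed to be a small business, which may not hold for real data.

```r
library(data.table)

customers <- data.table(
  customer_id = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  store = c("starbcks", "peets", "coffee bean", "drnk", "starbucks",
            "coffee ben", "coffee bean", "coffee bean", "drnk", "starbucks coffee")
)

# Hypothetical substrings that mark a store as a corporation (assumption, not a vetted list)
corporate_patterns <- c("star", "coffee be")

# Collapse the substrings into a single regular expression: "star|coffee be"
corporate_regex <- paste(corporate_patterns, collapse = "|")

# Flag every store whose name contains one of the substrings; everything else
# is assumed to be a small business
customers[, type := ifelse(grepl(corporate_regex, store, ignore.case = TRUE),
                           "corporation", "small business")]

# Convert to a factor with explicit level order, then to its integer codes
customers[, rc_type := as.integer(factor(type, levels = c("corporation", "small business")))]

print(customers)
```

Fixing the factor levels explicitly keeps "corporation" coded as 1 and "small business" as 2 regardless of which value happens to appear first in the data.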
I have looked into the stringr package and tried the standard way of re-coding, but to no avail. Any help is appreciated. Thank you!