Need help replacing character strings found in a column with a value in R

Question

I'm still very unfamiliar with data wrangling in R so bear with me. I have a data frame with a "brand" column, with over 1400 rows and about 200 unique brands. I've organized the brands into 3 different categories and put them in a list for the time being (similar to the sample below, but each list having many more brands).

I am trying to replace the brands in the brand column with integers, but I want the brands each list to have the same integer. I can reorganize what kind of structure the brands are in if necessary. My problem comes from having R identify the individual brands within the list. I've been toying with the ifelse() function, but can't quite get %in% to do what I want. Thanks!

A <- list("Brand A", "Brand B", "Brand C")
M <- list("Brand M", "Brand N", "Brand O")
Z <- list("Brand X", "Brand Y", "Brand Z")

Please make your question reproducible by pasting a sample of your data: use `dput(head(your_data_sample, n))` where `n` is enough data to demonstrate the problem. This makes it easier for others to test and verify solutions. You can use the https://reprex.tidyverse.org/articles/articles/magic-reprex.html and https://cran.r-project.org/web/packages/datapasta/vignettes/how-to-datapasta.html packages to assist you with that. See also https://speakerdeck.com/jennybc/reprex-help-me-help-you?slide=5 & https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example? — Peter, Apr 06 '21 at 08:19

score 2 · Answer 1 · answered Apr 05 '21 at 22:25

Here is one approach:

library(tidyverse)

df <- tibble(ID = 1:200,
                 brand = sample(c("Brand A", "Brand B", "Brand C",
                                  "Brand M", "Brand N", "Brand O",
                                  "Brand X", "Brand Y", "Brand Z"),
                                size = 200, replace = TRUE))
df
# A tibble: 200 x 2
#      ID brand  
#   <int> <chr>  
# 1     1 Brand X
# 2     2 Brand M
# 3     3 Brand A
# 4     4 Brand N
# 5     5 Brand M
# 6     6 Brand B
# 7     7 Brand X
# 8     8 Brand B
# 9     9 Brand N
# 10    10 Brand O
# … with 190 more rows


df_grouped <- df %>% 
  mutate(brand_group = case_when(brand %in% c("Brand A", "Brand B", "Brand C") ~ "A",
                                 brand %in% c("Brand M", "Brand N", "Brand O") ~ "M",
                                 brand %in% c("Brand X", "Brand Y", "Brand Z") ~ "Z"))

df_grouped
# A tibble: 200 x 3
#      ID brand   brand_group
#   <int> <chr>   <chr>      
# 1     1 Brand X Z          
# 2     2 Brand M M          
# 3     3 Brand A A          
# 4     4 Brand N M          
# 5     5 Brand M M          
# 6     6 Brand B A          
# 7     7 Brand X Z          
# 8     8 Brand B A          
# 9     9 Brand N M          
# 10    10 Brand O M          
# … with 190 more rows


df_int <- df_grouped %>% 
  mutate(brand_int = ifelse(brand_group == "A", 1, ifelse(brand_group == "M", 2, 3)))

df_int
# A tibble: 200 x 4
#      ID brand   brand_group brand_int
#   <int> <chr>   <chr>           <dbl>
# 1     1 Brand X Z                   3
# 2     2 Brand M M                   2
# 3     3 Brand A A                   1
# 4     4 Brand N M                   2
# 5     5 Brand M M                   2
# 6     6 Brand B A                   1
# 7     7 Brand X Z                   3
# 8     8 Brand B A                   1
# 9     9 Brand N M                   2
# 10    10 Brand O M                   2
# … with 190 more rows

Edit

You can also use case_when() to change the original brand to an integer in a single step, e.g.

df_int <- df %>% 
  mutate(brand_int = case_when(brand %in% c("Brand A", "Brand B", "Brand C") ~ 1,
                                 brand %in% c("Brand M", "Brand N", "Brand O") ~ 2,
                                 brand %in% c("Brand X", "Brand Y", "Brand Z") ~ 3))

df_int
# A tibble: 200 x 3
#      ID brand   brand_int
#   <int> <chr>       <dbl>
# 1     1 Brand X         3
# 2     2 Brand M         2
# 3     3 Brand A         1
# 4     4 Brand N         2
# 5     5 Brand M         2
# 6     6 Brand B         1
# 7     7 Brand X         3
# 8     8 Brand B         1
# 9     9 Brand N         2
# 10    10 Brand O         2
# … with 190 more rows

Or, if you want to replace the brand with the integer in the same column, you can use transmute() e.g.

df_int <- df %>% 
  transmute(brand_int = case_when(brand %in% c("Brand A", "Brand B", "Brand C") ~ 1,
                                 brand %in% c("Brand M", "Brand N", "Brand O") ~ 2,
                                 brand %in% c("Brand X", "Brand Y", "Brand Z") ~ 3))

df_int
# A tibble: 200 x 1
#   brand_int
#       <dbl>
# 1         3
# 2         2
# 3         1
# 4         2
# 5         2
# 6         1
# 7         3
# 8         1
# 9         2
# 10         2
# … with 190 more rows

score 2 · Answer 2 · answered Apr 05 '21 at 22:47

You can try merge + stack like below

merge(df,
  stack(list(A = A, M = M, Z = Z)),
  by.x = "brand",
  by.y = "values",
  all.x = TRUE
)

which gives

     brand ID ind
1  Brand A  8   A
2  Brand A 10   A
3  Brand A 12   A
4  Brand B  2   A
5  Brand B  9   A
6  Brand C 13   A
7  Brand C 19   A
8  Brand M 17   M
9  Brand M 18   M
10 Brand M 15   M
11 Brand N 11   M
12 Brand N 16   M
13 Brand O  6   M
14 Brand O  7   M
15 Brand O  5   M
16 Brand O 14   M
17 Brand X  1   Z
18 Brand X 20   Z
19 Brand Y  3   Z
20 Brand Z  4   Z

Data

> dput(df)
structure(list(ID = 1:20, brand = c("Brand X", "Brand B", "Brand Y", 
"Brand Z", "Brand O", "Brand O", "Brand O", "Brand A", "Brand B",
"Brand A", "Brand N", "Brand A", "Brand C", "Brand O", "Brand M",
"Brand N", "Brand M", "Brand M", "Brand C", "Brand X")), class = "data.frame", row.names = c(NA,
-20L))

score 1 · Answer 3 · answered Apr 05 '21 at 22:20

1

Is this what you wanted:

A <- list("Brand A", "Brand B", "Brand C")
M <- list("Brand M", "Brand N", "Brand O")
Z <- list("Brand X", "Brand Y", "Brand Z")
set.seed(123)
v <- sample(unlist(c(A, M, Z)), 1000, TRUE)
vn <- ifelse( v %in% A, 1, ifelse(v %in% M, 2, 3))
table(vn)
vn
  1   2   3
325 302 373

answered Apr 05 '21 at 22:20

David Z

6,641
11
50
101

That did exactly what I wanted it to, thank you! I had the "v" and "A" switched around in my code I believe, and but that did the trick. – William Rose Apr 05 '21 at 23:48

score 0 · Answer 4 · answered Apr 06 '21 at 02:35

Another approach using a lookup table.

set.seed(25)
brands <- c("Brand A" = "A", "Brand B" = "A",  
      "Brand C" = "A", "Brand M" = "M", "Brand N"= "N", 
      "Brand O" = "N", "Brand X" = "X", "Brand Y" = "X", 
      "Brand Z" = "X")
v <- sample(unlist(c("Brand A", "Brand M", "Brand Z")), 1000, TRUE)
v_new <- brands[v]
table(v_new)

 v_new  
  A   M   X   
 334 308 358

Need help replacing character strings found in a column with a value in R

4 Answers4

Edit