dplyr: case_when involving many cases

Question

I have two data frames:

set.seed(002)
data1 <- data.frame(cbind(
  a1 = sample(letters, 8, replace = TRUE),
  a2 = rpois(8, 10)
), stringsAsFactors = FALSE)

data2 <- data.frame(cbind(
  b1 = paste("area", 1:6, sep = " "),
  b2 = c("e", "s", "o", "y", "d", "v")
), stringsAsFactors = FALSE)

data1
  a1 a2
1  e  9
2  s 10
3  o 12
4  e  9
5  y 16
6  y  9
7  d 11
8  v 13

data2
      b1 b2
1 area 1  e
2 area 2  s
3 area 3  o
4 area 4  y
5 area 5  d
6 area 6  v

I want to create a new column in data1 called a3 while matching a1 with information from data2 e.g if a1 = "e" then a3 = "area 1", if a1 = "d" then a3 = "area 5" and so on. The new data1 should look like this:

  a1 a2     a3
1  e  9 area 1
2  s 10 area 2
3  o 12 area 3
4  e  9 area 1
5  y 16 area 4
6  y  9 area 4
7  d 11 area 5
8  v 13 area 6

I can achieve this by doing

data1 %>%
  mutate(a3 = case_when(
    a1 == "e" ~ "area 1",
    a1 == "s" ~ "area 2",
    a1 == "o" ~ "area 3",
    a1 == "y" ~ "area 4",
    a1 == "d" ~ "area 5",
    TRUE ~ "area 6"
  ))

The problem is that I have many cases and I am to repeat this on a number of data frames with different cases.

I can do this with base r by writing

data1$a3 <- NA
for(i in 1:nrow(data2)){
  for(j in 1:nrow(data1)){
    if(data1[j,1] == data2[i,2]){
      data1[j,3] <- data2[i,1]
    }
  }
}

but I am a fun of dplyr. Any assistance on how to achieve this using dplyr is appreciated.

score 3 · Accepted Answer · answered Dec 13 '17 at 16:21

3

data1 <- dplyr::left_join(data1, data2, by = c("a1" = "b2"))

data1:-

a1 a2     b1
e  9   area 1
s 10   area 2
o 12   area 3
e  9   area 1
y 16   area 4
y  9   area 4
d 11   area 5
v 13   area 6

answered Dec 13 '17 at 16:21

sm925

2,648
1
16
28

What if you had multiple columns? So, instead of a1=b2, there would be a2=b3, a3 =b4, and so on? – Brian Holt Mar 03 '21 at 21:53

MrFlick · Answer 2 · 2017-12-13T16:53:29.993

Looks like you are really just doing a join in that case. You can do

left_join(data1, data2, by=c("a1"="b2"))
#   a1 a2     b1
# 1  e  9 area 1
# 2  s 10 area 2
# 3  o 12 area 3
# 4  e  9 area 1
# 5  y 16 area 4
# 6  y  9 area 4
# 7  d 11 area 5
# 8  v 13 area 6

And if you care about renaming the column or merging only one specific column from data2 when it has more, you could do

left_join(data1, data2 %>% select(a1=b2, a3=b1))

score 2 · Answer 3 · answered Dec 13 '17 at 16:16

This solution doesn't use dplyr::case_when(), but it does use another dplyr function. Instead of using the second table to write your case_when(), you can just join the table in and then rename the variable using dplyr::rename().

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
set.seed(2)
data1 <- data.frame(cbind(a1 = sample(letters, 8, replace = TRUE), a2 = rpois(8, 
  10)), stringsAsFactors = FALSE)

data2 <- data.frame(cbind(b1 = paste("area", 1:6, sep = " "), b2 = c("e", "s", 
  "o", "y", "d", "v")), stringsAsFactors = FALSE)

data1 %>% left_join(data2, by = c(a1 = "b2")) %>% rename(a3 = b1)
#>   a1 a2     a3
#> 1  e  9 area 1
#> 2  s 10 area 2
#> 3  o 12 area 3
#> 4  e  9 area 1
#> 5  y 16 area 4
#> 6  y  9 area 4
#> 7  d 11 area 5
#> 8  v 13 area 6

dplyr: case_when involving many cases

3 Answers3

Linked