Create new column in dataframe based on other column in R

Question

The data I used look like this:

data
subject    cluster
A          1
B          1
C          2
D          3
E          2
F          1
G          3
H          3
I          4
J          4
K          5
L          6
M          7
N          5
O          3

Based on the cluster column, I want to create a new column called verdict that notes whether the subject passed, needs remedial, or failed.

If the subject is in:

* Cluster 1 or 3, they failed

* Cluster 2 or 5, they need remedial

* Cluster 4, 6, or 7, they passed

And the final data will look like this:

data
subject    cluster    verdict
A          1          Failed
B          1          Failed
C          2          Remedial
D          3          Failed
E          2          Remedial
F          1          Failed
G          3          Failed
H          3          Failed
I          4          Passed
J          4          Passed
K          5          Remedial
L          6          Passed
M          7          Passed
N          5          Remedial
O          3          Failed

I already tried using simple code like:

data$verdict = 
  ifelse(data$cluster == 1|data$cluster == 3,'Failed',
         ifelse(data$cluster == 2|data$cluster == 5,'Remedial','Passed'))

And it worked. But I feel it's not efficient, especially if I have a large number of clusters and/or verdicts. Is there a more efficient way to do this?

What is `data%cluster`? Do you mean `data$cluster`? – camille, Jun 14 '22 at 04:15

score 2 · Accepted Answer · edited Jun 14 '22 at 05:36

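Instead of chaining `==` comparisons with `|`, use `%in%` to test membership in a set of clusters, and use dplyr's `case_when()` in place of nested `ifelse()` calls.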

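library(dplyr)

# Conditions are checked in order; the first match wins.
data <- data %>%
  mutate(verdict = case_when(
    cluster %in% c(1, 3) ~ "Failed",
    cluster %in% c(2, 5) ~ "Remedial",
    TRUE ~ "Passed"  # every remaining cluster (here 4, 6, 7)
  ))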
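
If the number of clusters grows, listing them in code may get unwieldy. One alternative, not part of the original answer, is a lookup vector in base R. The sketch below assumes the lowercase column names used in the question's code and rebuilds the sample data; `verdict_lookup` is a hypothetical name introduced only for illustration.

```r
# Sample data from the question (lowercase column names assumed).
data <- data.frame(
  subject = LETTERS[1:15],
  cluster = c(1, 1, 2, 3, 2, 1, 3, 3, 4, 4, 5, 6, 7, 5, 3)
)

# Hypothetical lookup: names are cluster ids, values are verdicts.
verdict_lookup <- c(
  "1" = "Failed",   "3" = "Failed",
  "2" = "Remedial", "5" = "Remedial",
  "4" = "Passed",   "6" = "Passed", "7" = "Passed"
)

# Index by name; as.character() avoids indexing by position.
data$verdict <- unname(verdict_lookup[as.character(data$cluster)])
```

Unlike the `TRUE ~ "Passed"` fallback, a cluster missing from the lookup ends up as `NA`, which can make unmapped values easier to spot.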

edited Jun 14 '22 at 05:36 by Darren Tsai

answered Jun 14 '22 at 04:09 by Park
