
I'm trying to create a new column in an R data frame based on a set of mutually exclusive conditions. In Python there is a clever way to do this with np.select(conditions, choices) instead of np.where (see this solved question). I've been looking, without success, for an R equivalent that lets me avoid writing a gigantic nested ifelse (the equivalent of np.where).

The number of conditions can change, and I'm implementing a function for this, so an equivalent would be really helpful. Is there any way to do this? I'm new to R and come from Python.

Thank you!

neilfws
dsqubitdo

2 Answers


Yes, you can use case_when from the dplyr package in R. Conditions are checked in order, and the final TRUE acts as the default:

library(dplyr)

mtcars %>%
  mutate(cyl2 = case_when(cyl > 7  ~ "High",    # first matching condition wins
                          cyl == 6 ~ "Medium",
                          TRUE     ~ "Low"))    # TRUE is the default ("else") case

    mpg cyl  disp  hp drat    wt  qsec vs am gear carb   cyl2
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4 Medium
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4 Medium
3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1    Low
4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 Medium
5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2   High
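
If the number of conditions varies and you want to pass them as lists, the way np.select takes them, something like the following base R sketch may help. Note that np_select is a hypothetical helper written for this illustration; it is not a function from base R or dplyr, and it assumes every condition is a logical vector of the same length.

```r
# Hypothetical helper (not part of base R or dplyr): a rough analogue of
# np.select(conditions, choices, default).
np_select <- function(conditions, choices, default = NA) {
  stopifnot(length(conditions) > 0,
            length(conditions) == length(choices))
  n <- length(conditions[[1]])
  out <- rep(default, n)
  # Walk the conditions in reverse so that earlier conditions overwrite
  # later ones, i.e. the first matching condition wins.
  for (i in rev(seq_along(conditions))) {
    hit <- !is.na(conditions[[i]]) & conditions[[i]]
    # rep_len lets a choice be either a single value or a full-length vector.
    out[hit] <- rep_len(choices[[i]], n)[hit]
  }
  out
}

# Conditions and choices are ordinary lists, so they can be built in a loop.
conditions <- list(mtcars$cyl > 7, mtcars$cyl == 6)
choices    <- list("High", "Medium")

mtcars$cyl2 <- np_select(conditions, choices, default = "Low")
head(mtcars[, c("cyl", "cyl2")])
```

Because the conditions are evaluated before the call, this works inside a function without any non-standard evaluation, at the cost of computing every condition up front.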
one
  • What does `T` here mean? I guess it functions as the "else" or "default" condition, but why `T`? – tdy Jan 23 '23 at 22:00
  • That is equivalent to the default in `np.select`. `T` is short for `TRUE` in R. – one Jan 23 '23 at 22:02
  • I'd avoid using `T` for `TRUE`. It can create confusion, as we see here :) – neilfws Jan 23 '23 at 22:39
  • Thank you! Is there any way to pass the conditions and choices using a list? Similar to `cut()` (The solution proposed by @margusl below) – dsqubitdo Jan 23 '23 at 23:02

There's also cut() ("Convert Numeric to Factor"), which bins a numeric vector into intervals, with or without your own labels:

df <- data.frame(a = 1:10)

df$b <- cut(df$a, 
            breaks = c(-Inf,3,7,Inf), 
            labels = c("lo", "med", "hi"))

df$c <- cut(df$a, 
            breaks = c(-Inf,3,7,Inf))

df
#>     a   b        c
#> 1   1  lo (-Inf,3]
#> 2   2  lo (-Inf,3]
#> 3   3  lo (-Inf,3]
#> 4   4 med    (3,7]
#> 5   5 med    (3,7]
#> 6   6 med    (3,7]
#> 7   7 med    (3,7]
#> 8   8  hi (7, Inf]
#> 9   9  hi (7, Inf]
#> 10 10  hi (7, Inf]
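
One detail worth knowing: by default cut() uses right-closed intervals (right = TRUE), which is why 3 lands in "lo" and 7 in "med" above. A small sketch of the difference, using the same breaks (the variable names b_right and b_left are just for illustration):

```r
a <- 1:10
breaks <- c(-Inf, 3, 7, Inf)
labels <- c("lo", "med", "hi")

# Default, right = TRUE: intervals are (lower, upper].
b_right <- cut(a, breaks = breaks, labels = labels)

# right = FALSE: intervals are [lower, upper), so boundary values
# move up into the next bin.
b_left <- cut(a, breaks = breaks, labels = labels, right = FALSE)

data.frame(a, b_right, b_left)
```

Since breaks and labels are plain vectors, they can be built programmatically when the number of bins is not known in advance.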
margusl
  • This is amazing! Even if it's not the exact equivalent to np.select(), it's exactly what I was looking for. Thanks! – dsqubitdo Jan 23 '23 at 22:59