I'd like to use an if else statement to make a new column in my dataframe based on data in another column. I've looked at a number of prior questions (such as this one and this one), but I seem to be doing something wrong, as I either get an error or no new column.

I've tried making an ifelse function:

Bins <- function(x) {
  if (x >= 4000) {
    print(">4000")
  } else if (x >= 3000 & x <= 4000) {
    print("3000-4000")
  } else if (x >= 2000 & x <= 3000) {
    print("2000-3000")
  } else if (x >= 1000 & x <= 2000) {
    print("1000-2000")
  } else {
    print("<1000")
  }
}

This function runs, but I can't figure out how to apply it to one column in my dataframe. I've tried dat$P.bins <- Bins(dat$Pcol) but get the following error: the condition has length > 1 and only the first element will be used, followed by the output [1] ">4000".

I've also tried to run an ifelse statement:

dat$P.bin<- ifelse(P.col>=4000, ">4000",
                                ifelse(P.col <=4000 & >= 3000, "3000-4000"),
                                ifelse(P.col<=3000 & >= 2000, "2000-3000"), 
                                ifelse(P.col <=2000 & >=1000, "1000-2000"), 
                                ifelse(P.col <1000, "1000"))

but get this error: Error: unexpected '>=' in: "dat$P.bins <- ifelse(Pcol >=4000, ">4000", ifelse(Pcol <=4000 & >=". I'm also not sure how to express a range inside an ifelse statement.

Any help or guidance would be greatly appreciated!

clions226
  • Just wrap your code inside a function definition, as in `my_function <- function(x) { if (x >= 4000) {">4000"} else if .......`. Then just call `my_function(yourdataframe$yourcolumn)`. – GuedesBF Jun 11 '21 at 22:51
  • When I tried that and ran the code above, I got these errors: Error in `if (x == ">4000") { : argument is of length zero` and `the condition has length > 1 and only the first element will be used [1] ">4000"`. How do I fix this? I'm not sure how the argument is of length zero, and I'm unsure why the print argument isn't working correctly. – clions226 Jun 12 '21 at 21:11
  • When I look at my dataframe after running the code and getting the errors, it has added a new column, but all of the values are >4000 (did not put values from P.col into bins in new column). When I try removing that line from my function I still get the same error but with the next value down. Can you only enter numbers into the print function in if else statements? – clions226 Jun 12 '21 at 21:25

2 Answers

We can use case_when like this:

library(tidyverse)

dat <- tibble(P.col = seq(0, 20000, 1000))

# case_when() checks the conditions in order and uses the first one that is TRUE
mutate(dat, P.bin = case_when(P.col >= 4000 ~ ">4000",
                              P.col <= 4000 & P.col >= 3000 ~ "3000-4000",
                              P.col <= 3000 & P.col >= 2000 ~ "2000-3000",
                              P.col <= 2000 & P.col >= 1000 ~ "1000-2000",
                              P.col < 1000 ~ "<1000"))
#> # A tibble: 21 x 2
#>    P.col P.bin
#>    <dbl> <chr>
#>  1     0 <1000
#>  2  1000 1000-2000
#>  3  2000 2000-3000
#>  4  3000 3000-4000
#>  5  4000 >4000    
#>  6  5000 >4000    
#>  7  6000 >4000    
#>  8  7000 >4000    
#>  9  8000 >4000    
#> 10  9000 >4000    
#> # … with 11 more rows

Created on 2021-06-11 by the reprex package (v2.0.0)
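
Note that `mutate()` returns a new tibble rather than changing `dat` in place, so the result has to be assigned back for the new column to persist. A minimal sketch, assuming the same `dat` and `P.col` as above (only `dplyr` is loaded here, and the bin labels are the ones from the question). Because `case_when()` stops at the first condition that is TRUE, the upper bounds can be left out:

```r
library(dplyr)

dat <- tibble(P.col = seq(0, 20000, 1000))

# Assign the result back to dat; calling mutate() on its own only prints the new tibble.
dat <- dat %>%
  mutate(P.bin = case_when(
    P.col >= 4000 ~ ">4000",
    P.col >= 3000 ~ "3000-4000",  # only reached when P.col < 4000
    P.col >= 2000 ~ "2000-3000",  # only reached when P.col < 3000
    P.col >= 1000 ~ "1000-2000",  # only reached when P.col < 2000
    P.col <  1000 ~ "<1000"
  ))

names(dat)
```

Rows where `P.col` is `NA` match none of the conditions and get `NA` in `P.bin`.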

jpdugo17
  • I am trying to apply the same code format to another column in my dataframe, the code runs and it says it added a new column but when I look at the dataframe the new column isn't present, instead it has repeated each row in the dataframe with the new column header next to the name (eg. P.bin.Pcol). Do you have any suggestions for how I fix this? Couldn't find an answer for this... I've tried restarting R and rerunning the code but it isn't working. It is now also doing this when I run the code you gave. – clions226 Jun 14 '21 at 16:17

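The ifelse approach you are using is correct, but you have some syntax issues.

  • You are not closing the brackets in the right place. Each nested ifelse should be the "no" argument of the one before it, so all the closing brackets belong at the end.
  • The dataframe name is not mentioned in ifelse. P.col on its own isn't enough; use dat$P.col or wrap the call in with(dat, ...).
  • P.col <=4000 & >= 3000 is not valid. You need P.col <=4000 & P.col >= 3000.

Try the following code: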

# Each ifelse() is nested as the "no" branch of the previous one
dat$P.bin <- with(dat, ifelse(P.col >= 4000, ">4000",
                   ifelse(P.col <= 4000 & P.col >= 3000, "3000-4000",
                   ifelse(P.col <= 3000 & P.col >= 2000, "2000-3000",
                   ifelse(P.col <= 2000 & P.col >= 1000, "1000-2000",
                   ifelse(P.col < 1000, "<1000", NA_character_))))))

Having said that, using case_when as suggested by @jpdugo17 might be a cleaner way to do this.
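
As a side note on the original Bins() attempt: `if` evaluates a single condition, so it cannot work through a whole column element by element, whereas `ifelse()` and `case_when()` are vectorised. Base R's `cut()` is another vectorised option for this kind of binning. It is not used in the question or in either answer, so treat this as a sketch under the same assumed `dat` and `P.col`, with break points chosen to match the labels above:

```r
dat <- data.frame(P.col = seq(0, 20000, 1000))

# cut() assigns each value to an interval and returns a factor.
dat$P.bin <- cut(
  dat$P.col,
  breaks = c(-Inf, 1000, 2000, 3000, 4000, Inf),
  labels = c("<1000", "1000-2000", "2000-3000", "3000-4000", ">4000"),
  right  = FALSE  # intervals are [lower, upper), so 4000 lands in ">4000"
)

head(dat)
```

The result is a factor rather than a character column; wrap the call in `as.character()` if plain strings are needed.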

Ronak Shah