1

In R, I want to use a switch statement to replace nested if-else statements. I want to assign values to a new column; my idea is:

## Create a function to separate the cases

Range <- function(x)
    if (CityData_Group_Copy$BadDebtNum[x] < 26)  
              { CityData_Group_Copy$BadDebtRange[x] <- "1~25"}

    else if(CityData_Group_Copy$BadDebtNum[x] > 25 && CityData_Group_Copy$BadDebtNum[x] < 51)  
              {CityData_Group_Copy$BadDebtRange[x] <- "26~50"}

    else if(CityData_Group_Copy$BadDebtNum[x] > 51 && CityData_Group_Copy$BadDebtNum[x] < 76)   
              {CityData_Group_Copy$BadDebtRange[x] <- "51~75"}

    else if(CityData_Group_Copy$BadDebtNum[x] > 75 && CityData_Group_Copy$BadDebtNum[x] < 101)  
              {CityData_Group_Copy$BadDebtRange[x] <- "76~100"}

    else if(CityData_Group_Copy$BadDebtNum[x] > 100)
              { CityData_Group_Copy$BadDebtRange[x] <- "100+"}


## Assign the result to the new column "CityData_Group_Copy$BadDebtRange" 

for(i in 1: nrow(CityData_Group_Copy) ){
  Range(i)
}

I also tried this solution:

Range <- function(x)
 switch (true) {
  case (CityData_Group_Copy$BadDebtNum[x] < 26): CityData_Group_Copy$BadDebtRange[x] <- "1~25"  break;
  case (CityData_Group_Copy$BadDebtNum[x] > 25 && CityData_Group_Copy$BadDebtNum[x] < 51): CityData_Group_Copy$BadDebtRange[x] <- "26~50"  break;
  case (CityData_Group_Copy$BadDebtNum[x] > 51 && CityData_Group_Copy$BadDebtNum[x] < 76): CityData_Group_Copy$BadDebtRange[x] <- "51~75"  break;
  case (CityData_Group_Copy$BadDebtNum[x] > 75 && CityData_Group_Copy$BadDebtNum[x] < 101): CityData_Group_Copy$BadDebtRange[x] <- "76~100"  break;
  case (CityData_Group_Copy$BadDebtNum[x] > 100): CityData_Group_Copy$BadDebtRange[x] <- "100+" break;
  }

But it seems there is no such syntax in R. I got an error:

Error: unexpected 'break' in " case (CityData_Group_Copy$BadDebtNum[x] > 101): CityData_Group_Copy$BadDebtRange[x] <- "100+" break"*

So is there a simple way to implement my idea?

rawr
Ye Xu

3 Answers

5

It looks like you're binning data, which can be done with the cut function:

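# Simulate 100 bad-debt counts between 1 and 120
bad_debt_num <- sample(1:120, 100, replace = TRUE)
# Bin into (0,25], (25,50], (50,75], (75,100], (100,1000]
cut(bad_debt_num, breaks = c(0, 25, 50, 75, 100, 1000))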

There is more information about binning in the question "Generate bins from a data frame".
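By default, cut labels each bin with its interval, such as (0,25]. As a minimal sketch of applying the same idea to a data-frame column with readable labels, assuming a small made-up data frame (the values are illustrative, not the asker's data; the column names and ranges follow the question):

```r
# Hypothetical stand-in for the asker's data frame
CityData_Group_Copy <- data.frame(
  BadDebtNum = c(0, 1, 25, 26, 50, 51, 75, 76, 100, 101)
)

# Intervals are right-closed by default: (-Inf,0], (0,25], (25,50], ...
CityData_Group_Copy$BadDebtRange <- cut(
  CityData_Group_Copy$BadDebtNum,
  breaks = c(-Inf, 0, 25, 50, 75, 100, Inf),
  labels = c("0", "1~25", "26~50", "51~75", "76~100", "100+")
)

print(CityData_Group_Copy)
```

The result is a factor; wrap it in as.character() if plain strings are needed.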

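The R switch statement is fairly limited: it selects a branch by matching a single string or integer value, so it cannot test range conditions such as x < 26.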
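For reference, here is a minimal sketch of what switch does handle: an exact match on one character value, with an unnamed final argument as the fallback. The helper describe_range and its return values are hypothetical names used only for illustration.

```r
# switch() dispatches on the value of a single expression (here a string),
# not on a list of logical conditions.
describe_range <- function(label) {  # hypothetical helper
  switch(label,
    "1~25"  = "low",
    "26~50" = "medium",
    "high"  # unnamed last argument acts as the default
  )
}

describe_range("1~25")
describe_range("76~100")
```

There is no case or break keyword; each branch is simply an argument to switch.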

Community
cbare
  • Yes, I want to segment the data into different intervals. I just edited your code into: cut(CityData_Group_Copy$BadDebtNum, breaks=c(-Inf,0, 25, 50, 75, 100, Inf),labels=c("0","1~25","26~50","51~75","76~100","100+")) and it works. Thank you! – Ye Xu May 18 '15 at 03:14
  • cut is the R Function of the Day! http://www.r-bloggers.com/r-function-of-the-day-cut/ – tumultous_rooster Jun 04 '15 at 20:18
2

First, why define the logic twice in a chain of if / else if statements? All you need is:

iel = function(x){
  if(data[x] < 26) {
    return("<=25")
  } else if(data[x] < 51){
    return("26~50")
  } else if(data[x] < 76){
    return("51~75")
  } else if(data[x] < 101){
    return("76~100")
  } else {
    return("100+")
  }
}

How does this compare to the other answer that uses ifelse()? The same idea applies: you can reduce the amount of checking you're doing by leveraging the fact that the logic is nested. There is no need to say "if it's not < 26, then check to make sure it's > 25"; it's redundant.

ieie = function(data){
  return(ifelse(data< 26, "<=25", 
         ifelse (data < 51,"26~50",
                 ifelse(data < 76, "51~75",
                        ifelse (data < 101,"76~100",
                                "100+")))))
}
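As a quick sanity check that the two styles give the same bins, here is a minimal self-contained sketch. The names bin_one and bin_all are hypothetical, and bin_one takes a value directly rather than an index into a global vector, which is an assumed variation on the function above.

```r
# Scalar version: classic if / else if chain, one value at a time
bin_one <- function(v) {
  if (v < 26) {
    "<=25"
  } else if (v < 51) {
    "26~50"
  } else if (v < 76) {
    "51~75"
  } else if (v < 101) {
    "76~100"
  } else {
    "100+"
  }
}

# Vectorized version: nested ifelse() over the whole vector at once
bin_all <- function(v) {
  ifelse(v < 26, "<=25",
  ifelse(v < 51, "26~50",
  ifelse(v < 76, "51~75",
  ifelse(v < 101, "76~100", "100+"))))
}

x <- c(1, 25, 26, 50, 51, 75, 76, 100, 101)  # made-up boundary values
loop_result <- vapply(x, bin_one, character(1))
vec_result <- bin_all(x)
identical(loop_result, vec_result)
```

vapply is used instead of sapply here only because it checks that every result is a single string.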

How do these solutions compare from a speed perspective? Your mileage may vary, but:

library(microbenchmark)
data = rnorm(1e6,50,15)
microbenchmark(sapply(1:length(data),iel),ieie(data), times=50L)

#> Unit: seconds
                          expr      min       lq     mean   median       uq      max neval
 sapply(1:length(data), group) 1.710709 2.016842 2.243246 2.223891 2.376228 2.954147    50
                    ieie(data) 1.902938 2.094678 2.296946 2.220572 2.438968 3.929247    50

By taking the traditional logic, even without vectorization, and wrapping it in sapply() (which returns a vector), I see slight improvements over the nested ifelse() in min, mean and max. This is based on only 50 reps (~2.5 seconds each on average, so ~5 seconds per simulation). The data never changed; this just looks at how fast the computer can crunch the data, taking out the noise of whatever else was happening on my computer at the same time.

What if we bump it up to a vector of length 1e7?

data = rnorm(1e7,50,15)
microbenchmark(sapply(1:length(data),iel),ieie(data), times=5L)
#> Unit: seconds
                        expr      min       lq     mean   median       uq      max neval
 sapply(1:length(data), iel) 22.38624 27.42520 27.74565 27.85335 27.89591 33.16756     5
                  ieie(data) 17.52102 17.62965 18.90965 19.49140 19.89423 20.01194     5

This is actually very interesting to me. I had always been told, and believed, that nested ifelse() statements are bad for performance, yet clearly that is not the case as the size of the vector increases.

Still, here, the cut function is vastly superior:

data6 = rnorm(1e6,50,15)
data7 = rnorm(1e7,50,15)
microbenchmark(cut(data6, breaks=c(0, 25, 50, 75, 100, 1000)),cut(data7, breaks=c(0, 25, 50, 75, 100, 1000)),times=10L)
#> Unit: milliseconds
                        expr       min        lq      mean    median        uq       max neval
 cut(data6, breaks = c(...))  204.1436  206.2564  224.1509  221.5659  232.8876  260.8075    10
 cut(data7, breaks = c(...)) 2059.5744 2118.6611 2213.9544 2210.8787 2271.1089 2407.6448    10

Wow! That's in milliseconds. The built-in functions in R that leverage other languages sure pay off.
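Another base R function backed by compiled code is findInterval, which returns the bin index for each value. The sketch below is an alternative I have not benchmarked here; the vector range_labels and the sample values are made up for illustration, and the left.open argument requires a reasonably recent version of R.

```r
range_labels <- c("1~25", "26~50", "51~75", "76~100", "100+")  # illustrative labels
bad_debt_num <- c(1, 25, 26, 50, 51, 75, 76, 100, 101)         # made-up values

# left.open = TRUE gives right-closed bins (25,50], (50,75], ...
# matching cut()'s default; values <= 25 get index 0.
idx <- findInterval(bad_debt_num, c(25, 50, 75, 100), left.open = TRUE)

# Shift by one so that index 0 maps to the first label
bad_debt_range <- range_labels[idx + 1]
bad_debt_range
```

Unlike cut, this yields a plain character vector rather than a factor.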

So, my answer doesn't provide new solutions, but hopefully it sheds some light on the processing speed of the different approaches.

Mark
  • Thank you so much Mark. I learned a lot from your comment! But sometimes, if I have more if-else statements to write, the nested logic becomes more difficult for others to read. – Ye Xu May 18 '15 at 03:29
1

Use ifelse; there is no need for a switch function.

# Nested, vectorized ifelse(); with() lets the column be referenced by name
CityData_Group_Copy$BadDebtRange <- with(CityData_Group_Copy,
  ifelse(BadDebtNum < 26, "1~25",
  ifelse(BadDebtNum > 25 & BadDebtNum < 51, "26~50",
  ifelse(BadDebtNum > 50 & BadDebtNum < 76, "51~75",
  ifelse(BadDebtNum > 75 & BadDebtNum < 101, "76~100",
         "100+")))))
user227710
  • How is this better than the original code? In fact, I'll bet with timing, this is slower than if(){}else if(){} ... else{} type structures. – Mark May 18 '15 at 01:16
  • Please read page 21 of [R Inferno](http://www.burns-stat.com/pages/Tutor/R_inferno.pdf). – user227710 May 18 '15 at 01:21
  • I knew it was vectorized, I assumed using `sapply()` would address that issue, apparently not. And in a big way with very large vectors. – Mark May 18 '15 at 01:58