
I'm writing my first function ever (in any programming language), and I'm a little confused about the proper structure for if, else, and ifelse. I've looked at a ton of examples, but none of them are clear to me.

Situation - I'm trying to bucket clients by how long they have been clients, then turn that into a factor.

#Sample Data
clientID <- round(runif(2,min=2000, max=3000),0)
MonthsSinceSignUp <- round(runif(20,min=1, max=60),0)
df <- data.frame(cbind(clientID,MonthsSinceSignUp))

For a given client, I would like to determine whether they have been a client for less than a year, more than a year but less than two, etc.

This is my first crack at a function:

ClientAgeRange <- function(MonthsSinceSignUp) {
  if (MonthsSinceSignUp < 13) {ClientAgeRange <- '1 year'}
} else {
  if (MonthsSinceSignUp > 13 & MonthsSinceSignUps < 25) {ClientAgeRange <- '2 years'}
} else {ClientAgeRage <- '3+ years'}

The error that I keep getting is: Error: unexpected '}' in "}", which would indicate that I'm missing a closing bracket or have an extra one. However, despite my troubleshooting, I can't locate it. More generally, I don't think I'm applying the correct structure to the function. I'm trying to express "if this, then set this variable to that". How can I structure this function properly?

Lastly - if I wanted to add the output of the function to the dataframe, is apply the correct way to do so?

mikebmassey
  • No one seems to have mentioned this minor detail, but all of your clients who signed up 13 months ago will be categorized as '3+ years' unless you change the `> 13` to `>= 13` ;) – lockedoff Jul 30 '12 at 19:02

3 Answers


An answer in two parts:

  1. A tip
  2. A fix

The tip:

My first tip is to use a code editor that does bracket matching. For example, in Notepad++ you get this:

PS. I'm not recommending Notepad++ (use RStudio instead). I'm simply using Notepad++ because of its garish, and thus easy to spot, colours.

[Screenshot: the original function in Notepad++, with the matched brace highlighted in red]

Notice that the highlighted brace (in red) matches a brace in the middle of your function. This reveals that there is a redundant brace at the end of your first if. So, fix that first:

[Screenshot: the function after removing the redundant brace, with no brace highlighted]

OK, now there is no matching brace (no highlighted red), so you need to add the missing brace at the end of your function:

[Screenshot: the function with the missing closing brace added at the end]


The fix:

But you can vastly simplify your function if you use cut, which is designed to do this type of analysis:

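ClientAgeRange <- function(x) {
  # right = FALSE gives [0, 13), [13, 25), [25, Inf), matching the `< 13` and `< 25` tests
  cut(x, breaks = c(0, 13, 25, Inf), labels = c("1 year", "2 years", "3+ years"),
      right = FALSE)
}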

Try it on your data:

ClientAgeRange(df$MonthsSinceSignUp)
 [1] 2 years  1 year   3+ years 2 years  3+ years 3+ years 2 years  2 years  3+ years 3+ years 1 year  
[12] 3+ years 2 years  3+ years 3+ years 3+ years 3+ years 3+ years 3+ years 3+ years
Levels: 1 year 2 years 3+ years
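
On the last part of the question: because cut is vectorised, its result can be assigned straight to a new column, with no apply call needed. The following is a minimal sketch, not a definitive implementation. It rebuilds a data frame shaped like the one in the question with a fixed seed so that it runs on its own. The helper name bucket_months is hypothetical, and right = FALSE is an assumption made so that 13 and 25 months each start a new bucket, in line with the `< 13` and `< 25` tests in the question.

```r
# Sample data in the shape used in the question (seed fixed for reproducibility).
set.seed(42)
df <- data.frame(
  clientID          = round(runif(20, min = 2000, max = 3000)),
  MonthsSinceSignUp = round(runif(20, min = 1, max = 60))
)

# Hypothetical helper: right = FALSE makes the intervals
# [0, 13), [13, 25) and [25, Inf).
bucket_months <- function(x) {
  cut(x,
      breaks = c(0, 13, 25, Inf),
      labels = c("1 year", "2 years", "3+ years"),
      right  = FALSE)
}

# cut() is vectorised, so the whole column is processed in one call
# and the new column is already a factor.
df$ClientAgeRange <- bucket_months(df$MonthsSinceSignUp)

# Boundary check: 12 and 13 months fall into different buckets.
boundary <- as.character(bucket_months(c(12, 13, 24, 25)))
print(boundary)
```

If 13 months should still count as the first year, dropping right = FALSE restores cut's default right-closed intervals.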
Andrie
  • If Andrie's online, it simply doesn't make sense to try typing an answer. I was *just* about to paste a similar `cut` solution (similar because I hadn't wrapped it in a function)... and then I refreshed the page. – A5C1D2H2I1M1N2O1R2T1 Jul 30 '12 at 16:32
  • -1 for not using `Curry` (kidding! great answer, especially given that it teaches about function-writing more generally) – Ari B. Friedman Jul 30 '12 at 17:14
if (MonthsSinceSignUp < 13) {ClientAgeRange <- '1 year'}
}

You have an extra } here.

As a general rule, it's a good idea to adopt conventions for formatting your code. One convention I would highly recommend is always putting the body of a "block" (here I'm using block as a generic term for "stuff inside {}", which includes function bodies, if statements, and loops) on its own line, as below:

ClientAgeRange <- function(MonthsSinceSignUp) {
  if (MonthsSinceSignUp < 13) {
    ClientAgeRange <- '1 year'
  } else if (MonthsSinceSignUp >= 13 && MonthsSinceSignUp < 25) {
    ClientAgeRange <- '2 years'
  } else {
    ClientAgeRange <- '3+ years'
  }
}

See how that makes everything clearer?


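As to your second question, a function without side-effects takes input, does stuff, and returns output. Your function has no explicit return value right now (it only returns the value of the last assignment, invisibly), and your naming suggests some confusion: assigning to ClientAgeRange inside the function creates a local variable; it does not set the function's output.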

Try this:

ClientAgeRange <- function(MonthsSinceSignUp) {
  if (MonthsSinceSignUp < 13) {
    result <- '1 year'
  } else if (MonthsSinceSignUp >= 13 && MonthsSinceSignUp < 25) {
    result <- '2 years'
  } else {
    result <- '3+ years'
  }
  return(result)
}

The return() call is optional in R, but it will help you think through functions more clearly.
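
To illustrate why return() is optional: an R function returns the value of the last expression it evaluates, and an if/else chain is itself an expression. The sketch below compares the two styles. The function names are hypothetical, and the redundant lower-bound test is dropped because the else branch already implies it.

```r
# Explicit style: store the label, then return it.
age_range_explicit <- function(months) {
  if (months < 13) {
    result <- "1 year"
  } else if (months < 25) {
    result <- "2 years"
  } else {
    result <- "3+ years"
  }
  return(result)
}

# Implicit style: the if/else chain is the last expression,
# so its value becomes the function's value.
age_range_implicit <- function(months) {
  if (months < 13) {
    "1 year"
  } else if (months < 25) {
    "2 years"
  } else {
    "3+ years"
  }
}

# Both versions agree on every test value, including the boundaries.
test_months <- c(1, 12, 13, 24, 25, 60)
same <- all(sapply(test_months, age_range_explicit) ==
            sapply(test_months, age_range_implicit))
print(same)
```

Both functions expect a single number; called on a whole vector, the if condition would no longer be a single TRUE or FALSE.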

Ari B. Friedman

Try the following (note that I used else if to make it simpler):

ClientAgeRange <- function(MonthsSinceSignUp) {
  if (MonthsSinceSignUp < 13) {
    ClientAgeRange <- '1 year'
  } else if (MonthsSinceSignUp >= 13 && MonthsSinceSignUp < 25) {
    ClientAgeRange <- '2 years'
  } else {
    ClientAgeRange <- '3+ years'
  }
}

Then, you can add it to your data frame as follows:

df$ClientAgeRange <- sapply(df$MonthsSinceSignUp, ClientAgeRange)

As you said, an apply-family function is the right way to go (I used sapply in this case). This is because we cannot simply pass the entire vector into the function: if tests a single condition, so the function has to be called on one element at a time to do the necessary comparisons.
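
Since the question also asks about ifelse: it is the vectorised counterpart of if/else, so nested ifelse calls can process a whole column without sapply. The following is a sketch under the same three buckets; the function name client_age_range_vec and the example vector are illustrative only.

```r
# Hypothetical vectorised version built from nested ifelse() calls.
# Each test is applied element by element to the whole vector.
client_age_range_vec <- function(months) {
  ifelse(months < 13, "1 year",
         ifelse(months < 25, "2 years", "3+ years"))
}

months <- c(5, 13, 24, 25, 60)
ranges <- client_age_range_vec(months)

# ifelse() returns a character vector here, so convert it to a factor
# explicitly, fixing the level order.
ranges <- factor(ranges, levels = c("1 year", "2 years", "3+ years"))
print(ranges)
```

Nested ifelse stays readable for two or three buckets; with more break points, cut is usually easier to maintain.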

Edward