
I have a series of if statements in a function. It looks like this:

library(dplyr)  # provides %>% and select()

my_func <- function(data, selection) {

  if (selection == 'p+c') {
    predictors = 'chicago'
    preds <- data
  } else if (selection == 'p') {
    predictors = 'new_york'
    preds <- data %>% dplyr::select(-c(region, sale))
  } else if (selection == 'c') {
    predictors = 'california'
    preds <- data %>% dplyr::select(region, sale)
  }
  # then the function does something else with predictors and preds,
  # and returns a dataframe
}

my_func(my_data, selection = 'p')

I keep getting the warning that the condition has length > 1 and only the first element will be used. Oddly, it doesn't actually break anything (everything works as expected), but I would still rather fix it.

I read that this is a problem with vectorization, but I don't know how to overcome this.

I already tried replacing the if/else with ifelse (as suggested in other posts) but this did not work, maybe because I do more than one operation at each if statement. I did this:

ifelse (selection == 'p+c') {
  predictors = 'chicago'
  preds <- data
}
ifelse (selection == 'p') {
  predictors = 'new_york'
  preds <- data %>% dplyr::select(-c(region, sale))
}
ifelse (selection == 'c') {
  predictors = 'california'
  preds <- data %>% dplyr::select(region, sale)
}
  • "I already tried replacing the if/else with ifelse (as suggested in other posts) but this did not work" -- show what you tried. It is non-problematic to have nested calls to `ifelse`. – John Coleman Jul 20 '21 at 11:42
  • Hi @JohnColeman I just amended the post to show it, thank you for your comment – salix_august Jul 20 '21 at 11:43
  • You aren't using `ifelse` as a *function* at all. If you are familiar with Excel, `ifelse` works like Excel's `IF` function. – John Coleman Jul 20 '21 at 11:44
  • Sorry @JohnColeman, I'm not sure I know how else to do this – salix_august Jul 20 '21 at 11:46
  • did you go over the documentation for `ifelse()`? maybe this will help: https://stackoverflow.com/questions/18012222/nested-ifelse-statement – D.J Jul 20 '21 at 11:48
  • This means that `selection` has more than 1 value in it like `c('p+c', 'p')`. What would be output in such case? – Ronak Shah Jul 20 '21 at 11:57
  • What does `selection` contain? – rbasa Jul 20 '21 at 12:20
  • Hi @rbasa, sorry I didn't specify the details. I amended the post now. 'selection' is one of the arguments of a function that the if/else statements are in. – salix_august Jul 20 '21 at 12:58
  • Hi @RonakShah, yes `selection` can have three values. It is an argument in the function, my edits (hopefully) reflect this now. Thank you – salix_august Jul 20 '21 at 13:03
  • Your code is still not reproducible or clear to me. So `selection` can have 3 values but what is the expected output in that case? Consider adding an example dummy dataframe and show us the expected output for it. Read about [how to give a reproducible example](http://stackoverflow.com/questions/5963269) – Ronak Shah Jul 20 '21 at 13:06
  • Also if you are running `my_func(my_data, selection = 'p')` this should not give you the warning since `selection` is only `p` here which is of length 1 and not length > 1 (as mentioned in the warning). Maybe the warning is coming from somewhere else in the code that you have not shown. – Ronak Shah Jul 20 '21 at 13:08
  • @RonakShah yep, it was something else in the code. I noticed I was over-writing the `selection`, and it added more elements to it. Thank you so much for pointing this out. – salix_august Jul 20 '21 at 13:13
  • `the condition has length > 1 and only the first element will be used` will only show up if `selection` contains more than 1 element. – rbasa Jul 20 '21 at 14:41
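Since the cause turned out to be `selection` being overwritten with extra elements, one way to catch that early is to validate the argument at the top of the function. A minimal sketch, assuming 'p+c', 'p' and 'c' are the only valid values; `check_selection()` is a hypothetical name, and `match.arg()` is base R:

```r
# Hypothetical guard: restrict `selection` to one of the three documented codes.
# match.arg() errors if `selection` has length > 1 or is not one of the choices.
check_selection <- function(selection = c("p+c", "p", "c")) {
  selection <- match.arg(selection)
  selection
}

check_selection("p")   # "p"
check_selection()      # "p+c" (the first choice acts as the default)

# A length-2 selection is rejected instead of silently using the first element
res <- tryCatch(check_selection(c("p", "c")), error = function(e) "error")
res                    # "error"
```

The same `selection <- match.arg(selection)` line could sit at the top of `my_func()` if its signature listed the three choices as the default.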

2 Answers

ifelse is a function, so you need to assign its result to your variables (rather than placing the assignments inside the function call itself). Without a reproducible example (which you neglected to provide -- see How to make a great R reproducible example?) it is hard to be sure that the following code does exactly what you want, but something like:

predictors <- ifelse(selection == 'p+c',
                     'chicago',
                     ifelse(selection == 'p',
                            'new_york',
                            ifelse(selection == 'c',
                                   'california',
                                   NA)))

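(preds needs separate handling: ifelse() returns a vector with the same shape as its test argument, so it cannot return a data frame, and an if/else chain is the safer choice there.)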
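Because ifelse() is vectorised, the nested version handles a whole vector of selection codes at once. The sketch below checks that on a hypothetical three-element vector and, as a swapped-in alternative not taken from the answer, compares it with a named lookup vector, which gives the same result with less nesting:

```r
# Hypothetical input: two known codes and one unknown code
selection <- c("p+c", "c", "x")

# Nested ifelse(): evaluated element-wise over the whole vector
predictors_ifelse <- ifelse(selection == "p+c", "chicago",
                     ifelse(selection == "p", "new_york",
                     ifelse(selection == "c", "california", NA)))

# Alternative: index a named vector by the codes; unknown codes give NA
lookup <- c("p+c" = "chicago", "p" = "new_york", "c" = "california")
predictors_lookup <- unname(lookup[selection])

predictors_ifelse   # "chicago" "california" NA
predictors_lookup   # "chicago" "california" NA
```

Both approaches only produce atomic values such as the predictors names; neither returns the data frames needed for preds.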

Answer by John Coleman (edited by Mike)
  • Thank you. Do I have to run this two times then, one time with the names (e.g. 'chicago') and one time to change the data (i.e. for `preds`)? – salix_august Jul 20 '21 at 12:56
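On the follow-up about running the logic twice: for a single selection value, one switch() call can return both values together. A minimal sketch; `pick_predictors()` is a hypothetical name, the example data frame mirrors the tibble printed in the other answer, and base-R column subsetting stands in for dplyr::select():

```r
# Example data mirroring the tibble shown in the other answer
my_data <- data.frame(region = 1:5, sale = 2:6, other = 3:7)

# Hypothetical helper: one switch() call yields predictors and preds together
pick_predictors <- function(data, selection) {
  stopifnot(length(selection) == 1)   # fail fast on a vector selection
  switch(selection,
    "p+c" = list(predictors = "chicago", preds = data),
    "p"   = list(predictors = "new_york",
                 preds = data[, setdiff(names(data), c("region", "sale")), drop = FALSE]),
    "c"   = list(predictors = "california",
                 preds = data[, c("region", "sale"), drop = FALSE]),
    stop("Selection not found")        # default branch for unknown codes
  )
}

res <- pick_predictors(my_data, "c")
res$predictors     # "california"
names(res$preds)   # "region" "sale"
```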

You have two questions here. The message

the condition has length > 1...

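arises because if() is not vectorised: it needs a single TRUE/FALSE, but selection == 'p+c' returns one value per element of selection. So I assume selection has more than one value. (Since R 4.2.0 this is an error rather than a warning.)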
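A small reproduction of that message, assuming selection has accidentally picked up a second element. The handler catches both the warning (R < 4.2.0) and the error (R >= 4.2.0) so the message can be inspected in either version:

```r
# Hypothetical vector, e.g. after `selection` was accidentally overwritten
selection <- c("p+c", "p")

# The comparison is vectorised: one TRUE/FALSE per element
cond <- selection == "p+c"
cond           # TRUE FALSE
length(cond)   # 2

# if() needs exactly one value, so it signals a condition
msg <- tryCatch(
  if (cond) "first branch" else "second branch",
  warning = function(w) conditionMessage(w),
  error   = function(e) conditionMessage(e)
)
msg            # mentions "the condition has length > 1"
```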

ifelse() is most useful when you have exactly two options. For more options, a decent choice is a chain of else if() statements.

Without the data I can't check this, but a nested else if solution would be:

 if (selection == 'p+c') {
   predictors = 'chicago'
   preds <- data
 } else if(selection == 'p') { 
   predictors = 'new_york'
   preds <- data %>% dplyr::select(-c(region, sale))
 } else if (selection == 'c') {
   predictors = 'california'
   preds <- data %>% dplyr::select(region, sale)
 } else { # Good practice to capture errors safely
   stop("Selection not found")
 }

If you need to keep selection as a vector, e.g. selection <- c("p+c", "c"), wrap the if/else chain above in a function and pass it to an apply-family function such as sapply(), which calls it once per element, e.g.

library(dplyr)  # provides %>% and select()

checkFunction <- function(selection, data) {
  if (selection == 'p+c') {
    predictors = 'chicago'
    preds <- data
  } else if (selection == 'p') {
    predictors = 'new_york'
    preds <- data %>% dplyr::select(-c(region, sale))
  } else if (selection == 'c') {
    predictors = 'california'
    preds <- data %>% dplyr::select(region, sale)
  } else { # Good practice to capture errors safely
    stop("Selection not found")
  }

  return(list(predictors, preds))
}

# pass the data explicitly instead of relying on a global `data` object
output <- sapply(selection, checkFunction, data = my_data)

> output
     p+c       c
[1,] "chicago" "california"
[2,] tbl_df,3  tbl_df,2

output[,1]

[[1]]
[1] "chicago"

[[2]]
# A tibble: 5 x 3
  region  sale other
   <int> <int> <int>
1      1     2     3
2      2     3     4
3      3     4     5
4      4     5     6
5      5     6     7
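sapply() simplifies the results into a list-matrix, as shown above, which is awkward to index. As a sketch of an alternative (not part of the original answer), lapply() over a self-named vector keeps a plain named list. `check_one()` is a hypothetical helper, and base-R subsetting replaces dplyr::select() so the example runs without extra packages:

```r
my_data <- data.frame(region = 1:5, sale = 2:6, other = 3:7)
selection <- c("p+c", "c")

# Hypothetical per-element helper; data is passed in explicitly
check_one <- function(sel, data) {
  if (sel == "p+c") {
    list(predictors = "chicago", preds = data)
  } else if (sel == "p") {
    list(predictors = "new_york",
         preds = data[, setdiff(names(data), c("region", "sale")), drop = FALSE])
  } else if (sel == "c") {
    list(predictors = "california",
         preds = data[, c("region", "sale"), drop = FALSE])
  } else {
    stop("Selection not found")
  }
}

# setNames(nm = x) names the vector by its own values, so results are keyed by code
output <- lapply(setNames(nm = selection), check_one, data = my_data)

output[["c"]]$predictors      # "california"
ncol(output[["p+c"]]$preds)   # 3
```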
Answer by Tech Commodities