
I have written a custom function that performs a mathematical transformation on a column of data. Its inputs are the data and one other value (temperature). I would like to have 2 different logical checks. The first is whether any values in the column exceed a certain threshold, because the transformation is different above and below the threshold. The second is whether the temperature input is above a certain value, in which case the function should warn that such values are unusual and that the data should be checked.

Right now, I have the function written with a series of if/else statements. However, this produces a warning that only the first element of the vector of TRUE/FALSE values is being used. A simplified example of my function is as follows:

myfun = function(temp,data) {
  if (temp > 34) {
    warning('Temperature higher than expected')
  }
  if (data > 50) {
    result = temp*data
    return(result)
  } else if (data <= 50) {
    result = temp/data
    return(result)
  }
}

myfun(temp = c(25,45,23,19,10), data = c(30,40,NA,50,10))

As you can see, because it is only using the first value for the if/else statements, it does not properly calculate the return values because it doesn't switch between the two versions of the transformation. Additionally, it's only checking if the first temp value is above the threshold. How can I get it to properly apply the logical check to every value and not just the first?

Edit: simplified the function per @The_Questioner's suggestion and changed < 50 to <= 50.

C. Denney
  • It seems like you're doing the same thing with the if-statement on the data variable, regardless of whether the temp is > 50. So, instead of writing that part out twice, why not just write it once, outside of the temp if-statement? – The_Questioner Jan 09 '19 at 19:07
  • @The_Questioner Yeah, you are totally right, that would simplify the function and make it easier to read for sure. – C. Denney Jan 09 '19 at 19:37
  • Use `ifelse` for a vectorized version. Suggested duplicate [vectorized if in R](https://stackoverflow.com/q/4042413/903061). That will handle your transformation, but I'd suggest converting the warning to something like `if(any(temp > 34)) warning("Some temperatures higher than expected")` – Gregor Thomas Jan 09 '19 at 21:46
  • @Gregor Thanks, the if(any()) suggestion is really helpful. I'll give ifelse a try. I had considered it, but I'm not always a fan because you end up with nested ifelse statements that can be hard to read. – C. Denney Jan 09 '19 at 22:05
  • Yeah, nested `ifelse` can be annoying. But in this case, you only have one condition so no nesting is needed: `ifelse(data > 50, temp * data, temp / data)`. That one line is equivalent to about 20 lines in the answer you accepted, and it will be *much* more efficient too. Avoid `ifelse` when the nesting gets too deep, but a case like this is perfect for `ifelse`. – Gregor Thomas Jan 10 '19 at 03:09
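
The two suggestions in the comments above (a single `any()` check for the warning and `ifelse` for the transformation) could be combined roughly as follows. This is a minimal sketch, not code from the thread: `myfun_vec` is a hypothetical name, and passing `na.rm = TRUE` to `any()` is an assumption about how missing temperatures should be treated.

```r
# Vectorized sketch; myfun_vec is a hypothetical name, not from the original post.
myfun_vec <- function(temp, data) {
  # Warn once if any temperature exceeds the threshold.
  # na.rm = TRUE (an assumption) makes missing temperatures count as "not above".
  if (any(temp > 34, na.rm = TRUE)) {
    warning("Some temperatures higher than expected")
  }
  # Element-wise choice: multiply where data > 50, divide otherwise.
  # Where data is NA the condition is NA, so that element of the result is NA.
  ifelse(data > 50, temp * data, temp / data)
}

# Same inputs as in the question; the call also emits the temperature warning.
myfun_vec(temp = c(25, 45, 23, 19, 10), data = c(30, 40, NA, 50, 10))
```

Because the `else` branch of `ifelse` covers everything that is not above 50, a value of exactly 50 is divided here, which matches the `<= 50` version of the question.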

1 Answer


The main issue with your code is that you are passing all the values to the function as vectors, but `if` only evaluates a single condition, so you end up doing single-element comparisons. You need to either pass the elements one by one to the function, or put some kind of vectorized comparison or for loop into your function. Below is the for loop approach, which is probably the least elegant way to do this, but at least it's easy to understand what's going on.

Another issue is that NAs in the data vector need to be handled before they reach any of your conditional statements, or `if` will throw an error.
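
To illustrate the NA point in isolation, here is a small sketch. The variable names are hypothetical, and `tryCatch` is used only so the snippet keeps running past the error.

```r
x <- NA

# A bare `if (x > 50)` stops with "missing value where TRUE/FALSE needed"
# when x is NA, so the error is caught here to let the script continue.
outcome <- tryCatch(
  if (x > 50) "above" else "not above",
  error = function(e) "error"
)
print(outcome)

# Checking is.na() first avoids the error; && short-circuits, so the
# comparison x > 50 is never evaluated when x is missing.
guarded <- if (!is.na(x) && x > 50) "above" else "missing or not above"
print(guarded)
```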

A final issue is what to do when data equals 50. Your conditional tests only cover values greater than or less than 50, but the 4th point in data is exactly 50, so that element currently comes back as NA.

myfun = function(temp,data) {
    # Pre-fill with NA so skipped elements (NA data, or data == 50) stay NA
    result <- rep(NA,length(temp))
    # seq_along() also behaves correctly when temp is empty, unlike 1:length(temp)
    for (t in seq_along(temp)) {
        if(temp[t] > 34) {
            # Warn for each element whose temperature is above the threshold
            warning('Temperature higher than expected')
            if (!is.na(data[t])) {
                if (data[t] > 50) {
                    result[t] <- temp[t]*data[t]
                } else if(data[t] < 50) {
                    result[t] <- temp[t]/data[t]
                }
            }
        } else {
            if (!is.na(data[t])) {
                if (data[t] > 50) {
                    result[t] <- temp[t]*data[t]
                } else if(data[t] < 50) {
                    result[t] <- temp[t]/data[t]
                }
            }
        }
    }
    return(result)
}

Output:

> myfun(temp = c(25,45,23,19,10), data = c(30,40,NA,50,10))
[1] 0.8333333 1.1250000        NA        NA 1.0000000
shwan
  • That makes sense. And that should have been a <= comparison instead of <, which would solve the 50 problem. I've read elsewhere that in R, for loops are generally slow and should be avoided. Would this be a situation where I should try to use an apply function instead? – C. Denney Jan 09 '19 at 19:36
  • Yes, using an apply variant would likely be faster, but it can be a bit trickier to understand what's going on. Often I find that the increase in speed isn't worth the cost in comprehension. Anyway, check out this on using apply functions on multiple vectors: https://stackoverflow.com/questions/35352647/r-apply-a-function-to-every-element-of-two-variables-respectively – shwan Jan 09 '19 at 19:41
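
For the apply-style route discussed in these comments, one possible sketch uses `mapply` to walk over both vectors in parallel. The helper name `transform_one` is hypothetical, the `<= 50` boundary follows the edited question, and returning `NA` for a missing temperature or data value is an assumption. The temperature warning is left out to keep the example focused on the element-wise call.

```r
# Hypothetical scalar helper: handles exactly one temp/data pair.
transform_one <- function(temp, data) {
  # Return NA early so the comparison below never sees a missing value.
  if (is.na(temp) || is.na(data)) {
    return(NA_real_)
  }
  # Multiply above the threshold, divide at or below it.
  if (data > 50) temp * data else temp / data
}

temp <- c(25, 45, 23, 19, 10)
data <- c(30, 40, NA, 50, 10)

# mapply calls transform_one(temp[i], data[i]) for each i and
# simplifies the results to a numeric vector.
result <- mapply(transform_one, temp, data)
print(result)
```

Note that `mapply` still calls the helper once per element, so whether it actually beats the explicit loop is something to measure rather than assume.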