
I've been trying to create a very simple function. Essentially, I want every element in t$c to be changed according to the if/else statement in my code, with all other elements staying the same. Here's my code:

set.seed(20)
x1=rnorm(100)
x2=rnorm(100)
x3=rnorm(100)
t=data.frame(a=x1,b=x1+x2,c=x1+x2+x3)
fun1=function(multi1,multi2)
{
  v=t$c
  s=c()
  for (i in v)
  {
    if (i<0)
    {
      s[i]=i*multi1
    }
    else if(i>0)
    {
      s[i]=i*multi2
    }
  }

  return(s)
}

fun1(multi1=0.5,multi2=2)

But it returned just a few numbers. I suspect I made a silly mistake, but I couldn't figure out what it was.

Rich Scriven
Jade
  • 1
    Yes, show us some ideal output. Once we get that we can try to help. But as a general aside, please consider writing assignments like this: `x1 <- rnorm(100)`. – Shawn Mehan Sep 16 '15 at 23:54
  • Do you want 0 values to be turned into NA or did you want 0s to stay as 0? – Dason Sep 17 '15 at 00:28
  • Hey Shawn, thanks for the comments. I thought `<-` and `=` were the same thing here. So they're not? – Jade Sep 17 '15 at 02:20
  • @Jade it's a matter of style and `<-` has some esoteric advantages. – Alex Sep 17 '15 at 02:34
  • http://stackoverflow.com/questions/1741820/assignment-operators-in-r-and – Alex Sep 17 '15 at 02:35

1 Answer


tl;dr This operation can be vectorized. You can use the following method, assuming you want to leave values that are 0 or NA alone.

with(t, c * ifelse(c < 0, 0.5, ifelse(c > 0, 2, 1)))

If you want to include them on one side (e.g. the positive side), it's even simpler.

with(t, c * ifelse(c < 0, 0.5, 2))

As far as your loop goes, you've got a few issues there.

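First, you were indexing s by the decimal values themselves: for (i in v) puts each value of v, not its position, into i. This is why your result vector was so short. R truncates non-integer indices toward zero, and since many of the values truncate to the same integer, the same few elements of s were overwritten again and again. Negative values are an additional problem, because R treats a negative index as an exclusion rather than a position.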

The number of unique integer indices can be checked like this:

length(unique(as.integer(t$c)))
# [1] 9

The effect is the same as in this simple example:

s[c(1, 2, 1, 1)] <- something

Since 1 is repeated, only elements 1 and 2 are changed. This is what was happening in your loop. The truncation itself is easy to see:

x <- 1:5
x[1.2]
# [1] 1
x[1.99]
# [1] 1
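
To tie the two points together, here is a minimal sketch of how truncated, repeated indices overwrite each other. The values in idx are made up for illustration and are all positive, so the separate issue with negative indices does not come into play.

```r
# Made-up positive values used directly as indices, as in the original loop
idx <- c(2.7, 2.1, 1.4)

s <- c()
for (i in idx) {
    # i is truncated toward zero when used as an index: 2, 2, 1
    s[i] <- i * 2
}

s
# [1] 2.8 4.2
length(s)
# [1] 2
```

Three values went in, but only two elements exist at the end, because 2.7 and 2.1 both wrote to position 2 and the later write replaced the earlier one.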

Next, notice in the revised function below that the vector s is preallocated. We can do that because we know the result will have the same length as v. Preallocating is the recommended, more efficient approach compared with growing the vector inside the loop.

Moving on, I changed for(i in v) to for(i in seq_along(v)) to fix the indexing problem. Now i runs over the positions 1, 2, ..., length(v), so we also need to read the values as v[i]. Finally, we can assign s[i] <- if(... once, instead of repeating the assignment to the same index inside each branch of the if() statement.
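
As a small sketch of the difference between looping over values and looping over positions (the vector v here is made up for illustration, and doubled is just a placeholder name):

```r
v <- c(-1.5, 0.25, 2.75)   # made-up values for illustration

# Iterating over the vector itself yields the values
for (x in v) print(x)

# Iterating over seq_along(v) yields the positions 1, 2, 3,
# which are safe to use as indices into a preallocated result
doubled <- numeric(length(v))
for (i in seq_along(v)) {
    doubled[i] <- v[i] * 2
}

doubled
# [1] -3.0  0.5  5.5
```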

Also note that you haven't accounted for 0 or any other values that may appear in v (like NA). I added a final else where we just leave those values alone. Change that as you see necessary. Furthermore, instead of going to the global environment to get t$c, we can pass it as an argument and make this function more general (credit to @ShawnMehan for that suggestion). Here's the revised version:

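fun1 <- function(vec, multi1, multi2) {
    # Preallocate the result; it has the same length as the input
    s <- vector("numeric", length(vec))
    for (i in seq_along(vec)) {
        s[i] <- if (is.na(vec[i])) {
            # if() throws an error on an NA condition, so pass NA through first
            vec[i]
        } else if (vec[i] < 0) {
            vec[i] * multi1
        } else if (vec[i] > 0) {
            vec[i] * multi2
        } else {
            # zeros are left unchanged
            vec[i]
        }
    }
    return(s)
}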

So now we have a result of length 100:

x <- fun1(t$c, 0.5, 2)
str(x)
# num [1:100] 2.657 -0.949 7.423 -0.749 5.664 ...

I wrote this long explanation because I figure you are learning how to write a loop. In R though, we can vectorize this entire operation and put it into one line of code. The following line gives the same result as fun1(t$c, 0.5, 2).

with(t, c * ifelse(c < 0, 0.5, ifelse(c > 0, 2, 1)))
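
If the vectorized form needs to be reused on other columns, one possible sketch is to wrap it in a function. The name scale_by_sign is hypothetical, and the input vector is made up to include a zero and an NA.

```r
# Hypothetical helper; multi1 and multi2 mirror the arguments in the question
scale_by_sign <- function(vec, multi1, multi2) {
    # The multiplier is multi1 for negatives, multi2 for positives, 1 for zeros;
    # NA comparisons give NA, so missing values stay missing
    vec * ifelse(vec < 0, multi1, ifelse(vec > 0, multi2, 1))
}

v <- c(-2, 0, 3, NA)   # made-up values for illustration
scale_by_sign(v, 0.5, 2)
# [1] -1  0  6 NA
```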

Thanks to @Frank for catching my calculation oversight.

Hopefully this all makes sense. Sometimes I don't do well with explanations and technical jargon. If there are any questions, please comment.

Rich Scriven
  • awesome. Only thing that I was going to add that isn't there was to place a third param in the function call for input, such that one can reuse across all possible columns `x <- fun1(t$c,0.5,2)` via `fun1 <- function(input_vector,multi1, multi2)`. – Shawn Mehan Sep 17 '15 at 00:12
  • Good idea. I'll add that in – Rich Scriven Sep 17 '15 at 00:13
  • I think this does a good job of cleaning up their code although I agree with @ShawnMehan in that the data should be an input to the function. It's clearly not optimal for R though as this could be vectorized fairly easily instead of looping. – Dason Sep 17 '15 at 00:14
  • @Dason - I agree. But it seems like OP might be doing a looping assignment or something. I added a vectorized method – Rich Scriven Sep 17 '15 at 00:17
  • @ShawnMehan - Thanks a lot – Rich Scriven Sep 17 '15 at 00:44
  • And to add some information for people looking at this answer, with `x1=rnorm(10000) x2=rnorm(10000) x3=rnorm(10000)`:
    > system.time(with(t, ifelse(c < 0, c * 0.5, ifelse(c > 0, c * 2, NA))))
       user  system elapsed
      0.003   0.002   0.004
    > system.time(fun1(t$c,0.5,2))
       user  system elapsed
      0.014   0.003   0.016
    – Shawn Mehan Sep 17 '15 at 00:47
  • 1
    For a continuous var, the vector will never be 0, so that case doesn't need to be considered. Fewer computations are done if you only multiply the vector by one thing: `fun2 <- function(v,m1=.5,m2=2) v*ifelse(v<0,m1,m2); fun2(t$c)` – Frank Sep 17 '15 at 01:59
  • @Frank - I will leave the reference to zero in there just in case the example data is not really representative of the actual data. It happens a lot, as you know. – Rich Scriven Sep 17 '15 at 02:38
  • Thanks to you all for some of the insights I never thought of. Also I found in my original version I totally messed up with i and vec[i] – Jade Sep 17 '15 at 02:48
  • @RichardScriven. One more question: what if I want to calculate the average of s? Since R doesn't allow me to return two values, how am I supposed to do that? – Jade Sep 17 '15 at 03:27
  • @Jade Sure it does. You can return a list. `return(list(s=s, mean=mean(s)))` – Rich Scriven Sep 17 '15 at 03:34
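
The list suggestion from the last comment could look roughly like this. It is a sketch only: the function name fun_with_mean is hypothetical, and it uses the simpler two-way ifelse() from the answer.

```r
# Hypothetical wrapper that returns both the scaled vector and its mean
fun_with_mean <- function(vec, multi1, multi2) {
    s <- vec * ifelse(vec < 0, multi1, multi2)
    list(s = s, mean = mean(s))
}

res <- fun_with_mean(c(-2, 4), 0.5, 2)   # made-up input
res$s
# [1] -1  8
res$mean
# [1] 3.5
```

The individual pieces are then available by name through `$`, so the caller can use either one without a second function call.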