I have this vector:

 x <- c(5,2,-4,-6,-2,1,4,2,-3,-6,-1,8,9,5,-6,-11)

I use this function:

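myfunction <- function(x){
     n <- length(x)

     # running sum of x that is never allowed to rise above 0
     fx <- numeric(n)
     fx[1] <- min(x[1], 0)
     # seq_len(n)[-1] is 2:n, but stays empty when n is 1
     for (i in seq_len(n)[-1]) {
       fx[i] <- min(0, fx[i-1] + x[i])
     }

     # lower bound: half of the minimum of that running sum
     lower <- min(fx) * 0.5

     # same running sum, now kept between lower and 0
     fx_05 <- numeric(n)
     fx_05[1] <- min(fx[1], 0)
     for (i in seq_len(n)[-1]) {
       s <- fx_05[i-1] + x[i]
       if (s > 0) {
          fx_05[i] <- 0
       } else if (s < lower) {
          fx_05[i] <- lower
       } else { fx_05[i] <- s }
     }

     # column V1 is the input, column V2 the bounded running sum
     as.data.frame(matrix(c(x, fx_05), ncol = 2))
}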
xx <- myfunction(x)

The dataframe xx is

    V1   V2
1    5  0.0
2    2  0.0
3   -4 -4.0
4   -6 -8.5
5   -2 -8.5
6    1 -7.5
7    4 -3.5
8    2 -1.5
9   -3 -4.5
10  -6 -8.5
11  -1 -8.5
12   8 -0.5
13   9  0.0
14   5  0.0
15  -6 -6.0
16 -11 -8.5

I would like to apply this function to every column of a data.frame:

df <- data.frame(x = c(5,2,-4,-6,-2,1,4,2,-3,-6,-1,8,9,5,-6,-11),
                 y = c(5,2,-4,-6,-2,1,4,2,-3,-6,-1,8,9,5,-6,-11),
                 z = c(5,2,-4,-6,-2,1,4,2,-3,-6,-1,8,9,5,-6,-11))

Calling it on the whole data.frame does not work:

output <- myfunction(df) 

With sapply, the form of the resulting data.frame is not correct:

outputs <- data.frame(sapply(df, myfunction))

The output should have 2 columns for each column of the original data.frame.

loki

1 Answer

In this case, use lapply. A data.frame is a list of equal-length vectors, so lapply calls myfunction on each column in turn and returns a list with one two-column data.frame per original column.

x <- lapply(df, myfunction)
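
To see the shape of what lapply returns, here is a minimal sketch on a smaller input. The helper two_cols and the data.frame df_demo are hypothetical stand-ins (not part of the question); two_cols only mimics myfunction in that it returns a two-column data.frame.

```r
# Hypothetical stand-in for myfunction: returns a two-column data.frame
two_cols <- function(v) data.frame(V1 = v, V2 = pmin(v, 0))

# Small illustrative input (assumed, not the data from the question)
df_demo <- data.frame(x = c(5, 2, -4), y = c(1, -3, 2))

res <- lapply(df_demo, two_cols)

class(res)         # a plain list
names(res)         # one element per input column
sapply(res, ncol)  # every element is a data.frame with two columns
```

Each element keeps the name of the column it came from, which is handy when the pieces are combined again later.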

sapply also works here. The difference is that sapply tries to simplify the result, so instead of a plain list you get a list-matrix whose cells hold the individual columns. Compare print(x) for each variant to see the difference.

x <- sapply(df, myfunction)
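
A minimal sketch of that difference, again using a hypothetical stand-in function (two_cols) and a small assumed input rather than the data from the question:

```r
# Hypothetical stand-in for myfunction: returns a two-column data.frame
two_cols <- function(v) data.frame(V1 = v, V2 = pmin(v, 0))
df_demo <- data.frame(x = c(5, 2, -4), y = c(1, -3, 2))

res_l <- lapply(df_demo, two_cols)  # list of data.frames
res_s <- sapply(df_demo, two_cols)  # simplified to a list-matrix

is.matrix(res_l)     # FALSE
is.matrix(res_s)     # TRUE
dim(res_s)           # rows are V1/V2, columns are x/y
res_s[["V2", "x"]]   # a single cell holds a whole vector
```

Because the sapply result is still a list underneath, do.call(cbind, ...) accepts it too, but the lapply result is easier to inspect and reason about.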

Afterwards you will probably want to combine the list elements into a single data.frame again. You can do this with do.call:

df2 <- do.call(cbind, x)

This will mess up the column names. You can change them using names (here they are simply removed):

names(df2) <- NULL
df2
# 1    5  0.0   5  0.0   5  0.0
# 2    2  0.0   2  0.0   2  0.0
# 3   -4 -4.0  -4 -4.0  -4 -4.0
# 4   -6 -8.5  -6 -8.5  -6 -8.5
# ....
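
If you would rather keep informative names than drop them, one option is to build them from the list names. This is only a sketch: two_cols, df_demo and the suffixes "input" and "result" are assumptions made for illustration.

```r
# Hypothetical stand-in for myfunction: returns a two-column data.frame
two_cols <- function(v) data.frame(V1 = v, V2 = pmin(v, 0))
df_demo <- data.frame(x = c(5, 2, -4), y = c(1, -3, 2))

res <- lapply(df_demo, two_cols)
combined <- do.call(cbind, res)

# cbind prefixes each column with the name of its list element
names(combined)  # "x.V1" "x.V2" "y.V1" "y.V2"

# Replace them with names of your own choosing
names(combined) <- paste(rep(names(res), each = 2),
                         c("input", "result"), sep = "_")
names(combined)  # "x_input" "x_result" "y_input" "y_result"
```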

Side Note:

If your input is a matrix rather than a data.frame, another option is apply with MARGIN = 2.

x <- apply(df, MARGIN = 2, myfunction)

Although apply also works in this example, it converts the data.frame to a matrix before applying the function, so you will run into trouble when your columns have differing data types. It is therefore not recommended for data.frames. More info on that can be found in this detailed and easy-to-understand post!
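
The conversion can be seen on a small mixed-type data.frame. The name df_mixed and its contents are made up for this sketch:

```r
# Illustrative data.frame with one numeric and one character column
df_mixed <- data.frame(num = c(1.5, -2, 3),
                       txt = c("a", "b", "c"),
                       stringsAsFactors = FALSE)

# apply() converts to a matrix first, so every column becomes character
apply(df_mixed, MARGIN = 2, class)

# lapply()/sapply() work column by column and keep each column's own type
sapply(df_mixed, class)
```

A function that expects numbers would therefore receive character data from apply, even for the numeric column.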

Further reading on this:
Hadley Wickham's Advanced R. Also check out the section on data types on this site.
Peter Werner's blog post


I greatly appreciate the input of @Gregor on this post.

loki
  • Please don't equate `lapply` and `sapply` with `apply(..., MARGIN = 2)`. `apply` is meant for matrices, and should be used when that is appropriate. I'm struggling to think of a case when `apply(df, MARGIN = 2)` is needed on a data frame, almost always `lapply` or `sapply` should be used. – Gregor Thomas Dec 07 '18 at 15:23
  • I understand your concerns. Would you prefer to remove it from the answer, or mark it as one possible but inconvenient solution? I merely included it since it leads to the desired output. – loki Dec 07 '18 at 15:26
  • If it were my answer, I'd just remove it. Nice and simple. Even better would be to move it to the bottom and explain. Something like "if you have a matrix, not a data frame, use `apply`. It's possible to use `apply` on a data frame, but it will first convert the data frame to a `matrix`, which is risky if you have columns of different types, so if you have a data frame it's probably better to just use `l/s/vapply`." But with more detail. Maybe link out to [this excellent answer](https://stackoverflow.com/a/7141669/903061) for more info. – Gregor Thomas Dec 07 '18 at 15:31