How can I repeat a funciton for severeal columns and combine it in a data.frame and calculate the mean?

Question

I have data about a floodplain with a date and flooded 0/1. I have a funciton which counts the flooded days with different periods with different end date. Now I want to repeat the function for severeal columns (several heigths for one observation transect) and calculate the mean of flooded days for each period. I dont want to repeat my function for every column manual.

I thougt about a solution with a loop or something with the apply family, but I'm not enough R Guru.

set.seed(1)
df2 <- data.frame(date=seq(as.Date("2016-11-01"), as.Date("2018-11-01"), "day"),
                  flooded=rbinom(731, 1, .5), flooded2=rbinom(731, 1, .5), flooded2=rbinom(731, 1, .5))
date_end2 <- sort(sample(df2$date, 4))
period2 <- c(30,60,90)
###########################################################################################
floodCount <- function(datecol, floodcol, e, p) {
  e <- as.Date(e)
  datecol <- as.Date(datecol)
  stopifnot(!anyNA(c(e, p)))
  stopifnot((e - p) %in% datecol)
  return(sum(floodcol[which((datecol == e - p + 1)):which(datecol == e)]))
}

F_2017 <- sapply(period2, function(p) with(df2, floodCount(date, flooded,date_end2[1], p)))
S_2017 <- sapply(period2, function(p) with(df2, floodCount(date, flooded,date_end2[2], p)))
F_2018 <- sapply(period2, function(p) with(df2, floodCount(date, flooded,date_end2[3], p)))
S_2018 <- sapply(period2, function(p) with(df2, floodCount(date, flooded,date_end2[4], p)))

FLOODED.T <- rbind(F_2017, S_2017, F_2018, S_2018)
FLOODED.T2 <- as.data.frame(FLOODED.T)
names(FLOODED.T2)[1:3] <- period2[1:3]

As solution of the intermediate step I expect a data.frame like this:

       30 60 90 30_1 60_1 90_1 30_2 60_2 90_2 ...
F_2017 11 28 42 21   31   45   ... 
S_2017 17 30 44 18   28   ...
F_2018 14 32 48 15   ...
S_2018 15 31 49 ...

As final output with the mean of each period per enddate

       30_m 60_m 90_m 
F_2017 10   30   52
S_2017 21   28   41
F_2018 13   32   47
S_2018 5    32   35

I'm open to your smart and genius R ideas ;)

score 2 · Accepted Answer · answered Jul 11 '19 at 08:55

Ok, things have changed and we should roll this up again.

We need another floodFun that's suitable for multiple columns.

floodFun <- function(floodcol, datecol, e=date.end2, p=period2) {
  fc2 <- Vectorize(function(x, y, ...) {
    e <- as.Date(e[x])
    p <- p[y]
    # stopifnot(!anyNA(c(e, p)))
    # stopifnot((e - p) %in% datecol)
    if (anyNA(c(e, p)) | !((e - p) %in% datecol))
      S <- NA
    else
      S <- sum(floodcol[which(datecol == e - p + 1):which(datecol == e)])
  })
  res <- outer(seq_along(date.end2), seq(period2), fc2)
  return(res)
}

In regard of the cell means, best would be a sapply into a 3D-array,

A <- sapply(df2[-1], function(x) 
  `dimnames<-`(floodFun(x, df2$date, e=date.end2, p=period2), 
               list(as.character(date.end2), period2)), simplify="array")

where we can easily extract the means.

apply(A, 1:2, mean)
#                  30       60       90
# 2018-05-02 17.33333 33.33333 48.33333
# 2018-06-19 15.66667 30.66667 47.00000
# 2018-06-25 15.66667 30.00000 47.33333
# 2018-08-01 12.66667 29.33333 43.00000
# 2018-08-10 12.00000 29.33333 43.00000
# 2018-09-08 13.33333 25.66667 43.00000
# 2018-09-26 12.33333 27.33333 39.33333
# 2018-10-19 16.33333 27.66667 42.33333
# 2018-10-24 16.33333 28.66667 43.66667
# 2018-10-26 16.00000 28.33333 43.33333

For the "intermediate step", do

tmp <- Map(function(x) `dimnames<-`(floodFun(x, df2$date, e=date.end2, p=period2),
                                  list(as.character(date.end2), period2)), df2[-1])
RES <- do.call(cbind, lapply(tmp, function(x) 
  `colnames<-`(x, paste(colnames(x), names(tmp), sep="."))))

Data

set.seed(42)
df2 <- data.frame(date=seq(as.Date("2016-11-01"), as.Date("2018-11-01"), "day"),
                  flooded.1=rbinom(731, 1, .5), flooded.2=rbinom(731, 1, .5), 
                  flooded.3=rbinom(731, 1, .5))
date.end2 <- sort(sample(tail(df2$date, 200), 10))  # I chose `tail` to avoid NAs
period2 <- c(30, 60, 90)

Thank again for your very good solutions. The 3D-array is very interesting and I see what is possible with this apply and sapply functions, I should leran more about this. — Nesch, Jul 11 '19 at 10:30
@Nesh You are welcome. There's a good [post about the apply family](https://stackoverflow.com/a/7141669/6574038) on Stack Overflow you may want to read. — jay.sf, Jul 11 '19 at 10:34

score 0 · Answer 2 · answered Jul 11 '19 at 09:16

Here is another way to solve this using nested lapply/sapply to get all the combinations.

m1 <- (Reduce(`+`, lapply(df2[-1], function(x) 
                      t(sapply(date_end2, function(y) 
     sapply(period2, function(z) floodCount(df2$date, x,y, z))))))/(ncol(df2) - 1))
colnames(m1) <- paste0(period2, "_m")

m1
#     30_m 60_m 90_m
#[1,] 14.0 30.7 42.7
#[2,] 18.0 31.0 43.7
#[3,] 16.0 33.7 49.7
#[4,] 14.7 27.3 43.3

data

set.seed(123)
df2 <- data.frame(date=seq(as.Date("2016-11-01"), as.Date("2018-11-01"), "day"),
               flooded=rbinom(731, 1, .5), flooded2=rbinom(731, 1, .5), 
              flooded2=rbinom(731, 1, .5))
date_end2 <- sort(sample(df2$date, 4))
period2 <- c(30,60,90)

How can I repeat a funciton for severeal columns and combine it in a data.frame and calculate the mean?

2 Answers2

Data