I have objects containing monthly data on plant growth. Each object has a fixed number of columns, and the number of rows equals the number of months the plant survives. I would like to take the mean across these objects so that, at each timestep, the mean considers only the plants that are still alive. Here is example data:
df1 <- data.frame(GPP = 1:10, NPP = 1:10)
df2 <- data.frame(GPP = 2:8, NPP = 2:8)
df3 <- data.frame(GPP = 3:9, NPP = 3:9)
In this scenario, the maximum number of timesteps is 10, and the 2nd and 3rd plants did not survive that long (they have only 7 rows each). To take the mean, my initial thought was to fill the missing rows with NA to make the dimensions the same, like this:
na <- matrix(NA, nrow = 3, ncol = 2)
colnames(na) <- c("GPP","NPP")
df2 <- rbind(df2, na)
df3 <- rbind(df3, na)
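The padding above hard-codes nrow = 3, which only fits this example. A minimal sketch that pads any list of data frames to the length of the longest one, assuming they all share the same columns. The helper name pad_to is hypothetical, not part of base R. It still produces NA rows, so it has the same averaging problem described next.

```r
df1 <- data.frame(GPP = 1:10, NPP = 1:10)
df2 <- data.frame(GPP = 2:8, NPP = 2:8)
df3 <- data.frame(GPP = 3:9, NPP = 3:9)

# Hypothetical helper: append rows of NA until d has n rows
pad_to <- function(d, n) {
  missing_rows <- n - nrow(d)
  if (missing_rows <= 0) return(d)
  filler <- as.data.frame(matrix(NA, nrow = missing_rows, ncol = ncol(d),
                                 dimnames = list(NULL, names(d))))
  rbind(d, filler)
}

dfs <- list(df1, df2, df3)
n_max <- max(vapply(dfs, nrow, integer(1)))  # longest-lived plant
padded <- lapply(dfs, pad_to, n = n_max)     # every element now has n_max rows
```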
This is not desirable because NA is not simply ignored as I had hoped; it propagates, so any arithmetic involving NA returns NA, for example:
(df1 + df2 + df3) / 3
   GPP NPP
1    2   2
2    3   3
3    4   4
4    5   5
5    6   6
6    7   7
7    8   8
8   NA  NA
9   NA  NA
10  NA  NA
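This propagation is standard R behaviour: arithmetic with NA yields NA, while summary functions such as mean() accept na.rm = TRUE to drop missing values first. A small illustration using the three values that end up in row 8 after padding (8 from df1, NA from df2 and df3):

```r
# Row 8 after padding: df1 contributes 8, df2 and df3 are padding
row8 <- c(8, NA, NA)

with_na <- sum(row8) / 3                  # NA: a single NA makes the sum NA
ignoring_na <- mean(row8, na.rm = TRUE)   # 8: NAs dropped, divides by 1 plant
```

Dropping the NAs also changes the denominator to the number of non-missing values, which is what "average only the surviving plants" requires.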
I can NOT just fill na with 0s, because I want the mean of every plant that is alive at a given timestep while completely ignoring those that have died. Replacing with 0s would skew the mean downwards. For my example data, this is the result I would like to get instead:
(df1 + df2 + df3) / 3
   GPP NPP
1    2   2
2    3   3
3    4   4
4    5   5
5    6   6
6    7   7
7    8   8
8    8   8
9    9   9
10  10  10
Here rows 8-10 take their values from df1 alone, because df2 and df3 have only 7 rows each.
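One way to get this result, sketched under the assumption that every object is a data frame with the same columns in the same order, and that NA only ever marks months after death (a genuine NA in a living plant would also be skipped): pad each data frame to the longest one, stack them into a 3-D array (timestep x variable x plant), and average over the plant dimension with rowMeans(..., na.rm = TRUE, dims = 2). Indexing rows past the end of a data frame returns NA rows, which serves as the padding here.

```r
df1 <- data.frame(GPP = 1:10, NPP = 1:10)
df2 <- data.frame(GPP = 2:8, NPP = 2:8)
df3 <- data.frame(GPP = 3:9, NPP = 3:9)

dfs <- list(df1, df2, df3)
n_max <- max(vapply(dfs, nrow, integer(1)))

# Rows beyond nrow(d) come back as NA, padding each plant to n_max months
padded <- lapply(dfs, function(d) as.matrix(d[seq_len(n_max), , drop = FALSE]))

# Stack into an n_max x ncol x n_plants array (filled column-major)
arr <- array(unlist(padded),
             dim = c(n_max, ncol(df1), length(dfs)),
             dimnames = list(NULL, colnames(df1), NULL))

# Mean over the 3rd dimension (plants); na.rm = TRUE divides each cell
# by the number of plants still alive at that timestep
mean_df <- as.data.frame(rowMeans(arr, na.rm = TRUE, dims = 2))
print(mean_df)
```

For the example data both columns of mean_df come out as 2, 3, 4, 5, 6, 7, 8, 8, 9, 10, matching the desired table. Because the number of plants is taken from the list length, the same code works for any number of plants without hard-coding a divisor.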