1. dplyr/tidyr
It may be better to convert the 'wide' format to 'long' format. We could use dplyr/tidyr
to get the mean
. Create a 'ind' column, reshape the data to 'long' using gather
, split the 'variable' column into two columns ('var1', 'var2') with extract
, group by 'ind', get the mean
values of the 'value' column after subsetting it based on the different logical index created (i.e. var2 < startyear
, var2 >= startyear & var2 <= endyear
, and var2 >endyear
)
library(dplyr)
library(tidyr)
dS <- df %>%
mutate(ind=row_number()) %>%
gather(variable, value, starts_with('y')) %>%
extract(variable, c('var1', 'var2'), '([^0-9]+)([0-9]+)',
convert=TRUE) %>%
group_by(ind) %>%
summarise(before_mean= mean(value[var2 < startyear]),
within_mean = mean(value[var2 >= startyear &
var2 <= endyear]),
after_mean=mean(value[var2 >endyear])) %>%
as.data.frame()
nm1 <- paste(c('before', 'within', 'after'), 'mean', sep="_")
dS
# ind before_mean within_mean after_mean
#1 1 629.6667 44.0 65.0
#2 2 636.0000 57.2 1179.4
We can create additional columns in 'df' from the above output
df[nm1] <- dS
2. base R
We can use base R
methods and without changing the format of the dataset. From the original dataset ('df'), make an index ('indx') of numeric column names, remove the non-numeric part and convert to numeric ('v1').
indx <- grep('\\d+', names(df))
v1 <- as.numeric(sub('[^0-9]+', '', names(df)[indx]))
Loop the rows of 'df' (lapply
), match
the 'startyear' with 'v1', use that index ('i1') to get the columns, unlist
, and calculate the mean
. The same can be done by matching the 'endyear' with 'v1' to get the index ('i2'). Based on 'i1', and 'i2', calculate the 'within_mean' and 'after_mean'. rbind
the list elements and assign the output to new columns ('nm1') in 'df'.
df[nm1] <- do.call(rbind,lapply(1:nrow(df), function(i) {
i1 <- match(df$startyear[i], v1)
before_mean<- mean(unlist(df[i,1:(i1-1),drop=FALSE]))
i2 <- match(df$endyear[i], v1)
within_mean <- mean(unlist(df[i,i2:i1]))
after_mean <- mean(unlist(df[i,match(v1[(i2+1):length(v1)],v1)]))
data.frame(before_mean,within_mean, after_mean) }))
df[nm1]
# before_mean within_mean after_mean
#1 629.6667 44.0 65.0
#2 636.0000 57.2 1179.4