0

I have a panel dataset where I want to average over a specified number of time periods (t) by variable (column).

An example:

Country    Year    Var 1        Var 2       Var 3
Austria   1984      1           3.6          95
Austria   1985      2           4.1          94.6
Austria   1986      1           2.6          93.6
Austria   1987      1            3           94.4  
Austria   1988      1           3.9          95.2 

What I want then is a new column/new dataframe with a new variable for the average for the 5 year period (1984-1988) for Var 1, a variable for the average of Var 2 and var 3 etc.

I also want to loop the function over such that I can apply it to the other countries in my dataset. It would be great if I could avoid that the averaging mixes up countries, so I was thinking of adding some matching string pattern (for code %in% AUT in this case for instance, I have a variable with country codes) but I couldn't figure out how to do it.

Thank you very much in advance

Bastje
  • 37
  • 6
  • Might want to add a tag for what programming language you are working with. – Ryan Wilson Feb 16 '18 at 16:51
  • Thanks, working in R, added the tag – Bastje Feb 16 '18 at 17:09
  • 1
    You should extract the country and year into separate columns, then this becomes a simpler problem. It would be easiest if you just use `cut` to bin the years and then use whichever [mean by group FAQ solution you like](https://stackoverflow.com/q/11562656/903061). `dplyr` makes it especially easy to apply the mean-by-group to multiple columns with `summarize_at`. – Gregor Thomas Feb 16 '18 at 17:12

1 Answers1

0

1) Using the sample input in the Note at the end, read in the country and year from the row names and round the year up to the end of the current 5 year period so that each year from 1984 to 1988 gets rounded up to 1988, etc. Then use aggregate to calculate the means of each column by both country and year. No packages are used.

By0 <- read.table(text = rownames(DF), col.names = c("Country", "Year"))
By <- transform(By0, Year = 5 * ((Year - min(Year)) %/% 5) + min(Year) + 4)

aggregate(DF, By, mean)

giving the following:

    Country Year Var 1 Var 2 Var 3
1 Australia 1988   1.6 18.46 95.52
2   Austria 1988   1.2  3.44 94.56

2) or if what was wanted was to append the columns to the original data frame lapply over the columns using ave to take the mean by Country for each:

out <- cbind(DF, lapply(DF, function(x) with(By, ave(x, Country, Year, FUN = mean))))
names(out) <- c(names(DF), paste("Mean", names(DF)))

giving:

> out
               Var 1 Var 2 Var 3 Mean Var 1 Mean Var 2 Mean Var 3
Austria 1984       1   3.6  95.0        1.2       3.44      94.56
Austria 1985       2   4.1  94.6        1.2       3.44      94.56
Austria 1986       1   2.6  93.6        1.2       3.44      94.56
Austria 1987       1   3.0  94.4        1.2       3.44      94.56
Austria 1988       1   3.9  95.2        1.2       3.44      94.56
Australia 1984     1   3.6  95.0        1.6      18.46      95.52
Australia 1985     2   4.1  94.6        1.6      18.46      95.52
Australia 1986     1   2.6  93.6        1.6      18.46      95.52
Australia 1987     1   3.0  94.4        1.6      18.46      95.52
Australia 1988     3  79.0 100.0        1.6      18.46      95.52

Note

The input used, shown reproducibly, is:

Lines <- "
Var 1,Var 2,Var 3
Austria 1984,1,3.6,95
Austria 1985,2,4.1,94.6
Austria 1986,1,2.6,93.6
Austria 1987,1,3,94.4
Austria 1988,1,3.9,95.2
Australia 1984,1,3.6,95
Australia 1985,2,4.1,94.6
Australia 1986,1,2.6,93.6
Australia 1987,1,3,94.4
Australia 1988,3,79,100"

DF <- read.csv(text = Lines, check.names = FALSE)
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • Thank you, I'm almost there. My data is over a 30 year period, and I want 5 year averages. So I need to have the mean for Austria 1984-1988, then for Austria 1989-1993, Austria 1994-1998 ... Australia 1984-1988 ... etc Any chance you could share with me me how I specify that in your aggregate function? – Bastje Feb 16 '18 at 22:10
  • Have added that to (1) – G. Grothendieck Feb 17 '18 at 00:53
  • Thank you very much! I got what I needed with the transform and aggregate function. – Bastje Feb 18 '18 at 20:58