How to take the average over multiple years in panel data in R?

Question

My dataset looks as follows (the full dataset has all the countries of the world in it):

structure(list(Country = c("Afghanistan", "Afghanistan", "Afghanistan", 
"Afghanistan", "Afghanistan", "Albania"), CountryCode = c("AFG", 
"AFG", "AFG", "AFG", "AFG", "ALB"), Time = c("2017", "2018", 
"2019", "2020", "2021", "2017"), Time.Code = c("YR2017", "YR2018", 
"YR2019", "YR2020", "YR2021", "YR2017"), gdp_pc_g = c(0.0647641950115201, 
-1.19490038335884, 1.53563667424, -4.57503186436271, NA, 3.89774060601475
), gdp_pc = c(2058.40022106979, 2033.80438893717, 2065.03623501599, 
1970.56016925338, NA, 12770.9642910053), population_g = c(2.54783303994568, 
2.38430907940751, 2.31307334941778, 2.30381209309418, 2.30341213305193, 
-0.0919722937442495), savings = c(NA, NA, NA, NA, NA, 16.5293641587755
), inflation = c(2.40365618844285, 2.07134857890898, 6.52147984507825, 
7.82166712677619, NA, 1.45106269924993), unemployment = c(11.1800003051758, 
11.1520004272461, 11.2170000076294, 11.710000038147, 13.2829999923706, 
13.6199998855591), crime = c(6.6784014408596, 6.65556115177472, 
NA, NA, NA, 2.01097785878705), health = c(0.64313394, 0.54922014, 
1.08443093, NA, NA, 2.67507172), health_new = c(2.31911564, 1.89807308, 
3.87310839, NA, NA, 9.17743587), co2 = c(4780.00020980835, 6070.00017166138, 
6079.99992370605, NA, NA, 5139.9998664856), debt_wb = c(NA, NA, 
NA, NA, NA, 75.8887515969161), debt_imf = c(7.9976602, 7.3844129, 
6.1304099, 7.3974947, NA, 70.11991), qualityofgovernment = c(-1.60108568271001, 
-1.59558055798213, -1.56942588090897, -1.67851787805557, -1.68115723133087, 
0.0151268268624942), migration = c(-47.09, -47.205, -8.082, 166.821, 
-183.672, -9.768), gdp_pc_log = c(7.62968436836638, 7.61766340135312, 
7.63290305246442, 7.58607313118708, NA, 9.45492945839223), gdp_pc_1996 = c(NA, 
NA, NA, NA, NA, 4908), gdp_pc_1996_log = c(NA, NA, NA, NA, NA, 
8.4986218058308), unemployment_log = c(2.41412649502354, 2.41161889232933, 
2.41743048534325, 2.46044318086727, 2.58648502157599, 2.61153929231895
), crime_log = c(1.89887865364994, 1.8954527686191, NA, NA, NA, 
0.69862110067871), health_new_log = c(0.841185923351801, 0.640839203134581, 
1.35405738619833, NA, NA, 2.21674784862268), co2_log = c(8.47219586937846, 
8.71111391233384, 8.71275996241187, NA, NA, 8.54480833247365), 
    migration_t5 = c(105.905, 48.076, 255.611, -281.739, -90.238, 
    -19.946), debt_imf_1996 = c(NA, NA, NA, NA, NA, 34.69009), 
    debt_imf_1996_log = c(NA, NA, NA, NA, NA, 3.5464540553859
    ), debt_imf_2001 = c(NA, NA, NA, NA, NA, 60.609729), debt_imf_2006 = c(22.984644, 
    22.984644, 22.984644, 22.984644, 22.984644, 56.690318), debt_imf_2011 = c(7.5010844, 
    7.5010844, 7.5010844, 7.5010844, 7.5010844, 59.399812), debt_imf_2016 = c(8.4372661, 
    8.4372661, 8.4372661, 8.4372661, 8.4372661, 72.380049), politics_1996 = c(-1.94518780708313, 
    -1.94518780708313, -1.94518780708313, -1.94518780708313, 
    -1.94518780708313, -0.621049637595813), politics_2006 = c(-1.63495093584061, 
    -1.63495093584061, -1.63495093584061, -1.63495093584061, 
    -1.63495093584061, -0.443528113265832), politics_2016 = c(-1.55767250061035, 
    -1.55767250061035, -1.55767250061035, -1.55767250061035, 
    -1.55767250061035, 0.00307970804472781), politics_t5 = c(0.943790654341378, 
    0.931954483191173, 1.02754203478495, 1.0176876783371, 0.94232749938965, 
    2.25874653831124), politics_2001 = c(NA, NA, NA, NA, NA, 
    NA), politics_2011 = c(-1.72906082868576, -1.72906082868576, 
    -1.72906082868576, -1.72906082868576, -1.72906082868576, 
    -0.213528143862883), politics_t10 = c(-1.68152981996536, 
    -1.76271136601766, -1.78180730342865, -1.74940039714178, 
    -1.72906082868576, -0.309388402849436), debt_imf_2001_log = c(NA, 
    NA, NA, NA, NA, 4.10445542474383), debt_imf_2006_log = c(3.13482634077627, 
    3.13482634077627, 3.13482634077627, 3.13482634077627, 3.13482634077627, 
    4.03760343777541), debt_imf_2011_log = c(2.01504759675729, 
    2.01504759675729, 2.01504759675729, 2.01504759675729, 2.01504759675729, 
    4.08429106138043), debt_imf_2016_log = c(2.13265833433292, 
    2.13265833433292, 2.13265833433292, 2.13265833433292, 2.13265833433292, 
    4.28193069511796), debt_imf_log = c(2.07914902390068, 1.9993714138387, 
    1.81326161557553, 2.00114138883371, NA, 4.25020677653823), 
    gdp_pc_t5 = c(2075.49161435331, 2116.46525771251, 2102.38460375974, 
    2068.26590413364, 2057.06797755329, 11227.9663308153), population_g_t5 = c(3.40760003260595, 
    3.49459192739625, 3.35560233082199, 3.07705362914441, 2.77803478841236, 
    -0.165151040121679), savings_t5 = c(NA, NA, NA, NA, NA, 19.6251036733499
    ), inflation_t5 = c(7.30175647136598, 4.82278547931647, 0.566944540816536, 
    2.44756300204637, -2.19752646590604, 1.04271460643408), unemployment_t5 = c(11.3409996032715, 
    11.1929998397827, 11.1420001983643, 11.1269998550415, 11.1579999923706, 
    13.3800001144409), crime_t5 = c(6.25132816655284, NA, NA, 
    9.78392178232544, 6.5511634561067, 5.38761486858166), health_new_t5 = c(1.37000275, 
    1.77506948, 1.8606739, 2.01440573, 2.14110541, 8.69735146
    ), co2_t5 = c(8079.99992370605, 5989.99977111816, 4880.00011444092, 
    5949.99980926514, 5300.00019073486, 4360.0001335144), debt_imf_t5 = c(6.7594679, 
    6.9044792, 8.700291, 9.1540829, 8.4372661, 62.119953), qualityofgovernment_t5 = c(-1.55620934565862, 
    -1.56804551680883, -1.47245796521505, -1.4823123216629, -1.55767250061035, 
    -0.241253461688757), gdp_pc_t5_log = c(7.63795332718298, 
    7.65750264485461, 7.65082750508984, 7.63446580769799, 7.62903693604581, 
    9.32616293879677), inflation_t5_log = c(1.98811493172768, 
    1.57335166141558, -0.5674937916493, 0.895092836478912, NA, 
    0.0418275109957932), unemployment_t5_log = c(2.4284244428514, 
    2.41528856859307, 2.41072176940192, 2.40937457414793, 2.41215672873958, 
    2.59376106325321), crime_t5_log = c(1.83279394782042, NA, 
    NA, 2.28074040390539, 1.87964266075989, 1.68410277651703), 
    health_new_t5_log = c(0.314812747137289, 0.573839565823258, 
    0.620938733933986, 0.70032422882336, 0.761322242398395, 2.1630185494273
    ), co2_t5_log = c(8.99714714207282, 8.69784665289884, 8.4929005222982, 
    8.6911464664834, 8.57546213552792, 8.38022736696565), debt_imf_t5_log = c(1.91094417393909, 
    1.93217036041067, 2.16335647337702, 2.21419999834076, 2.13265833433292, 
    4.12906724168007), politics = c(0.898914317289989, 0.904419442017872, 
    0.930574119091033, 0.821482121944428, 0.818842768669128, 
    2.51512682686249), politics_t5_log = c(-0.0578509018988914, 
    -0.0704713032668603, 0.0271695763178487, 0.0175330717798688, 
    -0.0594024009312679, 0.814810030354778), migration_rate = c(-1.32, 
    -1.286, -0.214, 4.289, -4.569, -3.387), migration_rate_t5 = c(3.481, 
    1.525, 7.842, -8.31, -2.602, -6.872), inflation_new = c(1.02403656188443, 
    1.02071348578909, 1.06521479845078, 1.07821667126776, NA, 
    1.0145106269925), inflation_new_t5 = c(1.07301756471366, 
    1.04822785479316, 1.00566944540816, 1.02447563002046, 0.97802473534094, 
    1.01042714606434), inflation_new_t5_log = c(0.0704748332384, 
    0.0471009809625991, 0.00565343458901919, 0.0241809012205975, 
    -0.0222203175074989, 0.0103731583444112)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

Now I would like to take the average of for example population_g for each country and place this in a new column in the same dataset.

Can someone help me to create this new column which will have the averages over the 5 years of population_g for each country?

So I would like to get something as:

Afghanistan 2017 (all previous variables) (new variable which is the average of population_g over 2017 to 2021 for Afghanistan)

Afghanistan 2018 (all previous variables) (new variable which is the average of population_g over 2017 to 2021 for Afghanistan)

Afghanistan 2019 (all previous variables) (new variable which is the average of population_g over 2017 to 2021 for Afghanistan)

Afghanistan 2020 (all previous variables) (new variable which is the average of population_g over 2017 to 2021 for Afghanistan)

Afghanistan 2021 (all previous variables) (new variable which is the average of population_g over 2017 to 2021 for Afghanistan)

If using `dplyr`, try `df %>% group_by(Country) %>% mutate(ave = mean(population_g, na.rm = TRUE)) %>% ungroup()` — Darren Tsai, Dec 29 '22 at 13:15
If `base`, then `transform(df, ave = ave(population_g, Country, FUN = \(x) mean(x, na.rm = TRUE)))` — Darren Tsai, Dec 29 '22 at 13:17
Thank you for the response! What does the na.rm do again? Why not na.rm = FALSE? — MateoTilburg, Dec 29 '22 at 13:22
The above ways add a new column to the original data, so, in the different years of the same country the averages are equal. If you want one average for one country, then see [Aggregate / summarize multiple variables per group (e.g. sum, mean)](https://stackoverflow.com/q/9723208/10068985). — Darren Tsai, Dec 29 '22 at 13:25
As for `na.rm`, please see the document of `mean()`. Type `?mean` in your console and press enter. — Darren Tsai, Dec 29 '22 at 13:27

score 1 · Answer 1 · answered Jan 03 '23 at 12:19

I think this is what you want. I make use of package plm's panel data functions to get the mean per country. Continuing with your data and calling it dat:

library("plm")
pdat <- pdata.frame(dat, index = c("CountryCode", "Time"))
pdat$avg_population_g <- plm::Between(pdat$population_g)
pdat[ , c("CountryCode", "Time", "population_g", "avg_population_g")]
##          CountryCode Time population_g avg_population_g
## AFG-2017         AFG 2017   2.54783304       2.37048794
## AFG-2018         AFG 2018   2.38430908       2.37048794
## AFG-2019         AFG 2019   2.31307335       2.37048794
## AFG-2020         AFG 2020   2.30381209       2.37048794
## AFG-2021         AFG 2021   2.30341213       2.37048794
## ALB-2017         ALB 2017  -0.09197229      -0.09197229

How to take the average over multiple years in panel data in R?

1 Answers1