I want to calculate the 5-year average growth rate of some variables in my database, grouped by the variable "code". This means that the first 5 years of each variable within each code should be NA. The database is downloadable here

    pwt<-structure(list(code = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("ABW", 
"AFG", "AGO", "AIA", "ALB", "AND", "ANT", "ARE", "ARG", "ARM", 
"ATG", "AUS", "AUT", "AZE", "BDI", "BEL", "BEN", "BFA", "BGD", 
"BGR", "BHR", "BHS", "BIH", "BLR", "BLZ", "BMU", "BOL", "BRA", 
"BRB", "BRN", "BTN", "BWA", "CAF", "CAN", "CH2", "CHE", "CHL", 
"CHN", "CIV", "CMR", "COD", "COG", "COK", "COL", "COM", "CPV", 
"CRI", "CSK", "CUB", "CUW", "CYM", "CYP", "CZE", "DEU", "DJI", 
"DMA", "DNK", "DOM", "DZA", "ECU", "EGY", "ERI", "ESP", "EST", 
"ETH", "FIN", "FJI", "FRA", "FSM", "GAB", "GBR", "GEO", "GHA", 
"GIN", "GMB", "GNB", "GNQ", "GRC", "GRD", "GRL", "GTM", "GUY", 
"HKG", "HND", "HRV", "HTI", "HUN", "IDN", "IND", "IRL", "IRN", 
"IRQ", "ISL", "ISR", "ITA", "JAM", "JOR", "JPN", "KAZ", "KEN", 
"KGZ", "KHM", "KIR", "KNA", "KOR", "KWT", "LAO", "LBN", "LBR", 
"LBY", "LCA", "LIE", "LKA", "LSO", "LTU", "LUX", "LVA", "MAC", 
"MAR", "MCO", "MDA", "MDG", "MDV", "MEX", "MHL", "MKD", "MLI", 
"MLT", "MMR", "MNE", "MNG", "MOZ", "MRT", "MSR", "MUS", "MWI", 
"MYS", "NAM", "NCL", "NER", "NGA", "NIC", "NLD", "NOR", "NPL", 
"NRU", "NZL", "OMN", "PAK", "PAN", "PER", "PHL", "PLW", "PNG", 
"POL", "PRI", "PRK", "PRT", "PRY", "PSE", "PYF", "QAT", "RKS", 
"ROU", "RUS", "RWA", "SAU", "SDN", "SEN", "SGP", "SLB", "SLE", 
"SLV", "SMR", "SOM", "SRB", "STP", "SUN", "SUR", "SVK", "SVN", 
"SWE", "SWZ", "SXM", "SYC", "SYR", "TCA", "TCD", "TGO", "THA", 
"TJK", "TKM", "TLS", "TON", "TTO", "TUN", "TUR", "TUV", "TWN", 
"TZA", "UGA", "UKR", "URY", "USA", "UZB", "VCT", "VEN", "VGB", 
"VNM", "VUT", "WSM", "YEM", "YUG", "ZAF", "ZMB", "ZWE"), class = "factor"), 
    year = c(2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 
    2007L, 2008L, 2009L, 2010L, 2000L, 2001L, 2002L, 2003L, 2004L, 
    2005L, 2006L, 2007L, 2008L, 2009L, 2010L), pop = c(0.090852998197079, 
    0.092897996306419, 0.094991996884346, 0.097016997635365, 
    0.098737001419067, 0.10003100335598, 0.100832000374794, 0.101219996809959, 
    0.101352997124195, 0.101452998816967, 0.101668998599052, 
    16.4409236907959, 16.9832668304443, 17.5726490020752, 18.203369140625, 
    18.8657169342041, 19.5525417327881, 20.2623996734619, 20.9976863861084, 
    21.7594203948975, 22.5495471954346, 23.3691310882568), rgdpe = c(4000.837890625, 
    3934.59619140625, 3882.55322265625, 3927.7529296875, 4201.69677734375, 
    4269.41748046875, 4308.62158203125, 4532.29345703125, 4572.1005859375, 
    4424.11865234375, 3971.60205078125, 37389.5859375, 37317.37109375, 
    42393.3671875, 44311.0546875, 52615.54296875, 65769.65625, 
    83384, 91420.0234375, 109108.0078125, 89716.453125, 126393.3203125
    ), rgdpo = c(3892.32348632812, 4312.86328125, 3251.35205078125, 
    3331.43383789062, 3727.60400390625, 3958.794921875, 4168.10546875, 
    4233.91845703125, 4455.19775390625, 4180.31884765625, 3767.7861328125, 
    32316.541015625, 34724.8828125, 39094.16796875, 42965.86328125, 
    51902.34375, 70721.609375, 94126.828125, 107016.71875, 132309.03125, 
    101159.71875, 139946.859375)), row.names = c(51L, 52L, 53L, 
54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 119L, 120L, 121L, 122L, 
123L, 124L, 125L, 126L, 127L, 128L, 129L), class = "data.frame")

I created the following function to get the 5-year average growth rate.

    growth5<-function(x){
    grorat<-(x/lag(x, k = 5))^(1/5)-1
    return(grorat)}
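
For reference, the quantity this function is meant to compute can be written as below. The notation is an assumption added for illustration and is not part of the original post: x_t is the value of a variable in year t and g_t is its average annual growth rate over the preceding five years.

```latex
g_t = \left( \frac{x_t}{x_{t-5}} \right)^{1/5} - 1
```

For example, a series that goes from 100 to 161.051 in five years gives g_t = 1.61051^{1/5} - 1 = 0.10, that is, 10% per year. Because x_{t-5} does not exist for the first five observations of each code, those rows have no defined growth rate.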

I then used mutate() from dplyr like this:

    pwt <- pwt %>% group_by(code) %>% mutate(across(c(rgdpe:rgdpo), ~ growth5(.), .names = "{col}_grow"))

However, as you will see, I only get 0s in the new columns (new variables) and there are no NAs where I expected.

Thank you very much in advance!

  • Hi, please share data in the appropriate way; read [how-to-make-a-great-r-reproducible-example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). A minimal example never needs all of the data. Also state all `library` calls you used. – jay.sf Sep 02 '20 at 19:31
  • Hm. Except for an error because `q_gdp` is not in the dataset, your code works fine. The only package I loaded in my session was dplyr. – stefan Sep 02 '20 at 19:40
  • Dear stefan, thank you very much. It seems there was a conflict with lag() because I had loaded dplyr and stats at the same time and both define lag(). Now it is almost fine, but I am getting only one NA row, for the first year of each code, when the first 5 years should be NA. The growth rates obtained are also incorrect. However, in pieterbons's example it works perfectly. Any hint? – Reynaldo Senra Sep 02 '20 at 20:55

2 Answers

1

I finally got a solution that does not need the separate growth function, so I only had to write the following line:

    pwt <- pwt %>% group_by(code) %>% mutate(across(c(rgdpe:rgdpo), ~ (./lag(., 5))^(1/5)-1, .names = "{col}_grow"))
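
One way to check that the lag restarts within each code is to run the same pattern on a small made-up panel. This is a minimal sketch: the data frame `toy`, its column `gdp`, and the helper `growth5` as redefined here are hypothetical and are not the PWT data or the original function.

```r
library(dplyr)

# Hypothetical toy panel: two codes, seven years each (not the PWT data).
toy <- data.frame(
  code = rep(c("A", "B"), each = 7),
  year = rep(2000:2006, times = 2),
  gdp  = c(100 * 1.10^(0:6), 50 * 1.02^(0:6))
)

# Average annual growth over a 5-year window.
# dplyr::lag() takes the offset as `n`.
growth5 <- function(x) (x / dplyr::lag(x, n = 5))^(1/5) - 1

toy_grow <- toy %>%
  group_by(code) %>%
  mutate(across(gdp, growth5, .names = "{.col}_grow")) %>%
  ungroup()

# Number of NA growth rates per code: the lag is applied within each group.
na_per_code <- tapply(is.na(toy_grow$gdp_grow), toy_grow$code, sum)
```

With seven years per code, `na_per_code` should be 5 for both codes, and the two remaining rows of each code should recover the constant growth rates used to build the series (10% for A, 2% for B).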

However, I still don't know why the version in my original post neither calculates the growth rates properly nor leaves 5 NA rows at the beginning of each "code" (country code) in the database.
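
A possible explanation for the behaviour of the original growth5(), offered as a sketch rather than a confirmed diagnosis: stats::lag() does not shift values at all. It only changes the time-series attribute (tsp) of its input, so x / lag(x, k = 5) is 1 everywhere and the growth rate is 0. dplyr::lag() does shift values, but its offset argument is named n, not k. Depending on the dplyr version, an unmatched k = 5 may be silently ignored (leaving the default n = 1, hence a single NA row and wrong rates) or rejected with an error; that version-dependent part is an assumption and is not run below. The vector `x` is illustrative test data.

```r
library(dplyr)

x <- (1:10)^2

# stats::lag() only shifts the time base (the "tsp" attribute); the values stay
# aligned element by element, so every ratio is 1 and every growth rate is 0.
g_stats <- as.numeric((x / stats::lag(x, k = 5))^(1/5) - 1)

# dplyr::lag() shifts the values themselves; the offset is passed as `n`.
g_dplyr <- (x / dplyr::lag(x, n = 5))^(1/5) - 1
```

Here `g_stats` is all zeros, matching the columns of 0s described in the question, while `g_dplyr` starts with five NAs. Calling `dplyr::lag(x, n = 5)` explicitly inside the helper avoids depending on which lag() is found first on the search path.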

0

You can use the lag() function from the dplyr package to achieve this. In the following example the first 5 values of the result are NA, and the sixth entry is the first result of your growth rate formula:

    library(dplyr)

    test <- (1:10)^2

    growthrate <- function(x) {
      # average growth rate over 5 periods; the ratio is parenthesised
      # before taking the fifth root
      (x / lag(x, 5))^(1/5) - 1
    }

    growthrate(test)

This returns five NAs followed by approximately 1.048, 0.651, 0.480, 0.383 and 0.320.

pieterbons
  • Dear pieterbons, thank you very much for your quick response. However, I am looking for the average growth rate, (x/x[-5])^(1/5)-1, or something like this. – Reynaldo Senra Sep 02 '20 at 19:19
  • Sorry, I misunderstood. I have adjusted the answer. I am not sure why your own code doesn't work, since it seems very similar. Make sure you use the lag() function from dplyr, not from the stats package. – pieterbons Sep 02 '20 at 19:26
  • Dear pieterbons, thank you very much again. It is unbelievable: the function works well in your example, but it doesn't properly calculate the average growth rates in my database. Indeed, it does not leave the 5 NA rows (it leaves only one NA row). – Reynaldo Senra Sep 02 '20 at 21:06