3

New to R and Stack Overflow. Suppose I have the following macroeconomic data loaded into a data frame called testdata in R.

> testdata
      date    gdp cpi_index rpi_index
21 2013 Q1 409985   125.067     247.4
22 2013 Q2 412620   125.971     249.7
23 2013 Q3 415577   126.352     250.9
24 2013 Q4 417265   127.123     252.5
25 2014 Q1 420091   127.241     253.9
26 2014 Q2 423249   128.139     256.0
27 2014 Q3 426022   128.191     256.9
28 2014 Q4 428347   128.312     257.4

I want to generate a new data frame called testdata_growth which contains the quarter-on-quarter (q-o-q) growth rates for the macro variables in testdata. Currently my way of going about this is the following:

# Generating q-o-q growth rates
# Divide each change by the previous quarter's value, i.e. x[t] / x[t-1] - 1
gdp_growth <- c(NA, diff(testdata$gdp) / head(testdata$gdp, -1))
rpi_index_growth <- c(NA, diff(testdata$rpi_index) / head(testdata$rpi_index, -1))
cpi_index_growth <- c(NA, diff(testdata$cpi_index) / head(testdata$cpi_index, -1))

# Combining growth rates into a new data frame
testdata_growth <- data.frame(testdata$date, gdp_growth, rpi_index_growth, cpi_index_growth)

My question is how I can put the above into a loop, so that I can generate the new data frame with growth rates more quickly (I have dozens of macroeconomic variables that I need to apply this growth rate calculation to).

Any assistance would be greatly appreciated.

Thanks!

(Also, if you have any comments on how to improve my question, I would take these into consideration the next time I post onto Stack Overflow - many thanks!)

Edit: Added dput(testdata) below

> dput(testdata)
structure(list(date = structure(21:28, .Label = c("2008 Q1", 
"2008 Q2", "2008 Q3", "2008 Q4", "2009 Q1", "2009 Q2", "2009 Q3", 
"2009 Q4", "2010 Q1", "2010 Q2", "2010 Q3", "2010 Q4", "2011 Q1", 
"2011 Q2", "2011 Q3", "2011 Q4", "2012 Q1", "2012 Q2", "2012 Q3", 
"2012 Q4", "2013 Q1", "2013 Q2", "2013 Q3", "2013 Q4", "2014 Q1", 
"2014 Q2", "2014 Q3", "2014 Q4"), class = "factor"), gdp = c(409985L, 
412620L, 415577L, 417265L, 420091L, 423249L, 426022L, 428347L
), cpi_index = c(125.067, 125.971, 126.352, 127.123, 127.241, 
128.139, 128.191, 128.312), rpi_index = c(247.4, 249.7, 250.9, 
252.5, 253.9, 256, 256.9, 257.4)), .Names = c("date", "gdp", 
"cpi_index", "rpi_index"), row.names = 21:28, class = "data.frame")
Johan
dannychan0510

4 Answers

8

You can use data.table too. data.table is a very powerful data manipulation package.

library("data.table")
as.data.table(testdata)[, lapply(.SD, function(x) x / shift(x) - 1), .SDcols = 2:4]


           gdp    cpi_index   rpi_index
1:          NA           NA          NA
2: 0.006427064 0.0072281257 0.009296686
3: 0.007166400 0.0030245056 0.004805767
4: 0.004061822 0.0061020008 0.006377043
5: 0.006772674 0.0009282349 0.005544554
6: 0.007517419 0.0070574736 0.008270973
7: 0.006551699 0.0004058093 0.003515625
8: 0.005457465 0.0009439040 0.001946283
Arun
ExperimenteR
  • 4
    In the devel version of `data.table`, there is `shift` with `lag`, `lead` options – akrun Mar 10 '15 at 14:54
  • @akrun Thank you for the info. I've installed the latest data.table. Due to some problems with Rtools, I couldn't install the GitHub version. – ExperimenteR Mar 10 '15 at 15:00
  • Thank you very much. I have heard that data.table is a staple package for R users - I will use this as an opportunity to learn how this is used. One question I have - why do we need to add in `function(x)` before `x / lag(x) - 1`? I know that this is essential, as the function does not work when I take away the `function(x)`, but I do not understand the logic behind this. Any explanation would be much appreciated. Thanks! – dannychan0510 Mar 10 '15 at 15:53
  • 2
    This has to do with how `lapply` works. It is called an anonymous function. See `?lapply` or http://stackoverflow.com/a/7141669/4380497 for more information on the `*apply` family. At first sight, you might find it a little peculiar. Once you figure out how it works, it will become an indispensable workhorse. Good luck! – ExperimenteR Mar 10 '15 at 16:12
  • @ExperimenteR Also, a quick note: I was revisiting this code this morning, and I found that the solution you proposed above also needs the `dplyr` package to be loaded in order to replicate the results you showed above. With only the `data.table` package loaded, it generates only 0s. I'm not sure why this is the case, but this is what I've experienced. – dannychan0510 Mar 11 '15 at 14:08
  • 2
    @dannychan0510 That is because `lag` is from `dplyr`. Following @akrun's suggestion, installing data.table from GitHub and using `shift` instead of `lag` will fix it. – ExperimenteR Mar 11 '15 at 14:12
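
To illustrate the point about anonymous functions from the comments: `lapply` calls whatever function it is given once per column, so the expression has to be wrapped in `function(x)` to tell R what `x` refers to. Below is a minimal base-R sketch on a small made-up data frame; `toy` and `growth` are illustrative names and do not come from the original post.

```r
# Toy data, made up for illustration only
toy <- data.frame(a = c(100, 110, 121), b = c(50, 55, 66))

# Named function: the previous value is x shifted down by one, padded with NA
growth <- function(x) x / c(NA, head(x, -1)) - 1

# These two calls are equivalent; the second defines the function inline
res_named <- lapply(toy, growth)
res_anon  <- lapply(toy, function(x) x / c(NA, head(x, -1)) - 1)

identical(res_named, res_anon)
```

Writing the function inline simply saves giving a name to something that is used only once.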
6
library(dplyr)

testdata %>%
  select(-date) %>%
  mutate_each(funs(. / lag(.) - 1))

#           gdp    cpi_index   rpi_index
# 1          NA           NA          NA
# 2 0.006427064 0.0072281257 0.009296686
# 3 0.007166400 0.0030245056 0.004805767
# 4 0.004061822 0.0061020008 0.006377043
# 5 0.006772674 0.0009282349 0.005544554
# 6 0.007517419 0.0070574736 0.008270973
# 7 0.006551699 0.0004058093 0.003515625
# 8 0.005457465 0.0009439040 0.001946283

Couldn't resist...

library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)

testdata %>%
  select(-date) %>%
  mutate_each(funs(. / lag(.) - 1)) %>%
  bind_cols(testdata[1], .) %>%
  gather(index, value, -date) %>% 
  ggplot(., aes(x = date, y = value, 
                color = factor(index), 
                group = factor(index))) + 
    geom_line() +
    scale_y_continuous(labels = percent)

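[Plot: line chart of the quarterly growth rates by date, one coloured line each for gdp, cpi_index and rpi_index, y-axis in percent]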
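
For readers on newer versions of dplyr: mutate_each() and funs() have since been deprecated in favour of across(). Below is a minimal sketch of an equivalent call, assuming dplyr 1.0.0 or later; it is not part of the original answer. It calls dplyr::lag explicitly, because base R's stats::lag does not shift a plain vector, so x / lag(x) - 1 would then return only 0s (the behaviour described in the comments on the data.table answer).

```r
library(dplyr)

# Same data as in the question, rebuilt here so the snippet is self-contained
testdata <- data.frame(
  date = c("2013 Q1", "2013 Q2", "2013 Q3", "2013 Q4",
           "2014 Q1", "2014 Q2", "2014 Q3", "2014 Q4"),
  gdp = c(409985L, 412620L, 415577L, 417265L,
          420091L, 423249L, 426022L, 428347L),
  cpi_index = c(125.067, 125.971, 126.352, 127.123,
                127.241, 128.139, 128.191, 128.312),
  rpi_index = c(247.4, 249.7, 250.9, 252.5, 253.9, 256, 256.9, 257.4)
)

# Apply x / lag(x) - 1 to every column except date, keeping date in place
testdata_growth <- testdata %>%
  mutate(across(-date, ~ .x / dplyr::lag(.x) - 1))
```

Because across(-date, ...) leaves the date column untouched, there is no need to drop it first and bind it back afterwards.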

JasonAizkalns
2

You can calculate it from the differences of the logged values.

cbind(testdata[1], sapply(testdata[-1], function(x) c(0, exp(diff(log(x))) - 1)))
      date         gdp    cpi_index   rpi_index
21 2013 Q1 0.000000000 0.0000000000 0.000000000
22 2013 Q2 0.006427064 0.0072281257 0.009296686
23 2013 Q3 0.007166400 0.0030245056 0.004805767
24 2013 Q4 0.004061822 0.0061020008 0.006377043
25 2014 Q1 0.006772674 0.0009282349 0.005544554
26 2014 Q2 0.007517419 0.0070574736 0.008270973
27 2014 Q3 0.006551699 0.0004058093 0.003515625
28 2014 Q4 0.005457465 0.0009439040 0.001946283
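
Why this works: diff(log(x)) is log(x[t]) - log(x[t-1]) = log(x[t] / x[t-1]), so exponentiating and subtracting 1 recovers x[t] / x[t-1] - 1. Below is a quick check on the gdp series from the question. It is a sketch rather than part of the original answer, and it uses NA instead of 0 for the first quarter, since no growth rate is defined there.

```r
gdp <- c(409985, 412620, 415577, 417265, 420091, 423249, 426022, 428347)

# Growth via log differences: exp(log(x[t]) - log(x[t-1])) - 1
via_logs <- c(NA, exp(diff(log(gdp))) - 1)

# Growth via the direct ratio: x[t] / x[t-1] - 1
direct <- c(NA, gdp[-1] / gdp[-length(gdp)] - 1)

# The two agree up to floating-point tolerance
all.equal(via_logs, direct)
```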
James
0

A data.table solution that adds the growth columns directly to the dataset via a loop. Each new column is named after its source column with a _growth suffix (for example, gdp_growth).

Here df is assumed to be a data.table, and list.of.columns is a character vector with the names of the columns for which you'd like growth rates.

Remove by = group_ID if you don't want to calculate the rates by group.

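library(data.table)

# df must be a data.table, e.g. df <- as.data.table(df)
for (col in list.of.columns) {
  # Name of the new column, e.g. "gdp_growth"
  growth.col.name <- paste0(col, "_growth")

  # Parentheses make := use the value of growth.col.name as the column name
  df[, (growth.col.name) := get(col) / shift(get(col)) - 1, by = group_ID]
}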
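
As a concrete run of the loop above, here it is applied to the first four quarters of the question's data, without grouping. This is a sketch rather than part of the original answer: dt and cols are illustrative names, and it assumes a data.table version that provides shift().

```r
library(data.table)

# First four quarters of the question's data
dt <- data.table(
  date = c("2013 Q1", "2013 Q2", "2013 Q3", "2013 Q4"),
  gdp = c(409985L, 412620L, 415577L, 417265L),
  cpi_index = c(125.067, 125.971, 126.352, 127.123),
  rpi_index = c(247.4, 249.7, 250.9, 252.5)
)

# Every column except date gets a growth rate
cols <- setdiff(names(dt), "date")

for (col in cols) {
  # Adds gdp_growth, cpi_index_growth and rpi_index_growth by reference
  dt[, (paste0(col, "_growth")) := get(col) / shift(get(col)) - 1]
}

dt
```

Since := modifies dt in place, the original columns stay alongside the new _growth columns and nothing needs to be reassigned.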