How to calculate percentage change from different rows over different spans

Question

I am trying to calculate the percentage change in price for quarterly data of companies recognized by a gvkey(1001, 1384, etc...). and it's corresponding quarterly stock price, PRCCQ.

    gvkey  PRCCQ
1   1004 23.750
2   1004 13.875
3   1004 11.250
4   1004 10.375
5   1004 13.600
6   1004 14.000
7   1004 17.060
8   1004  8.150
9   1004  7.400
10  1004 11.440
11  1004  6.200
12  1004  5.500
13  1004  4.450
14  1004  4.500
15  1004  8.010

What I am trying to do is add 8 columns showing 1 quarter return, 2 quarter return, etc. all the way to 8 quarters. I have been able to calculate 1 quarter return for each PRCCQ by using the delt function of quantmod and ddply of plyr, and I was also able to get the 2 quarter return using the same code by altering k.

ddply(data, "gvkey", transform,  DeltaCol = Delt(PRCCQ,k=2))

However, this equation will NOT allow me to go higher than k=2 without giving me an error of differing number of rows 2,3. I've tried using many alternate methods now but dint work. Is there a function I can plug into the ddply code I have to replace Delt or maybe another completely alternative line of code to display all 8 quarters of return in individual columns?

Rcoster · Accepted Answer · 2013-02-15T19:19:07.443

You can declare your data as ts() and use cbind() and diff()

data <- read.table(header=T,text='gvkey  PRCCQ
   1004 23.750
   1004 13.875
   1004 11.250
   1004 10.375
   1004 13.600
   1004 14.000
   1004 17.060
   1005  8.150
   1005  7.400
  1005 11.440
  1005  6.200
  1005  5.500
  1005  4.450
  1005  4.500
  1005  8.010')

data <- split(data,list(data$gvkey))
(newdata <- do.call(rbind,lapply(data,function(data) { data <- ts(data) ; cbind(data,Quarter=diff(data[,2]),Two.Quarter=diff(data[,2],2))})))

      data.gvkey data.PRCCQ Quarter Two.Quarter
 [1,]       1004     23.750      NA          NA
 [2,]       1004     13.875  -9.875          NA
 [3,]       1004     11.250  -2.625     -12.500
 [4,]       1004     10.375  -0.875      -3.500
 [5,]       1004     13.600   3.225       2.350
 [6,]       1004     14.000   0.400       3.625
 [7,]       1004     17.060   3.060       3.460
 [8,]       1005      8.150      NA          NA
 [9,]       1005      7.400  -0.750          NA
[10,]       1005     11.440   4.040       3.290
[11,]       1005      6.200  -5.240      -1.200
[12,]       1005      5.500  -0.700      -5.940
[13,]       1005      4.450  -1.050      -1.750
[14,]       1005      4.500   0.050      -1.000
[15,]       1005      8.010   3.510       3.560

EDIT:

Another way, without split() and lapply() (probably faster)

data <- read.table(header=T,text='gvkey  PRCCQ
       1004 23.750
       1004 13.875
       1004 11.250
       1004 10.375
       1004 13.600
       1004 14.000
       1004 17.060
       1005  8.150
       1005  7.400
      1005 11.440
      1005  6.200
      1005  5.500
      1005  4.450
      1005  4.500
      1005  8.010')
newdata <- do.call(rbind,by(data, data$gvkey,function(data) { data <- ts(data) ; cbind(data,Quarter=diff(data[,2]),Two.Quarter=diff(data[,2],2))}))

That works wonders on most of the data! However, when I execute the command for the entire data file, I'm getting: Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 319311, 319310, 319309 — user2076502, Feb 15 '13 at 18:43
Just did and it worked for the 1004, however, the percentage changes are still calculated between gvkeys except for the beginning gvkey(1004). It should reset and be NA for however many periods like it is in the example. It's almost there though. — user2076502, Feb 15 '13 at 18:56
just ran your alternate (faster) code and got the error: Error in ts(x) : 'ts' object must have one or more observations — user2076502, Feb 15 '13 at 20:26
Finally got the second one to work, but I'm running into a problem with memory. It gives me an error saying there is not enough memory to allocate the vector. Is there a way I can run the second code and have it either output pieces of the process in dropbox or maybe by looping? — user2076502, Feb 16 '13 at 01:25

score 0 · Answer 2 · answered Feb 15 '13 at 18:44

df <- read.table(text="gvkey  PRCCQ
1  1004  5.500
2  1004  4.450
3  1004  4.500
4  1004  8.010
5  1005  4.450
6  1005  4.500",header=TRUE)

library(plyr)
library(quantmod)
ddply(df, "gvkey", transform, DeltaCol = Delt(PRCCQ,k=3))
#error

Delt2 <- function(x,k) {
  if(length(x)>k) as.vector(Delt(x1=x,k=k)) else rep(NA,length(x))
}

ddply(df, "gvkey", transform, DeltaCol = Delt2(PRCCQ,k=3))
#  gvkey PRCCQ  DeltaCol
#1  1004  5.50        NA
#2  1004  4.45        NA
#3  1004  4.50        NA
#4  1004  8.01 0.4563636
#5  1005  4.45        NA
#6  1005  4.50        NA

whenever I execute your final command, I get the error: Error in if (empty(.data)) return(.data) : missing value where TRUE/FALSE needed — user2076502, Feb 15 '13 at 19:01

How to calculate percentage change from different rows over different spans

2 Answers2

Linked