
I'm trying to calculate week-over-week (w/w) growth rates entirely in R. I could use Excel or preprocess with Ruby, but that's not the point.

Example data.frame:

        date   gpv        type
1 2013-04-01 12900 back office
2 2013-04-02 16232 back office
3 2013-04-03 10035 back office

I want to do this per 'type', so I need to roll the Date column up into weeks and then calculate the week-over-week growth for each type.

I think I need ddply to group by week, with a custom function that determines which week a given date falls in?
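A custom "is this date in week N" function may not be needed. As a rough sketch, assuming the date column is (or can be converted to) a Date, base R's cut() can label each row with the Monday that starts its week, and aggregate() can then sum per type and week. The sample values below are made up for illustration.

```r
# Hypothetical daily data shaped like the data.frame above (values are made up).
df_all <- data.frame(
  date = as.Date("2013-04-01") + 0:13,
  gpv  = c(100, 120, 90, 110, 130, 80, 70,      # week starting Mon 2013-04-01
           150, 140, 160, 130, 120, 110, 100),  # week starting Mon 2013-04-08
  type = "back office"
)

# cut() on a Date with breaks = "week" assigns each row to the week it falls in,
# labelled by that week's Monday (weeks start on Monday by default).
df_all$week <- as.Date(cut(df_all$date, breaks = "week"))

# Sum gpv within each (type, week) pair.
df_weekly <- aggregate(gpv ~ type + week, data = df_all, FUN = sum)
print(df_weekly)
```

Using the week's start date rather than a bare week number keeps the result sortable and avoids collisions if the data ever spans more than one year.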

After that, I would use diff to get the change between consecutive weeks and divide it by the previous week's value.
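To make that step concrete, here is a minimal sketch on a made-up vector of weekly totals (the name weekly_gpv and its values are assumptions for illustration, not real data):

```r
# Hypothetical weekly totals, oldest week first.
weekly_gpv <- c(700, 910, 819)

# diff() returns (this week - previous week) for weeks 2..n.
# head(x, -1) drops the last element, leaving the "previous week" values
# that line up with each difference.
growth <- diff(weekly_gpv) / head(weekly_gpv, -1)

# Express as a percentage for reporting.
growth_pct <- growth * 100
print(growth_pct)
```

The result has one fewer element than the input, because the first week has no previous week to compare against.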

Then I'll plot the week-over-week growth rates, or export them as a data.frame.

This was closed but had some useful ideas.


2 Answers

0

UPDATE: answer with ggplot:

Everything is the same as below; just use this instead of plot:

library(ggplot2)
ggplot(data.frame(week = seq_along(gr), gr = gr), aes(x = week, y = gr * 100)) +
  geom_point() +
  geom_smooth(method = 'loess') +
  coord_cartesian(xlim = c(.95, 10.05)) +  # x limits hard-coded for my data
  scale_x_continuous(breaks = seq_along(gr)) +  # week is numeric, so use a continuous scale
  ggtitle('week over week growth rate, from Apr 1') +
  ylab('growth rate %')

(Old answer: still correct, but uses only plot.)

Well, I think this is it:

library(plyr)

df_net <- ddply(df_all, .(date), summarise, gpv = sum(gpv))  # df_all has my daily data; this sums across types per day
df_net$week_num <- strftime(df_net$date, "%U")  # week number of the year, used to 'group by' in ddply
df_weekly <- ddply(df_net, .(week_num), summarise, gpv = sum(gpv))

# Seems correct, though I don't fully understand it. Via: http://stackoverflow.com/questions/15356121/how-to-identify-the-virality-growth-rate-in-time-series-data-using-r
gr <- diff(df_weekly$gpv) / df_weekly$gpv[-length(df_weekly$gpv)]  # (this week - previous week) / previous week
plot(gr * 100, type = 'l', xlab = 'week #', ylab = 'growth rate percent', main = 'Week/Week Growth Rate')  # gr is a fraction, so scale to percent
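Note that the code above sums over all types. Since the question asks for growth per 'type', one possible variation is sketched below in base R. It assumes weekly totals per type are already computed and sorted by week within each type; the "front office" type, the helper name wow_growth, and all numbers are made up for illustration.

```r
# Hypothetical weekly totals per type, sorted by week within each type.
df_weekly_type <- data.frame(
  type     = rep(c("back office", "front office"), each = 3),
  week_num = rep(c("13", "14", "15"), times = 2),
  gpv      = c(700, 910, 819, 200, 250, 300)
)

# Growth versus the previous week. The first week of a group has no
# predecessor, so it is padded with NA to keep the length unchanged.
wow_growth <- function(x) c(NA, diff(x) / head(x, -1))

# ave() applies the function within each type and returns the results
# in the original row order, so they can be stored as a new column.
df_weekly_type$growth <- ave(df_weekly_type$gpv, df_weekly_type$type, FUN = wow_growth)
print(df_weekly_type)
```

Padding with NA keeps one row per type and week, which makes the result easy to export or to plot with one line per type.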

Any better solutions out there?

0

For the last part, if you want to calculate the growth rate, you can take logs and then use diff with its default parameters, lag = 1 (previous week) and differences = 1 (first difference):

df_weekly_log <- log(df_weekly$gpv)  # log only the numeric gpv column, not the whole data.frame
gr <- diff(df_weekly_log, lag = 1, differences = 1)

The latter is an approximation, valid only when the week-to-week differences are small.
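To see how good the approximation is, here is a small comparison on made-up weekly totals (the values are assumptions chosen to show one small and one large change):

```r
# Hypothetical weekly totals: +2% in the second week, then +50% in the third.
weekly_gpv <- c(1000, 1020, 1530)

# Exact relative growth: (this week - previous week) / previous week.
exact <- diff(weekly_gpv) / head(weekly_gpv, -1)

# Log-difference approximation: log(this week) - log(previous week).
log_diff <- diff(log(weekly_gpv), lag = 1, differences = 1)

# The log difference equals log(1 + exact), so exp(log_diff) - 1
# recovers the exact rate when it is needed.
recovered <- exp(log_diff) - 1

print(round(cbind(exact, log_diff, recovered), 4))
```

For the small change the two measures nearly coincide, while for the large jump the log difference noticeably understates the exact rate.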

Hope it helps.