0

The data frame has three columns:

year = c(2000,2000,2001,2001,2001,2002,2002,2002)   
sales = c(100,104,106,108,112,115,121, 143) 
category = c("Auto", "Personal", "Joint", "Auto", "Personal", "Joint", "Auto", "Personal")  
df = data.frame(year, sales, category)

I used the below code to find year over year growth by category but the output adds a new column that doesn't have correct values of the growth or decline percentage. Maybe this is because my code is not accounting for different categories. Any idea on how to do that in R? In SQL you could group by but I tried adding that in the above code in R and that gave me a new column with 0 in all rows?!

setDT(df)[, new.col := Sales/shift(Sales) - 1]

LMc
  • 12,577
  • 3
  • 31
  • 43

2 Answers2

0

It seems that you use data.table, which I am not too familiar with, but here is how you would do it, using the library tidyverse. See here for some discussion on the pros and cons of both libraries. You forgot to add the year vector in your example, so I made one up.

library(tidyverse)

sales = c(100,104,106,108,112,115,121, 143) 
category = c("Auto", "Personal", "Joint", "Auto", "Personal", "Joint", "Auto", "Personal")  
year <- c('2018','2018','2018','2019','2019','2019','2020','2020')
df = data.frame(year, sales, category)

new_df <- df %>%
  group_by(category) %>%
  mutate(growth_percentage = sales/lag(sales,n = 1, order_by = year))

Datapumpernickel
  • 606
  • 6
  • 14
0

You forgot to group by category in setDT(df)[, new.col := Sales/shift(Sales) - 1].

You can do as follows (using data.table syntax):

library(data.table)

year <- c(2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002)   
sales <- c(100, 104, 106, 108, 112, 115, 121, 143) 
category <- c("Auto", "Personal", "Joint", "Auto", "Personal", "Joint", "Auto", "Personal")  
dt <- data.table(year, sales, category)
dt[, growth_rate := sales / shift(sales) - 1, by = category]

Also note that :

  • you can use the constructor data.table directly (as I did) instead of using data.frame followed by setDT
  • The data.table that you built is already sorted on the year. If it was not the case, you would add the instruction setorder(dt, year) before computing the year over year growth rate.
Etienne Kintzler
  • 672
  • 6
  • 12