Let's say I have a data frame like this:
> df = data.frame(id = c(2,88,4), sale1 = c(100,200,50), sale2 = c(50,150,50), sale3 = c(60, 100, 75))
> df
id sale1 sale2 sale3
1 2 100 50 60
2 88 200 150 100
3 4 50 50 75
and i want to find the percent difference between each two sales seasons for each item id. so basically:
[sale_n+1] - [sale_n])/[sale_n] for each row, where n = sales column index
and i want to do that across all the columns, and put the percent diff into a new table.
so the end result table will contain the "id" column plus N-1 columns, where N = number of columns (other than id column) in the original data frame.
for the example df i gave above, the desired output would be (all values other than id column are in percentage):
> df_diff
id sale12 sale23
1 2 -50 20
2 88 -25 -33
3 4 0 50
I've read post like the following:
- How to calculate percentage change from different rows over different spans
- Calculate percentage change in an R data frame
but the solutions provided in 1 does not scale (since i have so many columns!), and 2 (which uses lag()) seems like it's doing difference by row.
any help (hints, pointer, redirect) will be appreciated!
end note
would also greatly appreciate if you could teach me a way to smartly name the new result table's columns accordingly; like the columns will be named (in this given example): "sale12" and "sale 23"