I have a data set which includes seller ID, and number of the total sold TV in each year. An example of this data frame is shown below:
ID <- c(1006332,1010660,1010852,1012960,1012960,1012960,1012960,1012960,1013515,1013515,1013515,1013515,101351)
Sold_year <- c(2017,2016,2011,2011,2012,2013,2015,2016,2014,2015,2016,2018,2019)
n <- c(1,1,1,12,1,3,4,1,2,1,3,1,2)
data <- data.frame(ID,Sold_year,n)
ID Sold_year n
1 1006332 2017 1
2 1010660 2016 1
3 1010852 2011 1
4 1012960 2011 12
5 1012960 2012 1
6 1012960 2013 3
7 1012960 2015 4
8 1012960 2016 1
9 1013515 2014 2
10 1013515 2015 1
11 1013515 2016 3
12 1013515 2018 1
13 1013515 2019 2
what I want to do: I want to calculate the average gradient of sold TV for each seller. So for example: for the seller 1013515, I want to do the following calculation:
grad1 = (1-2)/(2015-2014)=-1
grad2 = (3-1)/(2016-2015)= 2
grad3 = (1-3)/(2018-2016)=-1
grad4 = (2-1)/(2019-2018)= 1
average= -1+2-1+1/(4)=0.25
Also, for a seller who has sold a TV only in one year, I consider the gradient as 0 and will add a penalty which is the difference between the sold year and 2020. So for seller 1006332 this average gradient would be
0-(2020-2017)=-3
What I did so far: in order to do the following calculation, I used a for loop to go through a list of specific agent each time and calculate the gradient. This approach works and give the right result, however, it's slow for large datasets. I was thinking maybe there is a way to do it simply with dplyr and data frame calculations. I was thinking to use lag function and here is what I proposed:
test2 <- data %>%
group_by(ID) %>%
mutate(grad=(n-lag(n))/(Sold_year-lag(Sold_year)))
the result of this code is:
1 1006332 2017 1 NA
2 1010660 2016 1 NA
3 1010852 2011 1 NA
4 1012960 2011 12 NA
5 1012960 2012 1 -11
6 1012960 2013 3 2
7 1012960 2015 4 0.5
8 1012960 2016 1 -3
9 1013515 2014 2 NA
10 1013515 2015 1 -1
11 1013515 2016 3 2
12 1013515 2018 1 -1
13 1013515 2019 2 1
As you see, it successfully calculates the gradient for the seller with more than one working year; however, I am not sure how I should calculate the final average gradient and also how I should add a penalty to those sellers with one working year(seller : 1006332, 1010660, etc)
the expected output should be in this form:
ID Ave_grad
1 1006332 -3.00
2 1010660 -4.00
3 1010852 -9.00
4 1012960 -2.87
5 1013515 0.25
any suggestions or comments?
Thanks