Let me first give you an idea of what the data looks like:
Customer  Value  Module  SubModule  ModuleTF  month  department  newCust
1         5      M1      SM1        1         1      DEP1        0
1         3      M1      SM1        1         2      DEP1        0
1         8      M1      SM1        1         3      DEP1        0
1         4      M2      SM1        1         1      DEP2        0
1         5      M2      SM2        1         1      DEP2        0
1         45     A5      null       0         1      DEP2        0
2
...
What I would like to do is calculate the slope of Value over month and store it as a new column in df. The problem is that it needs to be calculated separately for every module, submodule, and department, and it should not be calculated if newCust = 0. Another complication is that the value for a given month is sometimes null, in which case that row is simply missing from the dataset. I would like these missing months to be included, as they obviously affect the slope. What is more, modules sometimes have no submodule, and the calculation should still be done in that case. Would I need to insert those null values so that all modules and submodules have the same number of entries?
I would like the outcome to look something like this:
Customer  Value  Module  SubModule  ModuleTF  month  department  newCust  slope
1         5      M1      SM1        1         1      DEP1        0        1.2
1         3      M1      SM1        1         2      DEP1        0        1.2
1         8      M1      SM1        1         3      DEP1        0        1.2
1         4      M2      SM1        1         1      DEP2        0        1.35
1         5      M2      SM2        1         1      DEP2        0        1.11
1         45     A5      null       0         1      DEP2        0        0.23
2
...
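In case it helps to pin down what I mean, here is a rough sketch of the calculation, assuming pandas (1.1 or later, for dropna=False in groupby) and NumPy. The function name add_slope, the grouping keys (I added Customer to them), the fixed month range 1 to 3, and treating a missing month as Value 0 are all my own assumptions, and the newCust condition is left out. The slope values in the table above are only placeholders, so this will not reproduce them.

```python
import numpy as np
import pandas as pd


def add_slope(df, keys, all_months):
    """Return a copy of df with a least-squares slope of Value over month per group."""
    x = np.asarray(all_months, dtype=float)
    out = df.reset_index(drop=True)  # fresh copy with a unique 0..n-1 index
    out["slope"] = np.nan

    # dropna=False keeps the groups whose SubModule is null.
    for _, group in out.groupby(keys, dropna=False):
        # One Value per month; months with no row are filled with 0 (assumption).
        y = group.groupby("month")["Value"].sum().reindex(all_months, fill_value=0)
        # Degree-1 polynomial fit; the first coefficient is the slope.
        slope = np.polyfit(x, y.to_numpy(dtype=float), 1)[0]
        out.loc[group.index, "slope"] = slope
    return out


# Sample rows copied from the table at the top of the question.
sample = pd.DataFrame({
    "Customer":   [1, 1, 1, 1, 1, 1],
    "Value":      [5, 3, 8, 4, 5, 45],
    "Module":     ["M1", "M1", "M1", "M2", "M2", "A5"],
    "SubModule":  ["SM1", "SM1", "SM1", "SM1", "SM2", None],
    "ModuleTF":   [1, 1, 1, 1, 1, 0],
    "month":      [1, 2, 3, 1, 1, 1],
    "department": ["DEP1", "DEP1", "DEP1", "DEP2", "DEP2", "DEP2"],
    "newCust":    [0, 0, 0, 0, 0, 0],
})

result = add_slope(
    sample,
    keys=["Customer", "Module", "SubModule", "department"],
    all_months=[1, 2, 3],
)
print(result)
```

Reindexing each group onto the full month range is my attempt to avoid physically inserting the missing rows into df, but I am not sure whether filling with 0 is the right way to treat those months.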
Any help will be more than appreciated!
Thanks!