I have a data frame, and I want to do some calculations that depend on the previous rows (like dragging a formula down in Excel). My data frame looks like this:

set.seed(1234)
df <- data.frame(DA = sample(1:3, 6, rep = TRUE) ,HB = sample(0:600, 6, rep = TRUE), D = sample(1:5, 6, rep = TRUE), AD = sample(1:14, 6, rep = TRUE), GM = sample(30:31, 6, rep = TRUE), GL = NA, R =NA, RM =0  )
df$GL[1] = 646
df$R[1] = 60
df$DA[5] = 2

df
#   DA  HB D AD GM  GL  R RM
# 1  2 399 4 13 30 646 60  0
# 2  2  97 4 10 31  NA NA  0
# 3  1 102 5  5 31  NA NA  0
# 4  3 325 4  2 31  NA NA  0
# 5  2  78 3 14 30  NA NA  0
# 6  1 269 4  8 30  NA NA  0

I want to fill in the missing values in my GL, R and RM columns, and the values depend on each other. For example:

attach(df)

#calc GL and R for the 2nd row

df$GL[2] <- GL[1]+HB[2]+RM[1]

df$R[2] <- df$GL[2]*D[2]/GM[2]*AD[2]

#calc GL and R for the 3rd row

df$GL[3] <- df$GL[2]+HB[3]+df$RM[2]
df$R[3] <-df$GL[3]*D[3]/GM[3]*AD[3]

#and so on..

Is there a way to do all the calculations at once, instead of row by row?

In addition, each time 'DA' equals 1, 'RM' in that row should be the sum of the 'R' values, counting only the rows since the last occurrence of DA = 1. For example:

attach(df)

df$RM[3] <-R[1]+R[2]+R[3]

#and RM for the 6th row is calculated by

#df$RM[6] <-R[4]+R[5]+R[6]

Thanks a lot in advance!

Machavity
Mette
  • Better keep away from using `attach()`. – jay.sf Sep 21 '21 at 11:59
  • Thank you for your comment! Why is it better not to use attach()? – Mette Sep 22 '21 at 10:25
  • See this Q&A: https://stackoverflow.com/a/10067681/6574038, and please consider reading all the answers. Good alternatives are [`with()` and `within()`](https://stackoverflow.com/a/42284422/6574038). – jay.sf Sep 22 '21 at 14:55
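
As a rough illustration of the `with()` / `within()` suggestion, the row-2 step from the question can be written without `attach()`. One practical reason to avoid `attach()` is that it puts a copy of the data frame on the search path, so later changes to `df` are not visible through the attached names. The two-row data frame below is a hard-coded copy of the first rows printed in the question, and the R formula is taken literally from the question (`GL * D / GM * AD`); both are assumptions made only to keep the sketch runnable.

```r
# Hard-coded copy of the first two rows printed in the question (assumption)
df <- data.frame(DA = c(2, 2), HB = c(399, 97), D = c(4, 4), AD = c(13, 10),
                 GM = c(30, 31), GL = c(646, NA), R = c(60, NA), RM = c(0, 0))

# with() evaluates an expression using df's columns, without attaching df
df$GL[2] <- with(df, GL[1] + HB[2] + RM[1])

# within() evaluates assignments inside df and returns the modified copy
df <- within(df, R[2] <- GL[2] * D[2] / GM[2] * AD[2])

df
```

Because each call reads the current `df`, the second step sees the `GL[2]` value written by the first one.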

2 Answers


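You can use a for loop to calculate the GL values row by row. Once you have them, the R column can be calculated directly in one vectorized step.

# GL depends on the previous row, so fill it in order
for (i in 2:nrow(df)) {
  df$GL[i] <- with(df, GL[i - 1] + HB[i] + RM[i - 1])
}
# R only needs values from its own row; [-1] keeps the given R[1]
df$R[-1] <- with(df, (GL * D) / (GM * AD))[-1]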
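
The same recurrence can also be expressed as a fold with base R's `Reduce()`, which some readers find closer to the "drag the formula down" picture. This is only a sketch: the `HB` values are hard-coded from the data frame printed in the question, `RM` is assumed to stay 0, and the helper name `step` is made up for the illustration.

```r
# HB values copied from the question's printed data frame; RM assumed all zero
df <- data.frame(HB = c(399, 97, 102, 325, 78, 269),
                 RM = 0,
                 GL = c(646, NA, NA, NA, NA, NA))

# One step of the recurrence: previous GL plus current HB plus previous RM
# (`step` is a hypothetical helper name)
step <- function(prev_gl, i) prev_gl + df$HB[i] + df$RM[i - 1]

# accumulate = TRUE keeps every intermediate value, starting from GL[1]
df$GL <- Reduce(step, 2:nrow(df), accumulate = TRUE, init = df$GL[1])

df$GL
```

A plain for loop does the same work; `Reduce()` mainly saves the explicit index bookkeeping.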
Ronak Shah
  • Thank you! I found out that the right solution was to use a for loop, as you suggested. I got the right result by using: `for (i in 2:nrow(df)) { df$GL <- cumsum(c(df$GL[1], df$HB[-1] + df$RM[-nrow(df)])); df$R[-1] <- round(df$GL * df$Debet / df$Gnst_dage_I_maaned * df$Antal.dage/100, 2)[-1]; df$RM <- ifelse(DA==1, zoo::rollsumr(df$R, k = 3, fill = 0), 0) }` – Mette Sep 23 '21 at 11:31

You can use indexing to solve the first two problems:

> # Original code from question~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> set.seed(1234)
> df <- data.frame(DA = sample(1:3, 6, rep = TRUE), HB = sample(0:600, 6, rep = TRUE),
+                  D = sample(1:5, 6, rep = TRUE), AD = sample(1:14, 6, rep = TRUE),
+                  GM = sample(30:31, 6, rep = TRUE), GL = NA, R =NA, RM =0  )
> df$GL[1] = 646
> df$R[1] = 60
> df$DA[5] = 2
> #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

> # View df
> df
  DA  HB D AD GM  GL  R RM
1  2 399 4 13 30 646 60  0
2  2  97 4 10 31  NA NA  0
3  1 102 5  5 31  NA NA  0
4  3 325 4  2 31  NA NA  0
5  2  78 3 14 30  NA NA  0
6  1 269 4  8 30  NA NA  0

> # Solution below, based on indexing
> # 1. GL column
> df$GL <- cumsum(c(df$GL[1], df$HB[-1] + df$RM[-nrow(df)]))

> # 2. R column
> df$R[-1] <- (df$GL * df$D / df$GM * df$AD)[-1]
> # Perhaps clearer written like this (same result)
> df$R[-1] <- df$GL[-1] * df$D[-1] / df$GM[-1] * df$AD[-1]
> # Or, if AD was meant to be in the denominator (this version produces the output below)
> df$R[-1] <- (df$GL * df$D / (df$GM * df$AD))[-1]
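
The `cumsum()` line works because unrolling the recurrence gives GL[i] = GL[1] + (HB[2] + RM[1]) + ... + (HB[i] + RM[i-1]). That only holds when every RM value is already known before GL is computed, which is the case here because RM is all zeros; if RM were later filled from R, the row-by-row version would be needed. A small check, with the `HB` values hard-coded from the printed data frame and the names `gl_loop` and `gl_vec` made up for the sketch:

```r
# HB values copied from the printed data frame; RM assumed fixed at 0
df <- data.frame(HB = c(399, 97, 102, 325, 78, 269), RM = 0)
n <- nrow(df)

# Row-by-row version of the recurrence
gl_loop <- numeric(n)
gl_loop[1] <- 646
for (i in 2:n) gl_loop[i] <- gl_loop[i - 1] + df$HB[i] + df$RM[i - 1]

# Vectorized version: running total of the per-row increments
gl_vec <- cumsum(c(646, df$HB[-1] + df$RM[-n]))

# TRUE when both approaches agree
all.equal(gl_loop, gl_vec)
```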

The third problem can be solved with a loop.

> df$RM[1] <- df$R[1]
> for (i in 2:nrow(df)) {
+   df$RM[i] <- df$R[i] + df$RM[i-1] * (df$DA[i] != 2)
+ }

> df
  DA  HB D AD GM   GL         R         RM
1  2 399 4 13 30  646 60.000000  60.000000
2  2  97 4 10 31  743  9.587097   9.587097
3  1 102 5  5 31  845 27.258065  36.845161
4  3 325 4  2 31 1170 75.483871 112.329032
5  2  78 3 14 30 1248  8.914286   8.914286
6  1 269 4  8 30 1517 25.283333  34.197619

Do these results look correct?
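
For reference, the reset-at-`DA == 2` loop above can also be sketched without a loop, as a cumulative sum within groups. The `DA` and `R` values are hard-coded from the printed output (so they are rounded), and the variable name `grp` is an assumption for the illustration.

```r
# DA and (rounded) R values copied from the printed output above
df <- data.frame(DA = c(2, 2, 1, 3, 2, 1),
                 R  = c(60, 9.587097, 27.258065, 75.483871, 8.914286, 25.283333))

# Every DA == 2 row opens a new group: 1 2 2 2 3 3
grp <- cumsum(df$DA == 2)

# ave() applies cumsum() separately within each group
df$RM <- ave(df$R, grp, FUN = cumsum)

df
```

This relies on the first row having `DA == 2`, as it does here; otherwise the leading rows simply form their own group numbered 0, which still gives a running sum from row 1.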

Update: Assuming RM should equal R unless DA = 1, in which case RM is the sum of the current R and all previous R values back to (but not including) the previous row with DA = 1, try the following loop.

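df$RM[1] <- cs <- df$R[1]  # cs: running sum of R since the last DA == 1 row
for (i in 2:nrow(df)) {
  # On a DA == 1 row, add the running sum of the earlier rows to R[i]
  df$RM[i] <- df$R[i] + cs * (df$DA[i] == 1)
  # Extend the running sum, and clear it after a DA == 1 row so that
  # the next sum starts on the following row
  cs <- (cs + df$R[i]) * (df$DA[i] != 1)
}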
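
A loop-free sketch of the rule described in the update, using made-up round `R` values so the sums are easy to check by hand. The names `block` and `block_sum` are assumptions for the illustration. Each block ends on a `DA == 1` row, so a new block starts on the row after it.

```r
# Made-up R values for illustration; DA pattern as in the question
df <- data.frame(DA = c(2, 2, 1, 3, 2, 1),
                 R  = c(60, 10, 20, 30, 40, 50))

# Shift the DA == 1 flags down one row so the block changes AFTER such a row
block <- cumsum(c(0, head(df$DA == 1, -1)))   # 0 0 0 1 1 1

# Running sum of R within each block
block_sum <- ave(df$R, block, FUN = cumsum)

# RM is the block sum on DA == 1 rows and plain R elsewhere
df$RM <- ifelse(df$DA == 1, block_sum, df$R)

df$RM
```

If the other rows should hold 0 instead, as requested in the comments below, replacing `df$R` with `0` in the `ifelse()` call is enough.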

sashahafner
  • Thank you for the answer. The first part worked perfectly, but I am not sure why you add (df$DA[i] != 2) at the end of the line that calculates df$RM. I can also see that I made an error in my example: it is each time DA == 1 that the R column should be summed, so that in RM row 3 the value should be '106', and in row 6, '155'. In rows 1, 2, 4, and 5, the value should just be '0'. I have tried to rewrite the end of the code myself, but it does not lead me to the right results. Thanks a lot in advance! – Mette Sep 22 '21 at 10:43
  • The `* (df$DA[i] != 2)` adds the cumulative sum in `RM` from the row above only when `DA` is not equal to 2. As I understood your original question, `RM` holds the cumulative sum and is reset every time DA is 2. Now I understand that RM should equal R unless DA = 1, and in that case RM is the sum of the current row and the previous R values up to (not including) the previous row with DA = 1. For this I think you need a separate cumulative sum variable. I will add some code above. – sashahafner Sep 22 '21 at 11:13