0

I am trying to use dplyr and it's mutate and lag functions to create a burndown based on another column and the previous value in the column I'm creating.

I have seen an answer here that is asking for the same thing but the OP answered it themselves with using a loop. I am hoping there is a more eloquent solution to within dplyr.

sample (imagine df has 1000's rows):

      df <- data.frame(SCHED_SHIP_DATE = c("2019-09-10","2019-09-11","2019-09-12","2019-09-13"), quantity = c(156,52,136,285), stringsAsFactors = FALSE)

      start <- 4000

      temp3 <- df %>%
        arrange(SCHED_SHIP_DATE) %>%
        mutate(burndown = ifelse(row_number() == 1, start, NA)) %>%
        mutate(burndown = lag(burndown, default = burndown[1]) - quantity)

I get the following output (NA's after the it's done one row of mutate):

> temp3
  SCHED_SHIP_DATE quantity burndown
1      2019-09-10      156     3844
2      2019-09-11       52     3948
3      2019-09-12      136       NA
4      2019-09-13      285       NA

When I am expecting this:

> temp3
  SCHED_SHIP_DATE quantity burndown
1      2019-09-10      156     3844
2      2019-09-11       52     3948
3      2019-09-12      136     3812
4      2019-09-13      285     3527
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
Kevin
  • 1,974
  • 1
  • 19
  • 51

1 Answers1

0

I think your desired output is a bit wrong. Here's what you probably need -

df %>%
  arrange(SCHED_SHIP_DATE) %>% 
  mutate(
    burndown = 4000 - cumsum(quantity)
    # burndown = pmax(4000 - cumsum(quantity), 0) # use this if you don't want -ve burndown
  )

  SCHED_SHIP_DATE quantity burndown
1      2019-09-10      156     3844
2      2019-09-11       52     3792
3      2019-09-12      136     3656
4      2019-09-13      285     3371
Shree
  • 10,835
  • 1
  • 14
  • 36
  • Thanks @Shree, you're right about my output above. I didn't even notice that it's not correct. Perfect answer! – Kevin Sep 10 '19 at 20:33