
I have a column of pdf values and a logical condition column. I am trying to create a third column that is filled, in order, with values from the pdf column based on the condition column: whenever the condition is TRUE, the fill should restart from the first pdf value at that row.

I've seen the question "R: fill new columns in data.frame based on row values by condition?", and it is close, but I would like a dplyr solution so that I can keep my pipe structure.

Very Simple Example Data:

library(tidyverse)
dat <- tibble(pdf = c(.025, .05, .10, .15, .175, .20, .29, .01),
              cond = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE),
              expected = c(.025, .05, .025, .05, .10, .15, .025, .05))

The expected output is the expected column in the data frame above. (Note that my real data does not have the expected column.)

Thank you in advance.

jackbio

1 Answer


Here's one way: build a reference index with ave.

cumsum(cond) increases by one at every TRUE, so it yields a group id that changes at each restart. ave applies seq_along within each of those groups, which gives a sequence 1, 2, 3, ... that starts over at every TRUE. That sequence is then used as an index for pulling the appropriate value from pdf.
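To make the intermediate values visible, here is a rough base R sketch that runs each step separately on the question's data, held as plain vectors. The names grp, ref and result are illustrative only and are not part of the answer's code.

```r
# Data from the question, as plain vectors (base R only)
pdf  <- c(.025, .05, .10, .15, .175, .20, .29, .01)
cond <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)

# Step 1: every TRUE adds 1 to the running total, so each restart opens a new group
grp <- cumsum(cond)
grp
# 0 0 1 1 1 1 2 2

# Step 2: within each group, seq_along() numbers the rows 1, 2, 3, ...
ref <- ave(pdf, grp, FUN = seq_along)
ref
# 1 2 1 2 3 4 1 2

# Step 3: use those positions to index pdf from its first element
result <- pdf[ref]
result
# 0.025 0.050 0.025 0.050 0.100 0.150 0.025 0.050
```

The mutate() call that follows performs steps 2 and 3 inside the pipe, with step 1 passed directly as the grouping argument to ave.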

dat %>% 
  mutate(
    ref = ave(pdf, cumsum(cond), FUN = seq_along),
    expected2 = pdf[ref]
  )

# A tibble: 8 x 5
    pdf cond  expected   ref expected2
  <dbl> <lgl>    <dbl> <dbl>     <dbl>
1 0.025 FALSE    0.025     1     0.025
2 0.05  FALSE    0.05      2     0.05 
3 0.1   TRUE     0.025     1     0.025
4 0.15  FALSE    0.05      2     0.05 
5 0.175 FALSE    0.1       3     0.1  
6 0.2   FALSE    0.15      4     0.15 
7 0.290 TRUE     0.025     1     0.025
8 0.01  FALSE    0.05      2     0.05 
Shree
  • Thanks for the answer. Could you possibly give an explanation of what `ave` is doing? I've looked up the help file and am confused about how exactly it calculates the sequence of `1, 2, 1, 2, 3...` – jackbio Aug 27 '19 at 21:45
  • @jackbio Added some explanation. Also see my answer about [ave](https://stackoverflow.com/questions/57463615/what-is-the-difference-between-ave-function-and-mean-function-in-r/57464057#57464057) for more info. – Shree Aug 27 '19 at 21:51
  • Very smart solution, but I don't see a reason to use `dplyr` here. Why not keep it in base R only? `with(dat, pdf[ave(pdf, cumsum(cond), FUN = seq_along)])` – Ronak Shah Aug 28 '19 at 00:02
  • @RonakShah I used `dplyr` looking at OP's use of `tidyverse` and `tibble`. Nothing wrong with base R approach though. Also, I used separate `ref` column just to make the logic a bit transparent. :) – Shree Aug 28 '19 at 00:10