0

I am trying to achieve a result in R as follows: I have a dataset, like below

original_dataset

and I'm trying to achieve result as follows:

result_dataset

Conditions are:

  1. Ignore value of first row of every group in column 3, based on value of next row determine how many FALSE are in sequence. Input the number in the previous row.
  2. If the next value is TRUE input 0 and move to next value of the row and repeat the process.
  3. The last row of each group should have 0

Constraints: Do not want to use nested for loops. Is there a better way to achieve the result in R.

Thank you in advance for the suggestions/solutions.

Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
PS08
  • 5
  • 2
  • 1
    Please [do not post code or data in images](https://meta.stackoverflow.com/q/285551/2372064). It's easier to help of you provide data in a [reproducible format](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – MrFlick Sep 07 '22 at 19:19
  • Thanks @MrFlick I will be careful of that from next time. – PS08 Sep 07 '22 at 22:28

1 Answers1

0

It's much easier to help if your data is reproducible. The data in the images doesn't have column names, but the following code reproduces the data frame and names the columns col1, col2 and col3:

df <- data.frame(col1 = rep(1:17, 2),
                 col2 = rep(c("A", "B"), each = 17),
                 col3 = rep(rep(c(FALSE, TRUE), 4), 
                            times = c(2, 10, 3, 4, 5, 3, 3, 4)))

To do what you need, we can group the data according to each run of TRUE or FALSE in column 3 using rleid from data.table, then insert a reverse integer sequence for the FALSE segments. Finally, ungroup, then shift the new column up one position using lead.

library(tidyverse)

df <- df %>%
  group_by(grp = data.table::rleid(col3)) %>%
  mutate(col4 = rev(seq(n())) * (1 - col3)) %>%
  group_by(col2) %>%
  mutate(col4 = lead(col4, default = 0)) %>%
  select(-grp)

The result matches your expected output:

print(df, n = 34)
#> # A tibble: 34 x 4
#>     col1 col2  col3   col4
#>    <int> <chr> <lgl> <dbl>
#>  1     1 A     FALSE     1
#>  2     2 A     FALSE     0
#>  3     3 A     TRUE      0
#>  4     4 A     TRUE      0
#>  5     5 A     TRUE      0
#>  6     6 A     TRUE      0
#>  7     7 A     TRUE      0
#>  8     8 A     TRUE      0
#>  9     9 A     TRUE      0
#> 10    10 A     TRUE      0
#> 11    11 A     TRUE      0
#> 12    12 A     TRUE      3
#> 13    13 A     FALSE     2
#> 14    14 A     FALSE     1
#> 15    15 A     FALSE     0
#> 16    16 A     TRUE      0
#> 17    17 A     TRUE      0
#> 18     1 B     TRUE      0
#> 19     2 B     TRUE      5
#> 20     3 B     FALSE     4
#> 21     4 B     FALSE     3
#> 22     5 B     FALSE     2
#> 23     6 B     FALSE     1
#> 24     7 B     FALSE     0
#> 25     8 B     TRUE      0
#> 26     9 B     TRUE      0
#> 27    10 B     TRUE      3
#> 28    11 B     FALSE     2
#> 29    12 B     FALSE     1
#> 30    13 B     FALSE     0
#> 31    14 B     TRUE      0
#> 32    15 B     TRUE      0
#> 33    16 B     TRUE      0
#> 34    17 B     TRUE      0

Created on 2022-09-07 with reprex v2.0.2

Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
  • Hi Allan, thanks for the response and I will be careful about reproducible data from the next time. I tried the code and it works like charm, however I need to group col2 before looping col3. The results of group A should not go over group B. df <- data.frame(col1 = rep(1:17, 2), col2 = rep(c("A", "B"), each = 17), col3 = rep(rep(c(FALSE, TRUE), 5), times = c(2, 10, 3, 2, 1, 1, 5, 3, 3, 4))) At the end row of A it is TRUE but it reflects 1, because of the following FALSE value in first row of B. The ideal should be 0. – PS08 Sep 07 '22 at 22:27
  • @PS08 in that case, simply change `ungroup` to `group_by(col2)` - see my update. – Allan Cameron Sep 07 '22 at 22:48
  • Thank you so much for your help. This has worked. – PS08 Sep 07 '22 at 22:55