
I have a df like this:

    A      B
    0      0
    0      0
    0      0
    0      1
    0      1
    0      2
    0      3
    0      3
    1      0
    1      0
    1      1
    1      1
    2      0
    2      1
    2      2

I need a new column C with a counter that numbers the occurrences of each value in column B, starting again at 1 each time the value changes.

This is exactly what I need:

    A      B   C
    0      0   1
    0      0   2
    0      0   3
    0      1   1
    0      1   2
    0      2   1
    0      3   1
    0      3   2
    1      0   1 
    1      0   2
    1      1   1
    1      1   2
    2      0   1
    2      1   1
    2      2   1

The first 3 rows of C are 1-2-3 because B has 3 rows with value 0, then the next 2 rows of C are 1-2 because B has two rows with value 1, etc.

I tried with something like this:

 DF$C <- ifelse(DF$B == 0 , 1:length(DF),1:length(DF))

But it doesn't work for values other than 0, and I can't really control it. I need some for loop that checks column B and creates column C by iterating over it.
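For reference, the loop described above could be sketched in base R roughly like this. It assumes the counter should restart at 1 whenever B differs from the previous row, which matches the expected output (rows 9-10, where B is 0 again, restart at 1):

```r
DF <- data.frame(
  A = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2),
  B = c(0, 0, 0, 1, 1, 2, 3, 3, 0, 0, 1, 1, 0, 1, 2)
)

# Pre-allocate the counter column
DF$C <- integer(nrow(DF))

for (i in seq_len(nrow(DF))) {
  # Assumption: keep counting only while B equals the previous row's B;
  # `&&` short-circuits so DF$B[0] is never looked at on the first row
  if (i > 1 && DF$B[i] == DF$B[i - 1]) {
    DF$C[i] <- DF$C[i - 1] + 1L
  } else {
    DF$C[i] <- 1L
  }
}

DF$C
#> [1] 1 2 3 1 2 1 1 2 1 2 1 2 1 1 1
```

This works, but the vectorised answers below do the same thing without an explicit loop.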

Hope the question is clear. Thank you in advance.

Sotos
Luigi
    Does this answer your question? [Numbering rows within groups in a data frame](https://stackoverflow.com/questions/12925063/numbering-rows-within-groups-in-a-data-frame) – Andrew Mar 04 '20 at 13:38
    Not an exact dupe. There is no grouping variable here – Sotos Mar 04 '20 at 13:40

3 Answers


You can use run length encoding (rle) to get the lengths of consecutive matches, then just seq each length in an lapply before unlisting it.

DF$C <- unlist(lapply(rle(DF$B)$lengths, seq))

DF
#>    A B C
#> 1  0 0 1
#> 2  0 0 2
#> 3  0 0 3
#> 4  0 1 1
#> 5  0 1 2
#> 6  0 2 1
#> 7  0 3 1
#> 8  0 3 2
#> 9  1 0 1
#> 10 1 0 2
#> 11 1 1 1
#> 12 1 1 2
#> 13 2 0 1
#> 14 2 1 1
#> 15 2 2 1
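To see what each step does, the intermediate `rle()` result can be printed. The sketch below also shows base R's `sequence()`, which (as an alternative to the `lapply`/`unlist` combination) concatenates `1..n` for each run length in a single call:

```r
B <- c(0, 0, 0, 1, 1, 2, 3, 3, 0, 0, 1, 1, 0, 1, 2)

# rle() collapses consecutive equal values into runs
runs <- rle(B)
runs$lengths
#> [1] 3 2 1 2 2 2 1 1 1
runs$values
#> [1] 0 1 2 3 0 1 0 1 2

# seq() on each run length gives 1..n per run; unlist() flattens them
C_loop <- unlist(lapply(runs$lengths, seq))

# sequence() builds the same concatenated 1..n sequences directly
C_seq <- sequence(runs$lengths)

all(C_loop == C_seq)
#> [1] TRUE
```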

Allan Cameron

We can start a new group wherever the difference between consecutive values is not 0 (i.e. the value changes), and then use those groups to create sequences, i.e.

i1 <- cumsum(c(TRUE, diff(DF$B) != 0))
ave(i1, i1, FUN = seq_along)
#[1] 1 2 3 1 2 1 1 2 1 2 1 2 1 1 1
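Broken into steps, the grouping works like this: the logical "value changed" flags become run ids via `cumsum()`, and `ave()` then numbers rows within each id. A minimal sketch using the B column from the question:

```r
B <- c(0, 0, 0, 1, 1, 2, 3, 3, 0, 0, 1, 1, 0, 1, 2)

# TRUE where a row differs from the one before it (the first row always starts a group)
starts <- c(TRUE, diff(B) != 0)

# cumsum() turns those start flags into run ids
i1 <- cumsum(starts)
i1
#> [1] 1 1 1 2 2 3 4 4 5 5 6 6 7 8 9

# ave() applies seq_along() within each run id
ave(i1, i1, FUN = seq_along)
#> [1] 1 2 3 1 2 1 1 2 1 2 1 2 1 1 1
```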

However, if your groups are based on both columns (you do not mention anything about column A), then we don't need to create the groups manually; we can simply use both columns for grouping, i.e.

with(DF, ave(A, A, B, FUN = seq_along))
#[1] 1 2 3 1 2 1 1 2 1 2 1 2 1 1 1
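The two approaches give the same result on the question's data, but they can differ. On a small hypothetical input (not from the question) where the same (A, B) pair reappears after a different B value, the run-based version restarts the count while the (A, B) grouping keeps counting:

```r
# Hypothetical example data: B returns to 0 within the same A
df2 <- data.frame(A = c(0, 0, 0, 0), B = c(0, 0, 1, 0))

# Run-based: restart the counter every time B changes
i1 <- cumsum(c(TRUE, diff(df2$B) != 0))
ave(i1, i1, FUN = seq_along)
#> [1] 1 2 1 1

# Group-based: keep counting within each (A, B) combination
with(df2, ave(A, A, B, FUN = seq_along))
#> [1] 1 2 1 3
```

Which one is right depends on whether a repeated value after a break should start again at 1.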
Sotos

With data.table, we can use rleid with rowid

library(data.table)
setDT(DF)[, C := rowid(rleid(B))]
DF
#    A B C
# 1: 0 0 1
# 2: 0 0 2
# 3: 0 0 3
# 4: 0 1 1
# 5: 0 1 2
# 6: 0 2 1
# 7: 0 3 1
# 8: 0 3 2
# 9: 1 0 1
#10: 1 0 2
#11: 1 1 1
#12: 1 1 2
#13: 2 0 1
#14: 2 1 1
#15: 2 2 1

data

DF <- structure(list(A = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L), B = c(0L, 0L, 0L, 1L, 1L, 2L, 3L, 3L, 0L, 
0L, 1L, 1L, 0L, 1L, 2L)), class = "data.frame", row.names = c(NA, 
-15L))
akrun