1

I want to create a column that counts how many times in a row an incident occurs. Imagine flipping a coin 100 times and you count the number of heads ("H") and tales ("T") that occur in a row:

outcome: "H","H","H","H","T","T","T","H","T","T",...

Count: 1,2,3,4,1,2,3,1,1,2...

I can achieve this by the following syntax: df$count <- sequence(rle(df$outcome)$lengths) - 0

However, Imagine that i take rounds within the 100 coin tosses. The rounds are of unequal lengths, and I need to separate between them in the count column:

Toss   Round  Outcome  Count
1      1      H        1
2      1      H        2
3      1      H        3
4      1      H        4
5      1      T        1
6      2      T        1
7      2      T        2
8      2      H        1
...

How can I implement such a distinction (i.e. group by rounds while counting within outcome) between the rounds in my code?

Edit: Just wanted to ad that this is a very simplified version of something i need to do in a 46.000 row dataset, so the solution cannot be based on the table I provide, but preferrable into the rle-code or equal.

Thanks!

Astcar
  • 11
  • 2

4 Answers4

1

Here is a solution with base R:

D <- read.table(header=TRUE, text=
"Toss   Round  Outcome  Count
1      1      H        1
2      1      H        2
3      1      H        3
4      1      H        4
5      1      T        1
6      2      T        1
7      2      T        2
8      2      H        1")

D$C <- ave(D$Toss, D$Round, D$Outcome, FUN=seq_along)
D
#   Toss Round Outcome Count C
# 1    1     1       H     1 1
# 2    2     1       H     2 2
# 3    3     1       H     3 3
# 4    4     1       H     4 4
# 5    5     1       T     1 1
# 6    6     2       T     1 1
# 7    7     2       T     2 2
# 8    8     2       H     1 1

With data.table:

library("data.table")

D <- fread(
"Toss   Round  Outcome  Count
1      1      H        1
2      1      H        2
3      1      H        3
4      1      H        4
5      1      T        1
6      2      T        1
7      2      T        2
8      2      H        1")

D[, C:=1:.N, .(Round, Outcome)][]
jogo
  • 12,469
  • 11
  • 37
  • 42
  • Thanks, but for some reason it doesn't work as it's supposed to on my real dataset. It doesn't stop counting, and so instead of restarting the count at 1 after a sequence of the oposite outcome, it continues where it left off. so I'm getting 1,2,3,4,1,2,3,5,... I really can't see why as it seems so simple. – Astcar Feb 06 '20 at 15:57
  • @Astcar Please give example data equivalent to your real data so that I can change my code. Please read https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example ... then edit your question. – jogo Feb 06 '20 at 18:28
1

Here is a data.table solution:

library(data.table)

DF <- data.frame(
  stringsAsFactors = FALSE,
              Toss = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L),
             Round = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
           Outcome = c("H", "H", "H", "H", "T", "T", "T", "H")
)

setDT(DF)

DF[, .(Toss, Count = seq_len(.N)), by = .(Round, Outcome)]

   Round Outcome Toss Count
1:     1       H    1     1
2:     1       H    2     2
3:     1       H    3     3
4:     1       H    4     4
5:     1       T    5     1
6:     2       T    6     1
7:     2       T    7     2
8:     2       H    8     1
ismirsehregal
  • 30,045
  • 5
  • 31
  • 78
1

Another data.table option which should be faster:

DT[, Count := rowid(rleid(Round, Outcome))]

output:

   Toss Round Outcome Count
1:    1     1       H     1
2:    2     1       H     2
3:    3     1       H     3
4:    4     1       H     4
5:    5     1       T     1
6:    6     2       T     1
7:    7     2       T     2
8:    8     2       H     1

data:

library(data.table)
DT <- fread("Toss   Round  Outcome  Count
1      1      H        1
2      1      H        2
3      1      H        3
4      1      H        4
5      1      T        1
6      2      T        1
7      2      T        2
8      2      H        1")
chinsoon12
  • 25,005
  • 4
  • 25
  • 35
  • Oh, maybe you could help me with a follow-up question too? I need a column that gives each count-group an identity simply based on n+1, so that each time Count==1 a new Count-group is started (i.e. in the example, row 1-4 would be count-group 1, row 5=count-group 2, row 6+7= count-group 3, etc. I appreciate it! – Astcar Feb 11 '20 at 11:51
  • maybe `DT[, Count := rowid(rleid(Round, Outcome))][, CountGroup := cumsum(Count==1L)]` – chinsoon12 Feb 11 '20 at 12:22
  • Yes indeed! Thank you so much! – Astcar Feb 11 '20 at 12:39
0
library(dplyr)

df = read.table(text = "
Toss   Round  Outcome  Count
1      1      H        1
2      1      H        2
3      1      H        3
4      1      H        4
5      1      T        1
6      2      T        1
7      2      T        2
8      2      H        1
", header=T)

df %>%
  group_by(Round, Outcome) %>%
  mutate(NewCount = row_number()) %>%
  ungroup()

# # A tibble: 8 x 5
#    Toss Round Outcome Count NewCount
#   <int> <int> <fct>   <int>    <int>
# 1     1     1 H           1        1
# 2     2     1 H           2        2
# 3     3     1 H           3        3
# 4     4     1 H           4        4
# 5     5     1 T           1        1
# 6     6     2 T           1        1
# 7     7     2 T           2        2
# 8     8     2 H           1        1
AntoniosK
  • 15,991
  • 2
  • 19
  • 32
  • When I try to use dplyr I get this error: ```Error: .onLoad failed in loadNamespace() for 'vctrs', details: call: loadNamespace(name) error: there is no package called ‘backports’ ``` – Astcar Feb 06 '20 at 12:43
  • Not sure why, but make sure your packages are all up to date. Looks like a dependency problem / conflict. – AntoniosK Feb 06 '20 at 12:45