Say that I have the following vector:

dat <- c(1,0,-1,1,0,-1,1,0,1)

I want a vector that counts the occurrences of 1, 0 and -1 in dat, but as an ongoing tally. The solution would look like this:

tally <- c(1,1,1,2,2,2,3,3,4)

So essentially my new vector has an ongoing tally of 1, 0 and -1 from dat. I am looking for a way to do this calculation in R so I can use it on a much larger set.

Chris95
  • This isn't well defined. Are the values in `dat` always to be taken in chunks of three? How do you know which values in `tally` are counting which values? Or do you mean that `tally` should be read positionally? – joran Dec 20 '17 at 20:48
  • The very first answer here works fine with vectors: [Numbering rows within groups in a data frame](https://stackoverflow.com/questions/12925063/numbering-rows-within-groups-in-a-data-frame) – Henrik Dec 20 '17 at 20:51
  • @joran Apologies, the only possible values in dat will be in {1, 0, -1}, but they will not be bunched in threes (i.e., the distribution of {1, 0, -1} in dat is random). And yes, I mean that tally should be read positionally. – Chris95 Dec 20 '17 at 21:00

2 Answers

1

Here is a fairly simple approach:

> dat <- c(1,0,-1,1,0,-1,1,0,1)
> tally <- ave(dat, factor(dat), FUN=seq_along)
> tally
[1] 1 1 1 2 2 2 3 3 4

The `ave` function splits `dat` into groups by its unique values (-1, 0, and 1 in this case). Within each group, `seq_along` numbers the elements 1, 2, 3, and so on, which is exactly the running tally for that value. `ave` then puts the per-group counts back together in the order of the original data.
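To make the split, count, and recombine steps concrete, here is a rough sketch that does by hand approximately what `ave` does internally. The names `groups`, `pieces`, `counts`, and `tally_manual` are made up for illustration; in practice the one-line `ave` call is the simpler choice.

```r
dat <- c(1, 0, -1, 1, 0, -1, 1, 0, 1)

# Step 1: group the positions of dat by value (-1, 0, 1).
groups <- factor(dat)
pieces <- split(dat, groups)

# Step 2: number the elements within each group: 1, 2, 3, ...
counts <- lapply(pieces, seq_along)

# Step 3: put the per-group counts back in the original order.
tally_manual <- unsplit(counts, groups)
tally_manual
```

The result should match `ave(dat, factor(dat), FUN = seq_along)` element by element.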

Greg Snow
  • Thank you, this is the type of answer I was looking for. EDIT: ran it on a very big vector and it had excellent run time! – Chris95 Dec 20 '17 at 21:00
  • I have a quick follow-up: would you know how to calculate the consecutive run length of each value? For example, for `dat <- c(1,1,1,0,-1,-1)` we would have `run <- c(3,3,3,1,2,2)`, because there are three 1s in a row, one 0, and two -1s in a row. So basically each element gets the length of the consecutive run it belongs to. – Chris95 Dec 20 '17 at 21:17
  • @Chris95, `tmp <- rle(dat); rep(tmp$lengths, times=tmp$lengths)` – Greg Snow Dec 20 '17 at 21:39
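For the run-length follow-up in the comments, a minimal sketch using base R's `rle()` and `rep()` could look like the following. The variable names `runs` and `run` are illustrative. Note that `times` repeats each run length by its own count, whereas a vector passed to `each` would only use its first element.

```r
dat <- c(1, 1, 1, 0, -1, -1)

# rle() encodes dat as consecutive runs: lengths 3, 1, 2 for values 1, 0, -1.
runs <- rle(dat)

# Repeat each run length once per element of that run.
run <- rep(runs$lengths, times = runs$lengths)
run
```

For this input, `run` comes out as 3 3 3 1 2 2, matching the result asked for in the comment.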
1
dat <- c(1,0,-1,1,0,-1,1,0,1)

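count_this <- function(vec) {
    # Preallocate the result inside the function instead of relying on a global.
    new_vec <- integer(length(vec))
    # seq_along() is safe for empty input, unlike 1:length(vec).
    for (i in seq_along(vec)) {
        this_elem <- vec[i]
        before_vec <- vec[1:i]
        # Count how many elements up to position i equal the current one.
        new_vec[i] <- sum(before_vec == this_elem)
    }
    return(new_vec)
}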

Use like this:

count_this(dat)

1 1 1 2 2 2 3 3 4

But definitely use Greg's much more efficient approach:

dat_long <- round(rnorm(10000), 0)

start.time <- Sys.time()
res_a <- count_this(dat_long)
end.time <- Sys.time()
time.taken <- end.time - start.time
p_1 <- as.vector(time.taken)

start.time <- Sys.time()
res_b <- ave(dat_long, factor(dat_long), FUN=seq_along)
end.time <- Sys.time()
time.taken <- end.time - start.time
p_2 <- as.vector(time.taken)

final <- data.frame(For_Loop = p_1, Vectorized = p_2)
mp <- barplot(as.matrix(final), col='steelblue', beside=TRUE, main='Runtimes for Tally Algorithm')

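As a more compact variant of the timing comparison, here is a sketch using base R's `system.time()`, which also checks that both approaches return the same tally. The function name `count_loop` and the seed are assumptions chosen for illustration, and the timings will vary by machine.

```r
# Quadratic-time loop, restated here so the block is self-contained.
count_loop <- function(vec) {
    out <- integer(length(vec))
    for (i in seq_along(vec)) {
        # Count matches of the current element among the first i elements.
        out[i] <- sum(vec[seq_len(i)] == vec[i])
    }
    out
}

set.seed(1)  # arbitrary seed, only for reproducibility
dat_long <- round(rnorm(10000), 0)

# Elapsed wall-clock time in seconds for each approach.
time_loop <- system.time(res_loop <- count_loop(dat_long))[["elapsed"]]
time_ave <- system.time(
    res_ave <- ave(dat_long, factor(dat_long), FUN = seq_along)
)[["elapsed"]]

# Both approaches should agree element by element.
stopifnot(all(res_loop == res_ave))
c(loop = time_loop, ave = time_ave)
```

The loop rescans the prefix of the vector at every position, so its cost grows roughly quadratically with length, while the `ave` version processes each group once.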

Cybernetic