
I would like to run a cumulative count over two columns in a dataframe that looks like this:

    A   B
    1   1
    1   2
    1   3
    1   4
    1   5
    2   6
    2   7
    2   8
    3   9
    3   10

And I would like the final dataframe to look like this:

    A   B
    1   1
    1   2
    1   3
    1   4
    1   5
    2   1
    2   2
    2   3
    3   1
    3   2

Essentially, what I am asking is: how can I run a cumsum with the condition that every time col A changes, the count in col B resets to 1 and starts again?

Thanks!

  • 1
    It isn't a cumsum you are after. It is just restarting a sequence from 1 (by 1) each time. See [this question and my answer](http://stackoverflow.com/questions/12925063/numbering-rows-within-groups-in-a-data-frame/12925090#12925090) – mnel Feb 06 '13 at 03:44
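
The comment's point, restarting a 1, 2, 3, ... sequence within each group, can be sketched in base R with `ave()` and `seq_along()`. This is a minimal sketch, not necessarily the approach in the linked answer; the name `df` is only illustrative.

```r
# Rebuild the example data from the question.
df <- data.frame(A = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3), B = 1:10)

# ave() applies FUN within each group of A and returns a vector
# aligned with the original row order. seq_along() numbers the
# rows of each group starting from 1.
df$B <- ave(seq_along(df$A), df$A, FUN = seq_along)

print(df)
```

Note that `ave()` groups by value, not by position: if a value of A reappeared later in the column, its count would continue rather than restart.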

1 Answer


R has a very useful but badly named function called rle ("run length encoding"). It returns the length of each run of identical consecutive values in a vector, and those lengths are all you need to build the counter you want.

x <- read.table(text=" A   B
    1   1
    1   2
    1   3
    1   4
    1   5
    2   6
    2   7
    2   8
    3   9
    3   10", header=TRUE)

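# rle() gives the length of each run of identical consecutive values in A.
x_rle <- rle(x$A)
# Count from 1 up to each run length, then flatten into one vector.
# lapply() is used rather than sapply(), which would simplify the result
# to a matrix whenever all runs happen to have the same length.
x$new_col <- unlist(lapply(x_rle$lengths, seq_len))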

Result:

> x
   A  B new_col
1  1  1       1
2  1  2       2
3  1  3       3
4  1  4       4
5  1  5       5
6  2  6       1
7  2  7       2
8  2  8       3
9  3  9       1
10 3 10       2
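
A shorter variant of the same idea, assuming base R's `sequence()`, which concatenates `1:n` for each `n` it is given. The second half of the sketch uses a made-up vector `a` to show that `rle()` counts runs of consecutive values, so the counter restarts whenever the value changes, even if that value has appeared before.

```r
# Same data as above, built directly.
x <- data.frame(A = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3), B = 1:10)

# sequence(c(5, 3, 2)) is c(1:5, 1:3, 1:2), so no apply loop is needed.
x$new_col <- sequence(rle(x$A)$lengths)
print(x)

# Runs, not groups: the trailing 1 starts a new run of its own.
a <- c(1, 1, 2, 2, 1)
print(sequence(rle(a)$lengths))  # 1 2 1 2 1
```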
Marius