
How can I number rows in a sorted data frame consecutively, starting at 1 whenever a new id begins?

What I have:

id | value
a | 2
a | 6
a | 1
a | 10
a | 12
b | 5
b | 2
b | 3
...

What I want:

id | value | t
a | 2 | 1
a | 6 | 2
a | 1 | 3
a | 10 | 4
a | 12 | 5
b | 5 | 1
b | 2 | 2
b | 3 | 3
...
moabit21
  • possible duplicate of [input sequential numbers without specific end in a data frame's column in r](http://stackoverflow.com/questions/23007982/input-sequential-numbers-without-specific-end-in-a-data-frames-column-in-r) – A5C1D2H2I1M1N2O1R2T1 Apr 11 '14 at 10:26

2 Answers

DF <- read.table(text="id | value
a | 2
a | 6
a | 1
a | 10
a | 12
b | 5
b | 2
b | 3", sep="|", header=TRUE)

DF$t <- sequence(rle(as.character(DF$id))$lengths)
#   id value t
# 1 a      2 1
# 2 a      6 2
# 3 a      1 3
# 4 a     10 4
# 5 a     12 5
# 6 b      5 1
# 7 b      2 2
# 8 b      3 3
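
For readers unfamiliar with the two base functions, the one-liner can be unpacked as in the sketch below. The toy vector `id` and the names `runs`, `idx`, and `idx_ave` are illustrative and not part of the answer. The `ave` variant is one of the alternatives mentioned in the comments, shown here on the assumption that grouping by value rather than by consecutive run is acceptable.

```r
# Illustrative id column, already sorted by id (as in the question)
id <- c("a", "a", "a", "a", "a", "b", "b", "b")

# rle() finds runs of consecutive equal values and their lengths
runs <- rle(id)
runs$lengths                   # lengths of the "a" run and the "b" run

# sequence() concatenates seq_len(n) for each run length n,
# which restarts the counter at 1 for every new run
idx <- sequence(runs$lengths)

# Alternative with ave(): number the rows within each id value
idx_ave <- ave(seq_along(id), id, FUN = seq_along)
```

Because `rle` counts consecutive runs, it relies on the rows being sorted (or at least grouped) by id, whereas `ave` groups by value regardless of row order.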
Roland
  • `data.table`, `dplyr`, `tapply` and `ave` alternatives [**here**](http://stackoverflow.com/questions/22288462/replace-loop-with-an-pply-alternative/22288615#22288615). – Henrik Apr 11 '14 at 10:41
  • @Henrik `sequence` and `rle` should be very efficient. So, I'm not sure when you'd use one of these alternatives. – Roland Apr 11 '14 at 10:43
  • @Roland, Thanks for your comment. Sorry if I phrased it in a way that may have suggested that your alternative was slow(er). That was not my intention. I referred (sloppily) only to the comparison among alternatives _within_ the answer I pointed to. I have edited my comment. – Henrik Apr 11 '14 at 10:47
  • @Henrik That's because you benchmark `as.character` with that. If you make sure that the column is not a factor and don't need `as.character` the timings are different. `data.table` will be faster though. – Roland Apr 11 '14 at 11:13
  • @Henrik It does on my system. It takes only one third of the time needed with `ave`. – Roland Apr 11 '14 at 11:21
  • It's nice to see all of these alternatives, but how large a dataset do we need before we get out of millisecond timings? – A5C1D2H2I1M1N2O1R2T1 Apr 11 '14 at 16:25

You can use this:

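# Group sizes; table() orders them by sorted id, so df must be sorted by id
gr_index    <- as.numeric(table(df$id))
# Concatenate 1..n for each group size
df$gr_index <- unlist(lapply(gr_index, seq_len))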

I have found this faster than the ddply or split commands, especially on large data sets.
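
One caveat, sketched below with a hypothetical data frame whose rows are grouped but not in alphabetical id order: `table` reports counts in sorted level order, so this approach appears to assume that the rows follow that same order. The names `counts`, `from_table`, and `from_rle` are made up for illustration.

```r
# Hypothetical data: ids are grouped, but "b" comes before "a"
df <- data.frame(id = c("b", "b", "a", "a", "a"), value = 1:5)

# table() returns counts for "a" first, then "b" (sorted order)
counts <- as.numeric(table(df$id))

# Counter built from those counts: restarts after 3 rows, not after 2
from_table <- unlist(lapply(counts, seq_len))

# Counter built from consecutive runs: follows the actual row order
from_rle <- sequence(rle(as.character(df$id))$lengths)
```

Here `from_table` no longer restarts where the id changes, while `from_rle` does. Sorting `df` by `id` first makes the two agree.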

RHelp