Add an index (or counter) to a dataframe by group in R

Question

I have a data frame (df) like this:

ProjectID Dist
  1        x
  1        y
  2        z
  2        x
  2        h
  1        k
  ....     ....

I want to add a third column such that we have an incrementing counter for each ProjectID:

ProjectID Dist counter
  1        x     1
  1        y     2
  2        z     1
  2        x     2
  2        h     3
  1        k     3
  ....     ....

I've had a look at seq, rank, and a couple of other functions, particularly to see whether I could use ddply to help:

df$counter <- ddply(df,.(projectID), function(x).....? )

I think I could adapt the answer to "How to create a counter/numeration by group?", but I would prefer something like ddply. (I can't find an equivalent of cumsum, but I think the same principle applies as in "Create ascending series of integers by group in Pandas".) That would let me index occurrences in a list and, for example, merge on that index.

You could try `ave`, e.g. `df$counter <- with(df, ave(seq_along(ProjectID), ProjectID, FUN=seq_along))`. A compact wrapper is `library(splitstackshape); getanID(df, 'ProjectID')[]`, or with `plyr`: `ddply(df, .(ProjectID), mutate, counter=seq_along(Dist))` – akrun Feb 21 '15 at 16:17
OK, that works (thank you!), but I don't really understand what it's doing (my head hurts). – sjgknight Feb 21 '15 at 16:22
We are grouping by `ProjectID` and creating a new column that numbers the `Dist` values within each group. You will find it easy after you read the help pages and try some examples. – akrun Feb 21 '15 at 16:27
It's the use of `ave` that (I think) I'm finding confusing. I get the `ddply` example (which also works perfectly, thanks again), but I'm struggling to get my head around the use of `ave` alongside `seq_along`. – sjgknight Feb 21 '15 at 16:30
In `ave`, the arguments after the first are the grouping variables. The signature is `ave(x, ..., FUN = mean)`, and the help page describes `...` as `Grouping variables, typically factors, all of the same ‘length’ as ‘x’.` You can also use `ave(ProjectID, ProjectID, FUN=seq_along)`, but when the column is `character`/`factor`, this will either result in an error or return character elements. – akrun Feb 21 '15 at 16:34
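To make the `ave` approach from the comments above concrete, here is a minimal base R sketch. The data frame is reconstructed from the question's example, on the assumption that the last row belongs to ProjectID 1 (as the desired output suggests). It is an illustration of the idea, not the commenter's exact code.

```r
# Example data reconstructed from the question (an assumption: the last
# row has ProjectID 1, matching the desired output).
df <- data.frame(
  ProjectID = c(1, 1, 2, 2, 2, 1),
  Dist      = c("x", "y", "z", "x", "h", "k"),
  stringsAsFactors = FALSE
)

# ave() splits its first argument by the grouping variable, applies FUN
# to each piece, and writes the results back in the original row order.
# Using seq_along(df$ProjectID) as the first argument guarantees an
# integer input, whatever the type of the grouping column.
df$counter <- ave(seq_along(df$ProjectID), df$ProjectID, FUN = seq_along)

print(df)
```

Because the first argument only supplies positions to be split, the same call should also work when the grouping column is a character or factor column.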

score 15 · Accepted Answer · answered Feb 21 '15 at 16:20

A dplyr solution is quite simple:

library(dplyr)

df %>% group_by(ProjectID) %>% mutate(counter = row_number(ProjectID))


#  ProjectID Dist counter
#1         1    x       1
#2         1    y       2
#3         2    z       1
#4         2    x       2
#5         2    h       3
#6         1    k       3


jalapic


`mutate(counter=row_number())` should do it. – akrun Feb 21 '15 at 16:21
This is probably a stupid question...what's `%>%` do? (And slightly tangential, is there a way to effectively search [google] for that type of code?) – sjgknight Feb 21 '15 at 16:31
`%>%` is a pipe or chain operator... it works like this: `mydata %>% do_something_with_it %>% do_something_else` - it simply enables you to chain together functions. – jalapic Feb 21 '15 at 16:37
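As a rough illustration of the pipe comment, the sketch below computes the per-group counter twice: once with `%>%` and once as ordinary nested function calls. It assumes `dplyr` is installed; the data frame is reconstructed from the question's example, and the names `piped` and `nested` are only illustrative.

```r
library(dplyr)

# Example data reconstructed from the question (an assumption: the last
# row has ProjectID 1, matching the desired output).
df <- data.frame(
  ProjectID = c(1, 1, 2, 2, 2, 1),
  Dist      = c("x", "y", "z", "x", "h", "k"),
  stringsAsFactors = FALSE
)

# Piped form: each step receives the previous result as its first argument.
piped <- df %>%
  group_by(ProjectID) %>%
  mutate(counter = row_number())

# The same computation written as nested calls, read from the inside out.
nested <- mutate(group_by(df, ProjectID), counter = row_number())

# Both forms should produce the same counter column.
print(identical(piped$counter, nested$counter))
print(as.data.frame(piped))
```

The piped form reads left to right in the order the steps happen, which is the main reason it is often preferred for longer chains.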
