I have a data frame `df` like this:

ProjectID Dist
  1        x
  1        y
  2        z
  2        x
  2        h
  1        k
  ....     ....

I want to add a third column such that we have an incrementing counter for each ProjectID:

ProjectID Dist counter
  1        x     1
  1        y     2
  2        z     1
  2        x     2
  2        h     3
  1        k     3
  ....     ....

I've had a look at `seq`, `rank`, and a couple of other functions, particularly to see whether I could use `ddply` to help:

df$counter <- ddply(df, .(ProjectID), function(x).....? )

I think I could adapt the answer to "How to create a counter/numeration by group?", but I would prefer an approach based on something like `ddply`. I can't find an equivalent of `cumsum`, but I think the principle is the same as in "Create ascending series of integers by group in Pandas". A per-group counter would let me index occurrences in a list (and, e.g., merge on that index).

sjgknight
  • You could try `ave`, i.e. `df$counter <- with(df, ave(seq_along(ProjectID), ProjectID, FUN=seq_along))`; a compact wrapper would be `library(splitstackshape); getanID(df, 'ProjectID')[]`; or, using `plyr`, `ddply(df, .(ProjectID), mutate, counter=seq_along(Dist))` – akrun Feb 21 '15 at 16:17
  • Ok that works (thank you!) but I don't really understand what it's doing? (my head hurts) – sjgknight Feb 21 '15 at 16:22
  • We are grouping by `ProjectID` and creating a new column that holds the sequence along `Dist` within each group. You will find it easy after you read the help pages and try some examples. – akrun Feb 21 '15 at 16:27
  • It's the use of `ave` that (I think) I'm finding confusing. I get the `ddply` example (which also works perfectly, thanks again), but I'm struggling to get my head around the use of `ave` alongside `seq_along`. – sjgknight Feb 21 '15 at 16:30
  • In `ave`, the second argument is the grouping variable, i.e. `ave(x, ..., FUN = mean)`. The help page describes `...` as `Grouping variables, typically factors, all of the same ‘length’ as ‘x’.` You can also use `ave(ProjectID, ProjectID, FUN=seq_along)`, but when you have `character/factor` columns, this will either result in an error or give character elements as output. – akrun Feb 21 '15 at 16:34
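
For anyone else puzzled by the `ave` call discussed in the comments above, here is a minimal sketch of what it appears to be doing, using only base R. The sample data frame is an assumption that mirrors the example in the question, and the `pieces` and `manual` names are made up for illustration.

```r
# Sample data mirroring the question (values assumed for illustration)
df <- data.frame(
  ProjectID = c(1, 1, 2, 2, 2, 1),
  Dist      = c("x", "y", "z", "x", "h", "k"),
  stringsAsFactors = FALSE
)

# ave() splits its first argument by the grouping variable, applies FUN to
# each piece, and writes the results back into the original row positions.
# seq_along(ProjectID) only supplies an integer vector of the right length;
# FUN = seq_along then turns each group's piece into 1, 2, ..., n.
df$counter <- ave(seq_along(df$ProjectID), df$ProjectID, FUN = seq_along)

# The same idea spelled out by hand with split() and unsplit()
pieces <- split(seq_along(df$ProjectID), df$ProjectID)
manual <- unsplit(lapply(pieces, seq_along), df$ProjectID)

print(df)  # counter column is 1, 2, 1, 2, 3, 3
```

Passing an integer vector as the first argument, rather than `ProjectID` itself, keeps the result numeric even when the grouping column is a character or factor.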

1 Answer

A dplyr solution is quite simple:

library(dplyr)

df %>% group_by(ProjectID) %>% mutate(counter = row_number(ProjectID))


#  ProjectID Dist counter
#1         1    x       1
#2         1    y       2
#3         2    z       1
#4         2    x       2
#5         2    h       3
#6         1    k       3
jalapic
  • `mutate(counter=row_number())` should do it. – akrun Feb 21 '15 at 16:21
  • This is probably a stupid question...what's `%>%` do? (And slightly tangential, is there a way to effectively search [google] for that type of code?) – sjgknight Feb 21 '15 at 16:31
  • `%>%` is a pipe (or chain) operator. It works like this: `mydata %>% do_something_with_it %>% do_something_else`. It simply lets you chain functions together. – jalapic Feb 21 '15 at 16:37
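
To make the pipe explanation concrete, here is a small sketch that writes the same computation twice, once with `%>%` and once as ordinary nested calls. It assumes `dplyr` is installed, uses the `row_number()` form suggested in the comment above, and rebuilds the question's example data. The `piped` and `nested` names are illustrative only.

```r
library(dplyr)

# Sample data mirroring the question (values assumed for illustration)
df <- data.frame(
  ProjectID = c(1, 1, 2, 2, 2, 1),
  Dist      = c("x", "y", "z", "x", "h", "k")
)

# Piped form: read left to right, each step receives the previous result
piped <- df %>%
  group_by(ProjectID) %>%
  mutate(counter = row_number())

# Nested form: x %>% f(y) is evaluated as f(x, y), so this is the same call
nested <- mutate(group_by(df, ProjectID), counter = row_number())

print(piped)  # counter column is 1, 2, 1, 2, 3, 3
```

Both versions should give the same counter. The piped form is usually easier to read once more than two steps are chained.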