Adding an ID or index column for plyr's subsets

Question

Assume I have the data

item   cond      foo
   1      1 3.733333
   2      1 4.766667
   3      1 4.133333
   4      2 4.466667
   5      2 2.800000
   6      2 2.300000

I need to generate a new column that uniquely identifies an item per cond value, so I'd like to get:

item   cond      foo  item_per_cond
   1      1 3.733333              1
   2      1 4.766667              2
   3      1 4.133333              3
   4      2 4.466667              1
   5      2 2.800000              2
   6      2 2.300000              3

I figured I'd go with something like this, but I have no idea what the ... should be here?

ddply(d, .(cond), transform, ...)

@Henrik Indeed, the answer is there, thanks. Although I'm specific about asking for `plyr` here. — slhck, Sep 16 '14 at 09:11
@slhck, `plyr` is obsolete, either use `ave`, `dplyr` or `data.table` — David Arenburg, Sep 16 '14 at 09:34
@DavidArenburg Can you point me to a reference that says `plyr` is obsolete? (Downvote because..?) — slhck, Sep 16 '14 at 09:46
@slhck, [here](http://stackoverflow.com/questions/11533438/why-is-plyr-so-slow) for starters — David Arenburg, Sep 16 '14 at 09:48
@DavidArenburg technically `plyr` isn't obsolete, but Hadley Wickham created `dplyr` as a [next iteration](https://github.com/hadley/dplyr) of `plyr`; although it is still possible to use `plyr`, `dplyr` is much faster and more logical with the piping possibility. See also [my answer](http://stackoverflow.com/a/25865032/2204410) for a `dplyr` solution. — Jaap, Sep 16 '14 at 10:01
@Jaap, are you serious? Are you just explaining `dplyr` to me? I'm well aware of `dplyr`, and this is the exact reason why I mentioned it in my comment as a better alternative to `plyr`. There is nothing special in your answer and I provided plenty of these myself — David Arenburg, Sep 16 '14 at 10:15
@DavidArenburg I'm sorry if you feel offended by my comment. That wasn't the purpose. I just wanted to give some more details. I'm also not claiming that my answer is special (it isn't, I've also given several of these myself). — Jaap, Sep 16 '14 at 10:41

score 2 · Accepted Answer · answered Sep 16 '14 at 09:04

The solution is to use seq_along with the column name:

ddply(d, .(cond), transform, item_per_cond = seq_along(item))
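For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the question's data and runs the call. The `data.frame` construction is an assumption based on the printed table (the original post does not show how `d` was created), and `res` is just an illustrative name.

```r
library(plyr)

# Rebuild the example data from the question (assumed construction)
d <- data.frame(
  item = 1:6,
  cond = rep(c(1, 2), each = 3),
  foo  = c(3.733333, 4.766667, 4.133333, 4.466667, 2.800000, 2.300000)
)

# ddply splits d by cond and applies transform() to each piece,
# so seq_along(item) restarts at 1 within every cond group
res <- ddply(d, .(cond), transform, item_per_cond = seq_along(item))
print(res)
```

Because `seq_along` only counts positions, any column of the subset would work as its argument; `item` is used here simply because it is always present.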

slhck


interesting. May be useful. thanks! – Paulo E. Cardoso Sep 16 '14 at 09:07
It's nice if you want to create grouped bar plots where you facet-wrap per condition and show `foo` as y value. Then you can use the new column as x-axis instead of the original one. – slhck Sep 16 '14 at 09:10

Jaap · Answer 2 · 2014-09-16T09:44:42.710

As you are specifically interested in a plyr solution, you might want to consider Hadley Wickham's new dplyr package as well:

library(dplyr)
d <- d %>% group_by(cond) %>% mutate(item_per_cond = seq_along(item))

which gives the following result:

  item cond      foo item_per_cond
1    1    1 3.733333             1
2    2    1 4.766667             2
3    3    1 4.133333             3
4    4    2 4.466667             1
5    5    2 2.800000             2
6    6    2 2.300000             3

score 1 · Answer 3 · answered Sep 16 '14 at 09:21

Here is a dplyr approach.

library(dplyr)

item <- c(1,1,2,3,5,1,2,2,2,5)
cond <- rep(c(1,2), each = 5)
value <- runif(10, 10, 20)  # random values, so the value column differs between runs

foo <- data.frame(item, cond, value, stringsAsFactors = FALSE)

foo %>%
    group_by(cond) %>%
    mutate(index = dense_rank(item))

   item cond    value index
1     1    1 11.66528     1
2     1    1 18.22134     1
3     2    1 18.17833     2
4     3    1 16.58589     3
5     5    1 14.75184     4
6     1    2 11.65522     1
7     2    2 12.74313     2
8     2    2 17.17077     2
9     2    2 11.37193     2
10    5    2 12.43162     3
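Note that in the output above, repeated `item` values within a group share the same index, so `dense_rank` does not number rows uniquely the way `seq_along` does. The sketch below puts the two side by side on the same `item` and `cond` values; the column names `by_position` and `by_value` are made up for illustration, and the random `value` column is left out so the result is reproducible.

```r
library(dplyr)

# Same item/cond values as in the answer above, without the random column
foo <- data.frame(
  item = c(1, 1, 2, 3, 5, 1, 2, 2, 2, 5),
  cond = rep(c(1, 2), each = 5)
)

res <- foo %>%
  group_by(cond) %>%
  mutate(
    by_position = seq_along(item),   # 1..n within each group, one number per row
    by_value    = dense_rank(item)   # equal items share a rank, ranks have no gaps
  ) %>%
  ungroup()

print(res)
```

Which one fits depends on the goal: `seq_along` if every row needs its own number within `cond`, `dense_rank` if rows with the same `item` should receive the same index.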

+1 dense_rank is new for me. nice. – Paulo E. Cardoso Sep 16 '14 at 09:28
