Assume I have the following data:

item   cond      foo
   1      1 3.733333
   2      1 4.766667
   3      1 4.133333
   4      2 4.466667
   5      2 2.800000
   6      2 2.300000

I need to generate a new column that uniquely identifies an item per cond value, so I'd like to get:

item   cond      foo  item_per_cond
   1      1 3.733333              1
   2      1 4.766667              2
   3      1 4.133333              3
   4      2 4.466667              1
   5      2 2.800000              2
   6      2 2.300000              3

I figured I'd go with something like this, but I have no idea what the ... should be:

ddply(d, .(cond), transform, ...)
slhck
  • @Henrik Indeed, the answer is there, thanks. Although I'm specific about asking for `plyr` here. – slhck Sep 16 '14 at 09:11
  • @slhck, `plyr` is obsolete, either use `ave`, `dplyr` or `data.table` – David Arenburg Sep 16 '14 at 09:34
  • @DavidArenburg Can you point me to a reference that says `plyr` is obsolete? (Downvote because..?) – slhck Sep 16 '14 at 09:46
  • @slhck, [here](http://stackoverflow.com/questions/11533438/why-is-plyr-so-slow) for starters – David Arenburg Sep 16 '14 at 09:48
  • @DavidArenburg technically `plyr` isn't obsolete, but Hadley Wickham created `dplyr` as a [next iteration](https://github.com/hadley/dplyr) of `plyr`; although it is still possible to use `plyr`, `dplyr` is much faster and more logical with the piping possibility. See also [my answer](http://stackoverflow.com/a/25865032/2204410) for a `dplyr` solution. – Jaap Sep 16 '14 at 10:01
  • @Jaap, are you serious? Are just explaining me about `dplyr`? I'm well aware of `dplyr` and this is the exact reason why I mentioned it in my comment as a better alternative to `plyr`. There is nothing special in your answer and I provided plenty of these myself – David Arenburg Sep 16 '14 at 10:15
  • @DavidArenburg I'm sorry if you feel offended by my comment. That wasn't the purpose. I just wanted to give some more details. I'm also not claiming that my answer is special (it isn't, I've also given several of these myself). – Jaap Sep 16 '14 at 10:41
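
For reference, the base R `ave` route mentioned in the comments might look like the sketch below. It assumes the data frame is called `d` (as in the question) and rebuilds it from the values posted above; no extra packages are needed.

```r
# Rebuild the example data from the question (values copied from the post).
d <- data.frame(
  item = 1:6,
  cond = c(1, 1, 1, 2, 2, 2),
  foo  = c(3.733333, 4.766667, 4.133333, 4.466667, 2.800000, 2.300000)
)

# ave() applies FUN within each level of the grouping variable and
# returns a vector aligned with the original rows.
d$item_per_cond <- ave(d$item, d$cond, FUN = seq_along)

print(d)
```

Because `ave` keeps the original row order, the new column can be assigned straight back onto `d`.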

3 Answers

The solution is to use seq_along with the column name:

ddply(d, .(cond), transform, item_per_cond = seq_along(item))
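
A self-contained version of the call above, as a minimal sketch: it assumes `plyr` is installed and that `d` holds the data from the question (rebuilt here from the posted values). The name `res` is only for illustration.

```r
library(plyr)  # assumption: plyr is installed

# Example data copied from the question.
d <- data.frame(
  item = 1:6,
  cond = c(1, 1, 1, 2, 2, 2),
  foo  = c(3.733333, 4.766667, 4.133333, 4.466667, 2.800000, 2.300000)
)

# Split d by cond, add a running counter within each piece,
# and bind the pieces back together.
res <- ddply(d, .(cond), transform, item_per_cond = seq_along(item))

print(res)
```

`seq_along(item)` only uses the length of `item` within each group, so any column of `d` would work in its place.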
slhck
  • interesting. May be useful. thanks! – Paulo E. Cardoso Sep 16 '14 at 09:07
  • It's nice if you want to create grouped bar plots where you facet-wrap per condition and show `foo` as y value. Then you can use the new column as x-axis instead of the original one. – slhck Sep 16 '14 at 09:10

Since you are specifically interested in a plyr solution, you might also want to consider dplyr, Hadley Wickham's newer package:

library(dplyr)
df <- df %>% group_by(cond) %>% mutate(item_per_cond = seq_along(item))

which gives the following result:

  item cond      foo item_per_cond
1    1    1 3.733333             1
2    2    1 4.766667             2
3    3    1 4.133333             3
4    4    2 4.466667             1
5    5    2 2.800000             2
6    6    2 2.300000             3
Jaap

Here is a dplyr approach.

item <- c(1,1,2,3,5,1,2,2,2,5)
cond <- rep(c(1,2), each = 5)
value <- runif(10, 10, 20)

foo <- data.frame(item, cond, value, stringsAsFactors = FALSE)

foo %>%
    group_by(cond) %>%
    mutate(index = dense_rank(item))

   item cond    value index
1     1    1 11.66528     1
2     1    1 18.22134     1
3     2    1 18.17833     2
4     3    1 16.58589     3
5     5    1 14.75184     4
6     1    2 11.65522     1
7     2    2 12.74313     2
8     2    2 17.17077     2
9     2    2 11.37193     2
10    5    2 12.43162     3
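
Note that `dense_rank` gives tied items the same index, as the output above shows. If a plain running counter per group is wanted instead, `row_number()` is one option. The sketch below contrasts the two on the same `item` and `cond` values; it assumes `dplyr` is installed, drops the random `value` column so the result is deterministic, and uses `res` and `row_idx` as illustrative names only.

```r
library(dplyr)  # assumption: dplyr is installed

# Same item/cond values as above, without the random value column.
foo <- data.frame(
  item = c(1, 1, 2, 3, 5, 1, 2, 2, 2, 5),
  cond = rep(c(1, 2), each = 5)
)

res <- foo %>%
  group_by(cond) %>%
  mutate(
    index   = dense_rank(item),  # ties share a rank, with no gaps
    row_idx = row_number()       # running counter within each group
  )

print(res)
```

Which of the two fits depends on whether repeated `item` values within a `cond` should share an identifier.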
jazzurro