Enumerating replicates in a data frame

Question

I have this example data.frame:

df <- data.frame(id = rep(letters[1:10],2), sub.id = c(rep("P",10), rep("M",10)), group = rep(c(rep("X", 7), rep("Y", 3)),2), class = rep(c(rep("A1", 5), rep("A2", 5)),2))

> df
   id sub.id group class
1   a      P     X    A1
2   b      P     X    A1
3   c      P     X    A1
4   d      P     X    A1
5   e      P     X    A1
6   f      P     X    A2
7   g      P     X    A2
8   h      P     Y    A2
9   i      P     Y    A2
10  j      P     Y    A2
11  a      M     X    A1
12  b      M     X    A1
13  c      M     X    A1
14  d      M     X    A1
15  e      M     X    A1
16  f      M     X    A2
17  g      M     X    A2
18  h      M     Y    A2
19  i      M     Y    A2
20  j      M     Y    A2

Each df$id value appears twice: once with df$sub.id "P" and once with df$sub.id "M".

I would like to add a column that enumerates the replicates within each combination of df$group and df$sub.id, following the order of the df$id values. The resulting data.frame would therefore be:

> df
   id sub.id group class replicate
1   a      P     X    A1         1
2   b      P     X    A1         2
3   c      P     X    A1         3
4   d      P     X    A1         4
5   e      P     X    A1         5
6   f      P     X    A2         6
7   g      P     X    A2         7
8   h      P     Y    A2         1
9   i      P     Y    A2         2
10  j      P     Y    A2         3
11  a      M     X    A1         1
12  b      M     X    A1         2
13  c      M     X    A1         3
14  d      M     X    A1         4
15  e      M     X    A1         5
16  f      M     X    A2         6
17  g      M     X    A2         7
18  h      M     Y    A2         1
19  i      M     Y    A2         2
20  j      M     Y    A2         3

Duplicates: http://stackoverflow.com/questions/6150968/adding-an-repeated-index-for-factors/6151333#6151333 http://stackoverflow.com/questions/6162685/how-can-i-rank-observations-in-group-faster http://stackoverflow.com/questions/19848362/adding-a-counter-column-for-a-set-of-similar-rows-in-r — thelatemail, Sep 14 '14 at 23:28

eipi10 · Answer 1 · 2014-09-14T23:24:16.150

You can do this with the dplyr package as follows:

library(dplyr)
df = df %>%
  group_by(group, sub.id) %>%
  mutate(replicate=1:length(id))

> df
Source: local data frame [20 x 5]
Groups: group, sub.id

   id sub.id group class replicate
1   a      P     X    A1         1
2   b      P     X    A1         2
3   c      P     X    A1         3
4   d      P     X    A1         4
5   e      P     X    A1         5
6   f      P     X    A2         6
7   g      P     X    A2         7
8   h      P     Y    A2         1
9   i      P     Y    A2         2
10  j      P     Y    A2         3
11  a      M     X    A1         1
12  b      M     X    A1         2
13  c      M     X    A1         3
14  d      M     X    A1         4
15  e      M     X    A1         5
16  f      M     X    A2         6
17  g      M     X    A2         7
18  h      M     Y    A2         1
19  i      M     Y    A2         2
20  j      M     Y    A2         3

dplyr also has the built-in function n(), which you can use instead of length() as follows:

df = df %>%
  group_by(group, sub.id) %>%
  mutate(replicate=1:n())

n() returns the number of rows in the current group, that is, in each combination of the grouping variables (in this case group and sub.id).
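As a variation on the same idea, dplyr's row_number() numbers the rows of each group directly. The sketch below rebuilds the question's data frame and uses it in place of 1:n(); the object name res and the final ungroup() call are my own additions, not part of the code above. One reason to consider it is that in R 1:0 evaluates to c(1, 0) rather than an empty vector, so a 1:n() idiom is fragile if a group ever has zero rows.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  id     = rep(letters[1:10], 2),
  sub.id = c(rep("P", 10), rep("M", 10)),
  group  = rep(c(rep("X", 7), rep("Y", 3)), 2),
  class  = rep(c(rep("A1", 5), rep("A2", 5)), 2)
)

# Number the rows within each group/sub.id combination
res <- df %>%
  group_by(group, sub.id) %>%
  mutate(replicate = row_number()) %>%
  ungroup()  # drop the grouping so later operations act on the whole table
```

The original row order is kept, so res$replicate lines up with the desired output shown in the question.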

In your example data frame, id was already in alphabetical order within each group. If that isn't the case and the numbering in replicate needs to follow the alphabetical order of id, sort the data frame first with the arrange function. Note that this also changes the row order of the data frame. In the example below, the data frame is sorted first by group, then by sub.id, then by id:

df = df %>%
  arrange(group, sub.id, id) %>%
  group_by(group, sub.id) %>%
  mutate(replicate=1:n())
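To see why the sort matters, here is a small sketch that shuffles the rows of the question's data frame before numbering them. The names shuffled and res, the seed value, and the ungroup() call are assumptions made for illustration only.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  id     = rep(letters[1:10], 2),
  sub.id = c(rep("P", 10), rep("M", 10)),
  group  = rep(c(rep("X", 7), rep("Y", 3)), 2),
  class  = rep(c(rep("A1", 5), rep("A2", 5)), 2)
)

# Shuffle the rows so that id is no longer in alphabetical order
set.seed(1)
shuffled <- df[sample(nrow(df)), ]

# Sort first, then number within each group/sub.id combination
res <- shuffled %>%
  arrange(group, sub.id, id) %>%
  group_by(group, sub.id) %>%
  mutate(replicate = 1:n()) %>%
  ungroup()
```

After the sort, the rows come out as X/M, X/P, Y/M, Y/P, and within each of those blocks replicate increases with id, regardless of how the input was shuffled.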
