Add observation number by group in R

Question

This may be a silly question, but I am new to R and it would make my life much easier if I could figure out how to do this. Here is some sample data:

data <- read.table(text = "Category Y
 A 5.1
 A 3.14
 A 1.79
 A 3.21
 A 5.57
 B 3.68
 B 4.56
 B 3.32
 B 4.98
 B 5.82
 ",header = TRUE)

I want to add a column that counts the number of observations within a group. Here is what I want it to look like:

Category    Y    OBS
A          5.1    1
A          3.14   2
A          1.79   3
A          3.21   4
A          5.57   5
B          3.68   1
B          4.56   2
B          3.32   3
B          4.98   4
B          5.82   5

I have tried:

data <- data %>% group_by(Category) %>% mutate(count = c(1:length(Category)))

which just creates another column numbered from 1 to 10, and

data <- data %>% group_by(Category) %>% add_tally()

which just creates another column of all 5s.

r2evans · Answer 1 · 2021-03-05T15:45:30.213

3

Base R:

data$OBS <- ave(seq_len(nrow(data)), data$Category, FUN = seq_along)
data
#    Category    Y OBS
# 1         A 5.10   1
# 2         A 3.14   2
# 3         A 1.79   3
# 4         A 3.21   4
# 5         A 5.57   5
# 6         B 3.68   1
# 7         B 4.56   2
# 8         B 3.32   3
# 9         B 4.98   4
# 10        B 5.82   5

BTW: one can use any of the frame's columns as the first argument, including `ave(data$Category, data$Category, FUN=seq_along)`, but `ave` chooses its output class based on the input class, so using a string as the first argument will result in a return of strings:

ave(data$Category, data$Category, FUN = seq_along)
#  [1] "1" "2" "3" "4" "5" "1" "2" "3" "4" "5"

While not heinous, it needs to be an intentional choice. Since it appears that you wanted an integer in that column, I chose the simplest integer-in, integer-out approach. I could also have used `rep(1L, nrow(data))` or anything that is both integer and the same length as the number of rows in the frame, since `seq_along` (the function I chose) only looks at the length of its input, not its values.
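To make the class point concrete, here is a small sketch. It rebuilds the question's sample data with `data.frame()` (an assumption made so the snippet is self-contained and so that `Category` is a character vector regardless of R version), then compares the class of the result for an integer first argument and for a character one. The names `obs_seq`, `obs_ones`, and `obs_chr` are illustrative only.

```r
# Rebuild the sample data; stringsAsFactors = FALSE keeps Category as character
data <- data.frame(
  Category = rep(c("A", "B"), each = 5),
  Y = c(5.1, 3.14, 1.79, 3.21, 5.57, 3.68, 4.56, 3.32, 4.98, 5.82),
  stringsAsFactors = FALSE
)

# Integer in, integer out: both first arguments are integer vectors
obs_seq  <- ave(seq_len(nrow(data)), data$Category, FUN = seq_along)
obs_ones <- ave(rep(1L, nrow(data)), data$Category, FUN = seq_along)

# Character in, character out: the counts are coerced to strings
obs_chr <- ave(data$Category, data$Category, FUN = seq_along)

class(obs_seq)   # "integer"
class(obs_ones)  # "integer"
class(obs_chr)   # "character"

# If the character version was used, convert explicitly before doing arithmetic
as.integer(obs_chr)
```

If the column is only ever displayed, the character version is harmless; the difference starts to matter once the column is sorted, compared, or used in calculations.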

edited Mar 05 '21 at 15:45

answered Mar 05 '21 at 15:39

r2evans


nicely explained, upvoted – AnilGoyal Mar 05 '21 at 15:46
Would this work in cases where the categories were non-sequential? – Daniel O Mar 06 '21 at 00:35
DanielO, yes, try it! There are techniques that require the `Category` variable to be clumped together without gaps, but I typically recommend against them, preferring something that is robust to that. This, [Sathish's](https://stackoverflow.com/a/66495433/3358272), and [AnilGoyal's](https://stackoverflow.com/a/66495401/3358272) answers are all robust to disorder in Category; unfortunately, `rle` is not: it finds runs (of same-ness) in `Category`, so broken groups of a category will be numbered separately. – r2evans Mar 06 '21 at 10:16
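The difference described in that last comment can be seen with a tiny interleaved example. This is only a sketch: the vector `category` below is made-up data, not the question's, and the names `by_group` and `by_run` are illustrative.

```r
# Hypothetical data with interleaved categories (not from the question)
category <- c("A", "B", "A", "A", "B")

# ave() groups by value, so each category is numbered across the whole vector
by_group <- ave(seq_along(category), category, FUN = seq_along)
by_group
# [1] 1 1 2 3 2

# rle() sees four runs (A, B, A A, B) and restarts the count at each run
by_run <- sequence(rle(category)$lengths)
by_run
# [1] 1 1 1 2 1
```

Sorting by the grouping column first would make the two agree, at the cost of changing the row order.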

AnilGoyal · Answer 2 · 2021-03-06T00:26:45.290

1

library(dplyr) 
data %>% group_by(Category) %>% mutate(Obs = row_number()) 

# A tibble: 10 x 3
# Groups:   Category [2]
   Category     Y   Obs
   <chr>    <dbl> <int>
 1 A         5.1      1
 2 A         3.14     2
 3 A         1.79     3
 4 A         3.21     4
 5 A         5.57     5
 6 B         3.68     1
 7 B         4.56     2
 8 B         3.32     3
 9 B         4.98     4
10 B         5.82     5

OR

data$OBS <- ave(data$Category, data$Category, FUN = seq_along)

data
   Category    Y OBS
1         A 5.10   1
2         A 3.14   2
3         A 1.79   3
4         A 3.21   4
5         A 5.57   5
6         B 3.68   1
7         B 4.56   2
8         B 3.32   3
9         B 4.98   4
10        B 5.82   5

edited Mar 06 '21 at 00:26

answered Mar 05 '21 at 15:37

AnilGoyal

When I try that, I get an error Error: `n()` must only be used inside dplyr verbs. – yaynikkiprograms Mar 05 '21 at 15:39
@yaynikkiprograms, that suggests that the `mutate` you're using is not `dplyr::mutate`, or that you didn't use this code verbatim. (You cannot use `row_number()` outside of `mutate` or other dplyr verbs.) – r2evans Mar 05 '21 at 15:40
`ave(rep(1, nrow(data)), data$Category, FUN=cumsum)` – G5W Mar 05 '21 at 15:42
Your first code block is likely the most appropriate for the OP since they first demonstrated a dplyr attempt. – r2evans Mar 05 '21 at 15:51
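One way to rule out the masking problem raised in the comments is to call the dplyr functions with an explicit `dplyr::` prefix. The sketch below assumes the error comes from another attached package (plyr is a common culprit) providing its own `mutate`; that cause is an assumption, not something confirmed in the thread. The sample data is rebuilt with `data.frame()` so the snippet is self-contained, and `result` is an illustrative name.

```r
library(dplyr)

# Rebuild the question's sample data so the snippet is self-contained
data <- data.frame(
  Category = rep(c("A", "B"), each = 5),
  Y = c(5.1, 3.14, 1.79, 3.21, 5.57, 3.68, 4.56, 3.32, 4.98, 5.82),
  stringsAsFactors = FALSE
)

# Explicit namespaces make sure dplyr's verbs are used even if another
# attached package defines functions with the same names
result <- data %>%
  dplyr::group_by(Category) %>%
  dplyr::mutate(OBS = dplyr::row_number()) %>%
  dplyr::ungroup()  # drop the grouping so later steps are not grouped by accident

result$OBS
# [1] 1 2 3 4 5 1 2 3 4 5
```

Running `environment(mutate)` in the session shows which package's `mutate` is currently found first, which can help confirm or rule out masking.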

score 1 · Answer 3 · answered Mar 05 '21 at 15:39

library(data.table)
setDT(data)[, OBS := seq_len(.N), by = .(Category)]
data
   Category    Y OBS
 1:        A 5.10   1
 2:        A 3.14   2
 3:        A 1.79   3
 4:        A 3.21   4
 5:        A 5.57   5
 6:        B 3.68   1
 7:        B 4.56   2
 8:        B 3.32   3
 9:        B 4.98   4
10:        B 5.82   5

Sathish

or just `setDT(data)[, OBS := rowid(Category)]` – r2evans Mar 05 '21 at 15:41
@r2evans much better – Sathish Mar 05 '21 at 15:42
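For comparison, here is a short sketch that runs both data.table variants side by side and checks that they agree. It builds a fresh `data.table` named `dt` rather than converting `data` in place (an illustrative choice), and `OBS2` is a made-up column name. Note that `:=` adds the column by reference, so no reassignment is needed.

```r
library(data.table)

# Rebuild the question's sample data as a data.table
dt <- data.table(
  Category = rep(c("A", "B"), each = 5),
  Y = c(5.1, 3.14, 1.79, 3.21, 5.57, 3.68, 4.56, 3.32, 4.98, 5.82)
)

# Variant from the answer: count 1..N within each group
dt[, OBS := seq_len(.N), by = Category]

# Variant from the comment: rowid() numbers rows within each distinct value
dt[, OBS2 := rowid(Category)]

identical(dt$OBS, dt$OBS2)
# [1] TRUE
```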

score 0 · Answer 4 · answered Mar 05 '21 at 15:45

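Another base R approach:

# rle() finds runs of identical values, so this numbers each run;
# it matches per-group numbering only when each category's rows are contiguous
category <- c(rep('A', 5), rep('B', 5))
obs <- sequence(rle(as.character(category))$lengths)
data <- data.frame(category = category, sequence = obs)
head(data, 10)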

reusen
