
I have a data frame in R that is divided into groups, like this:

Row Group
1 A
2 B
3 A
4 D
5 C
6 B
7 C
8 C
9 A
10 B

I would like to add a unique numeric ID to each group, so that in the end I would have something like this:

Row Group ID
1 A 1
2 B 2
3 A 1
4 D 4
5 C 3
6 B 2
7 C 3
8 C 3
9 A 1
10 B 2

How could I achieve this?

Thank you very much.

Nnatee

3 Answers


We can use match() on 'Group' against the sorted unique values of 'Group' to get the position index of each value:

df1$ID <- with(df1, match(Group, sort(unique(Group))))

data

df1 <- structure(list(Row = 1:10, Group = c("A", "B", "A", "D", "C", 
"B", "C", "C", "A", "B")), class = "data.frame", row.names = c(NA, 
-10L))
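To make the mechanics of the match() approach visible, here is a minimal step-by-step sketch. It rebuilds the same data with data.frame() instead of structure(), and the intermediate variable `lv` is just an illustrative name, not part of the original answer.

```r
# Rebuild the example data (same values as df1 above)
df1 <- data.frame(
  Row   = 1:10,
  Group = c("A", "B", "A", "D", "C", "B", "C", "C", "A", "B")
)

# Step 1: the distinct group labels, in sorted order
lv <- sort(unique(df1$Group))
print(lv)  # "A" "B" "C" "D"

# Step 2: for each row, match() returns the position of its Group in lv
df1$ID <- match(df1$Group, lv)
print(df1$ID)  # 1 2 1 4 3 2 3 3 1 2
```

Because `lv` is sorted, the ID of each group is its alphabetical rank, which is exactly the ID column requested in the question.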
akrun

Update

group_indices() was deprecated in dplyr 1.0.0.

Please use cur_group_id() instead.

library(dplyr)

df1 <- df %>% 
  group_by(Group) %>% 
  mutate(ID = cur_group_id())

First answer:

You can use group_indices():

library(dplyr)

df1 <- df %>% 
  group_by(Group) %>% 
  # group_indices() with no arguments inside mutate() is deprecated since dplyr 1.0.0
  mutate(ID = group_indices())

data

df <- tribble(
  ~Row, ~Group,
  1,    "A",
  2,    "B",
  3,    "A",
  4,    "D",
  5,    "C",
  6,    "B",
  7,    "C",
  8,    "C",
  9,    "A",
  10,   "B"
)

     Row Group    ID
   <int> <chr> <int>
 1     1 A         1
 2     2 B         2
 3     3 A         1
 4     4 D         4
 5     5 C         3
 6     6 B         2
 7     7 C         3
 8     8 C         3
 9     9 A         1
10    10 B         2
TarJae

Here is a simple way.

df1$ID <- as.integer(factor(df1$Group))
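This works because factor() by default uses the sorted unique values as its levels, so the integer codes are the same positions that the match() approach computes. A small check, assuming the same Group values as in the question; the variable names are illustrative only:

```r
Group <- c("A", "B", "A", "D", "C", "B", "C", "C", "A", "B")

# By default, factor() levels are the sorted unique values: "A" "B" "C" "D"
f <- factor(Group)
print(levels(f))

# as.integer() returns each element's level index, i.e. the group ID
id_factor <- as.integer(f)
id_match  <- match(Group, sort(unique(Group)))
print(identical(id_factor, id_match))  # TRUE

# Variant (not asked for in the question): IDs in order of first appearance,
# obtained by passing the levels explicitly
id_first <- as.integer(factor(Group, levels = unique(Group)))
print(id_first)  # 1 2 1 3 4 2 4 4 1 2
```

Setting `levels` explicitly is how you would control the numbering if alphabetical order is not what you want.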

There are three solutions posted: mine, TarJae's, and akrun's. Here they are timed with increasing data sizes (the data is doubled repeatedly); akrun's is the fastest.

library(microbenchmark)
library(dplyr)
library(ggplot2)

funtest <- function(x, n){
  out <- lapply(seq_len(n), function(i){
    for(j in seq_len(i)) x <- rbind(x, x)
    cat("nrow(x):", nrow(x), "\n")
    mb <- microbenchmark(
      match = with(x, match(Group, sort(unique(Group)))),
      dplyr = x %>% group_by(Group) %>% mutate(ID = cur_group_id()),
      intfac = as.integer(factor(x$Group))
    )
    mb$n <- i
    mb
  })
  out <- do.call(rbind, out)
  aggregate(time ~ ., out, median)
}

df1 %>%
  funtest(10) %>%
  ggplot(aes(n, time, colour = expr)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 1:10, labels = 1:10) +
  scale_y_continuous(trans = "log10") +
  theme_bw()


Rui Barradas