R group_by one variable or (not and) another

Question

I have a dataset with two variables. As a simple example:

df <- data.frame(X1 = c("A","B","C","C","D","D","E"), X2 = c(1,2,2,3,4,5,1))

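I would like to group the rows so that two rows end up in the same group if they share the first component or the second, including through a chain of such shared values. The desired output is a third column with the following values: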

c(1,2,2,2,3,3,1)

If I use dplyr with group_by() and cur_group_id(), the rows are grouped by the combination of the first and second components, which gives

c(1,2,3,4,5,6,7)

Is there a simple way to obtain the desired grouping? A solution in base R, dplyr, or data.table would be fine.

Thank you

score 1 · Accepted Answer · answered Feb 12 '21 at 22:42

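Perhaps igraph could be a helpful tool for you: treat each row as an edge between its X1 value and its X2 value, and take the connected components of the resulting graph as the groups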

library(igraph)
df$grp <- membership(components(graph_from_data_frame(df, directed = FALSE)))[df$X1]

which gives

> df
  X1 X2 grp
1  A  1   1
2  B  2   2
3  C  2   2
4  C  3   2
5  D  4   3
6  D  5   3
7  E  1   1
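
If igraph is not available, the same connected-components idea can be sketched in base R. This is only a minimal illustration, not the method used above: group_by_either is a hypothetical helper name, and the approach (repeatedly giving rows that share a value in either column the smallest label among them, until nothing changes) is an assumption about one reasonable way to do it.

```r
# Hypothetical helper (not part of igraph or dplyr): put rows in the same
# group when they share a value in either column, directly or through a chain.
group_by_either <- function(a, b) {
  grp <- seq_along(a)                  # start with every row in its own group
  repeat {
    old <- grp
    grp <- ave(grp, a, FUN = min)      # rows sharing a value of `a` take the smallest label
    grp <- ave(grp, b, FUN = min)      # same for rows sharing a value of `b`
    if (all(grp == old)) break         # stop once the labels no longer change
  }
  match(grp, unique(grp))              # relabel as 1, 2, 3, ... in order of appearance
}

df <- data.frame(
  X1 = c("A", "B", "C", "C", "D", "D", "E"),
  X2 = c(1L, 2L, 2L, 3L, 4L, 5L, 1L)
)
df$grp <- group_by_either(df$X1, df$X2)
```

On the example data this should reproduce the grouping shown above. Unlike the igraph version, the two columns are never mixed into one set of vertex names, so a value that appears in both X1 and X2 would not link rows by accident.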

Data

> dput(df)
structure(list(X1 = c("A", "B", "C", "C", "D", "D", "E"), X2 = c(1L,
2L, 2L, 3L, 4L, 5L, 1L)), row.names = c(NA, -7L), class = "data.frame")
