This is not a duplicate of this question. Please read questions entirely before labeling duplicates.

I have a data frame (a tibble) like so:

library(tidyverse)

tibble(
  color = c("blue", "blue", "red", "green", "purple"),
  shape = c("triangle", "square", "circle", "hexagon", "hexagon")
)

  color  shape   
  <chr>  <chr>   
1 blue   triangle
2 blue   square  
3 red    circle  
4 green  hexagon 
5 purple hexagon 

I'd like to add a group_id column like this:

  color  shape    group_id
  <chr>  <chr>       <dbl>
1 blue   triangle        1
2 blue   square          1
3 red    circle          2
4 green  hexagon         3
5 purple hexagon         3

The difficulty is that I want rows to fall into the same group when they share a color or a shape, not only when both match. I suspect the solution might be to use list-columns, but I can't figure out how.

John J.
  • Thanks, @akrun. Your answer is very helpful. I really thought other users would have high enough reading comprehension to recognize that this is a different issue than the basic group_by issue linked to in the duplicate report. – John J. Dec 15 '20 at 16:54
  • Making sure people understand exactly what you're asking is one reason why it's helpful to see what you've tried. That could show what your approach to the question is, even if your code doesn't work. – camille Dec 16 '20 at 00:41
  • Just as an aside, you can [@ notify gold badge holders](https://meta.stackexchange.com/questions/43019/how-do-comment-replies-work) who unilaterally close questions as duplicates in the comments of the question. Unless they are following your question (unlikely), editing your question will not notify them. I agree with you that this is not a duplicate of the target, so I voted to reopen. – Ian Campbell Dec 16 '20 at 03:13

1 Answer

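We can use duplicated() in base R. A row starts a new group only when neither its color nor its shape has already appeared in an earlier row: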

df1$group_id <- cumsum(!Reduce(`|`, lapply(df1, duplicated)))

Output:

df1
# A tibble: 5 x 3
#  color  shape    group_id
#  <chr>  <chr>       <int>
#1 blue   triangle        1
#2 blue   square          1
#3 red    circle          2
#4 green  hexagon         3
#5 purple hexagon         3
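
To see why the one-liner produces these ids, it can help to evaluate it one step at a time. The sketch below rebuilds the example as a plain data.frame (called `shapes` here, a name that is not in the original) and stores each intermediate in an illustrative variable; it assumes nothing beyond base R.

```r
# Same values as the example, as a plain data.frame (the name is illustrative)
shapes <- data.frame(
  color = c("blue", "blue", "red", "green", "purple"),
  shape = c("triangle", "square", "circle", "hexagon", "hexagon")
)

# Step 1: for each column, flag values that already occurred in an earlier row
dup_by_col <- lapply(shapes, duplicated)
# dup_by_col$color: FALSE  TRUE FALSE FALSE FALSE
# dup_by_col$shape: FALSE FALSE FALSE FALSE  TRUE

# Step 2: a row is "already seen" if ANY of its columns repeats an earlier value
seen_before <- Reduce(`|`, dup_by_col)
# FALSE  TRUE FALSE FALSE  TRUE

# Step 3: rows that are not "already seen" open a new group
starts_group <- !seen_before
#  TRUE FALSE  TRUE  TRUE FALSE

# Step 4: a running count of group starts gives the id
group_id <- cumsum(starts_group)
# 1 1 2 3 3
```

Because the id is a running count, a row that repeats an earlier value is assigned to whichever group is currently open, which matches the desired output here because related rows sit next to each other.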

Or using the tidyverse (dplyr and purrr):

library(dplyr)
library(purrr)
df1 %>%
    mutate(group_id = map(.,  duplicated) %>%
                         reduce(`|`) %>%
                         `!` %>% 
                       cumsum)

data

df1 <- structure(list(color = c("blue", "blue", "red", "green", "purple"
), shape = c("triangle", "square", "circle", "hexagon", "hexagon"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
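
One caveat worth noting: the `cumsum()` approach depends on rows that share a value being adjacent. If a row matches a group that closed earlier (for example blue, red, blue), the running count assigns it to the most recent group instead. Below is a minimal base R sketch of an order-independent alternative, which repeatedly gives every row the smallest id found among rows sharing a value in any column until nothing changes. The function name `group_by_any` and the data frame `scrambled` are hypothetical, introduced only for illustration, and the sketch assumes there are no missing values.

```r
# Hypothetical helper: label rows that are linked, directly or through a
# chain of other rows, by a shared value in any column.
group_by_any <- function(df) {
  id <- seq_len(nrow(df))          # start with every row in its own group
  repeat {
    old <- id
    for (col in names(df)) {
      # rows sharing a value in this column all take the smallest id among them
      id <- ave(id, df[[col]], FUN = min)
    }
    if (identical(id, old)) break  # stop once no label changed in a full pass
  }
  match(id, unique(id))            # renumber as 1, 2, 3, ... in order of appearance
}

# Illustrative data where the two "blue" rows are not adjacent
scrambled <- data.frame(
  color = c("blue", "red", "blue"),
  shape = c("triangle", "circle", "square")
)

group_by_any(scrambled)
# 1 2 1

cumsum(!Reduce(`|`, lapply(scrambled, duplicated)))
# 1 2 2
```

On the question's data the helper returns 1 1 2 3 3, the same as the one-liner. For large inputs a graph library's connected-components routine would likely be faster; the loop above is only meant to show the idea.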
akrun
  • FYI, see [this question](https://stackoverflow.com/questions/25345244/how-to-use-logical-operator-with-magrittr-in-r) for further discussion of negation `!` with the pipe `%>%`. – qdread Dec 15 '20 at 16:56