How can we use dplyr's group_by to assign an index to each unique grouping, then return the original data.frame with the grouping indices added?

Example

df <- data.frame(
  user=c("Peter", "Peter", "Peter", "Paul", "Paul", "Mary", "Mary", "Mary"),
  purchase=c("Snickers", "Snickers", "Coke", "Pepsi", "Pepsi", "Snickers", "Pepsi", "Coke"),
  stringsAsFactors = FALSE
)

This works, but only because I manually hard-coded the answers, i.e. c(1,2,1,1,2,3):

library(dplyr)
df %>% 
  group_by(user, purchase) %>% 
  distinct() %>% 
  cbind(., c(1,2,1,1,2,3)) %>% 
  left_join(df, ., by=(c("user", "purchase")))

   user purchase ...3
1 Peter Snickers    1
2 Peter Snickers    1
3 Peter     Coke    2
4  Paul    Pepsi    1
5  Paul    Pepsi    1
6  Mary Snickers    1
7  Mary    Pepsi    2
8  Mary     Coke    3

How can we group_by, assign an index to each distinct group, then ungroup so that the indices are returned as an additional column of the original data.frame?

stevec

2 Answers

You can do:

df %>%
 group_by(user) %>%
 mutate(indices = cumsum(!duplicated(purchase)))

  user  purchase indices
  <chr> <chr>      <int>
1 Peter Snickers       1
2 Peter Snickers       1
3 Peter Coke           2
4 Paul  Pepsi          1
5 Paul  Pepsi          1
6 Mary  Snickers       1
7 Mary  Pepsi          2
8 Mary  Coke           3
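
One caveat worth noting: cumsum(!duplicated(purchase)) only increments at the first occurrence of each value, so it matches a per-group index only when repeated purchases sit next to each other, as they do in the example data. A minimal sketch of an order-independent variant, assuming the same df and using base R's match() against unique() inside mutate(), with ungroup() added so the result is no longer grouped:

```r
library(dplyr)

df <- data.frame(
  user = c("Peter", "Peter", "Peter", "Paul", "Paul", "Mary", "Mary", "Mary"),
  purchase = c("Snickers", "Snickers", "Coke", "Pepsi", "Pepsi", "Snickers", "Pepsi", "Coke"),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(user) %>%
  # Position of each purchase among the user's distinct purchases,
  # in order of first appearance
  mutate(indices = match(purchase, unique(purchase))) %>%
  ungroup()

# Non-contiguous repeats show where the two approaches differ
x <- c("A", "B", "A")
cumsum(!duplicated(x))  # 1 2 2
match(x, unique(x))     # 1 2 1
```

On the example data both versions give the same indices column.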
tmfmnk

It can also be done this way: number the distinct rows within each user, then join the result back onto df.

df %>% 
  distinct() %>% 
  group_by(user) %>% 
  mutate(index = row_number()) %>% 
  right_join(df, by = c("user", "purchase"))

  user  purchase index
  <chr> <chr>    <int>
1 Peter Snickers     1
2 Peter Snickers     1
3 Peter Coke         2
4 Paul  Pepsi        1
5 Paul  Pepsi        1
6 Mary  Snickers     1
7 Mary  Pepsi        2
8 Mary  Coke         3
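
If the original row order must be kept, one option is to build the lookup table first and left_join() it onto the data, since left_join() keeps the rows of its first argument in order. A sketch under that assumption; df2 and lookup are hypothetical names, and df2 is made-up data with interleaved users:

```r
library(dplyr)

# Hypothetical data in which each user's rows are not contiguous
df2 <- data.frame(
  user = c("Peter", "Paul", "Peter", "Paul"),
  purchase = c("Snickers", "Pepsi", "Coke", "Pepsi"),
  stringsAsFactors = FALSE
)

# One row per distinct (user, purchase) pair, numbered within each user
lookup <- df2 %>%
  distinct() %>%
  group_by(user) %>%
  mutate(index = row_number()) %>%
  ungroup()

# Attach the index to every original row, keeping df2's row order
out <- left_join(df2, lookup, by = c("user", "purchase"))
```

Naming the join columns in by also avoids the message dplyr prints when it has to guess them.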
Yuriy Saraykin