Below is a dataframe (toy example) that I would like to transform so that group becomes 1, 1, 2, 2, 3, 3, 3.

  group           y
  C     -1.55461160
  C      0.34945015
  A      0.57210825
  A     -0.88019528
  H      0.03307085
  H      1.13494754
  H     -1.65146164
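
To make this reproducible, the toy data can be rebuilt along the following lines. The name df matches the code below; storing group as a character column rather than a factor is an assumption on my part.

```r
# Toy data from the table above.
# Assumption: group is stored as a character column, not a factor.
df <- data.frame(
  group = c("C", "C", "A", "A", "H", "H", "H"),
  y = c(-1.55461160, 0.34945015, 0.57210825, -0.88019528,
        0.03307085, 1.13494754, -1.65146164),
  stringsAsFactors = FALSE
)
```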

My current solution is to count the number of groups and the number of records per group, and to recreate the group variable using these two pieces, i.e.

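library(dplyr)

# number of distinct groups
ngroups   <- length(unique(df$group))
# records per group; note that aggregate() returns these in sorted group
# order (A, C, H), not in order of appearance
npergroup <- aggregate(x = rep(1, nrow(df)), by = list(df$group), FUN = sum)$x

df <- df %>%
  mutate(group = rep(1:ngroups, npergroup))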

For the sake of elegance, do you have a fully dplyr solution?

Marco
  • Check out `forcats::fct_recode()`; even though it is not `dplyr`, it is still part of the `tidyverse`. – MrNetherlands May 24 '19 at 07:01
  • Would this work? `df %>% mutate(group = as.integer(factor(group, levels = unique(group))))` Do you want to give a unique id to each `group`, or do you want to replace C with 1, A with 2 and H with 3? – Ronak Shah May 24 '19 at 07:02
  • @RonakShah: I want a unique id; it does not matter whether 1 corresponds to A or C. Thanks for your suggestion. – Marco May 24 '19 at 07:10

2 Answers

One possibility could be:

df %>%
 mutate(group2 = cumsum(!duplicated(group))) 

  group           y group2
1     C -1.55461160      1
2     C  0.34945015      1
3     A  0.57210825      2
4     A -0.88019528      2
5     H  0.03307085      3
6     H  1.13494754      3
7     H -1.65146164      3

Or you can use an rleid()-like construct (as.character() is there because rle() does not accept a factor):

df %>%
 mutate(group2 = with(rle(as.character(group)), rep(seq_along(lengths), lengths)))
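
The two approaches above agree on the toy data, where every group sits in one contiguous block, but they can diverge once a value reappears further down. A minimal base R sketch on a made-up vector x (not from the question) shows the difference. match(x, unique(x)) is included as a third option; it seems the safer choice when the id should follow order of first appearance regardless of row order.

```r
# Hypothetical vector in which "C" reappears after "A"
x <- c("C", "C", "A", "C")

# Running count of first occurrences; a repeated value takes the current count
id_cumsum <- cumsum(!duplicated(x))

# Run-length ids: each new run gets a new id, even if the value was seen before
r <- rle(x)
id_rle <- rep(seq_along(r$lengths), r$lengths)

# Position of each value among the unique values, in order of first appearance
id_match <- match(x, unique(x))

print(id_cumsum)  # 1 1 2 2
print(id_rle)     # 1 1 2 3
print(id_match)   # 1 1 2 1
```

So the cumsum(!duplicated()) version is only a reliable group id when the rows are already arranged in contiguous blocks, as they are in the question.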

If you just want to assign a unique ID to each value of "group" (the IDs then follow the sorted order of the values, not their order of appearance):

df %>%
 mutate(group2 = group_indices(., group))

  group           y group2
1     C -1.55461160      2
2     C  0.34945015      2
3     A  0.57210825      1
4     A -0.88019528      1
5     H  0.03307085      3
6     H  1.13494754      3
7     H -1.65146164      3
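
In more recent dplyr releases (1.0.0 and later, as far as I know) group_indices() is deprecated in favour of cur_group_id(). A sketch of what the equivalent might look like, rebuilding the toy data so that it runs on its own (the character group column and the name res are assumptions):

```r
library(dplyr)

# Toy data from the question; group assumed to be a character column
df <- data.frame(
  group = c("C", "C", "A", "A", "H", "H", "H"),
  y = c(-1.55461160, 0.34945015, 0.57210825, -0.88019528,
        0.03307085, 1.13494754, -1.65146164),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(group) %>%
  mutate(group2 = cur_group_id()) %>%  # id of the current group, in sorted group order
  ungroup()

print(res$group2)  # 2 2 1 1 3 3 3
```

As with group_indices(), the ids come from the sorted group keys (A = 1, C = 2, H = 3).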
tmfmnk

This is not fully dplyr but quite nice.

library(data.table)
library(dplyr)

df %>%
  mutate(group = rleid(group))

  group           y
1     1 -1.55461160
2     1  0.34945015
3     2  0.57210825
4     2 -0.88019528
5     3  0.03307085
6     3  1.13494754
7     3 -1.65146164
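
Two possible variations, both under stated assumptions. Writing `mutate(group = data.table::rleid(group))` avoids attaching data.table at all; the two packages share a few function names such as between(), first() and last(), so whichever is attached last masks the other. And if dplyr 1.1.0 or later is available, consecutive_id() appears to do the same job as rleid() without leaving dplyr. A sketch, with the toy data rebuilt so it runs on its own (the character group column and the name res are assumptions):

```r
library(dplyr)

# Toy data from the question; group assumed to be a character column
df <- data.frame(
  group = c("C", "C", "A", "A", "H", "H", "H"),
  y = c(-1.55461160, 0.34945015, 0.57210825, -0.88019528,
        0.03307085, 1.13494754, -1.65146164),
  stringsAsFactors = FALSE
)

# Assumes dplyr >= 1.1.0, where consecutive_id() is available
res <- df %>%
  mutate(group = consecutive_id(group))

print(res$group)  # 1 1 2 2 3 3 3
```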
Humpelstielzchen