R: renaming levels of a factor using dplyr

Question

Below is a data frame (toy example) that I would like to transform so that `group` becomes 1, 1, 2, 2, 3, 3, 3.

  group           y
  C     -1.55461160
  C      0.34945015
  A      0.57210825
  A     -0.88019528
  H      0.03307085
  H      1.13494754
  H     -1.65146164
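To make the toy example reproducible, here is a minimal sketch that rebuilds the data frame. Storing `group` as a factor is an assumption based on the title ("levels of a factor"); it matters for some of the base R approaches in the answers.

```r
# Rebuild the toy data; `group` as a factor is an assumption from the title
df <- data.frame(
  group = factor(c("C", "C", "A", "A", "H", "H", "H")),
  y     = c(-1.55461160, 0.34945015, 0.57210825, -0.88019528,
            0.03307085, 1.13494754, -1.65146164)
)

str(df)
```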

My current solution is to count the number of groups and the number of records per group, and to recreate the group variable using these two pieces, i.e.

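library(dplyr)

ngroups   <- length(unique(df$group))
# aggregate() returns the counts in sorted group order (A, C, H), not in order
# of appearance; this works here only because each group's rows are contiguous
# and A and C happen to have the same number of rows
npergroup <- aggregate(x = rep(1, nrow(df)), by = list(df$group), FUN = sum)$x

df <- df %>%
  mutate(group = rep(1:ngroups, npergroup))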

For the sake of elegance, do you have a fully dplyr solution?

Check out `forcats::fct_recode()`; even though it is not `dplyr`, it is still part of the `tidyverse`. – MrNetherlands, May 24 '19 at 07:01
Would this work? `df %>% mutate(group = as.integer(factor(group, levels = unique(group))))` Do you want to give a unique ID to each `group`, or do you want to replace C with 1, A with 2 and H with 3? – Ronak Shah, May 24 '19 at 07:02
@RonakShah: I want a unique ID; it does not matter whether 1 corresponds to A or C. Thanks for your suggestion. – Marco, May 24 '19 at 07:10
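A minimal sketch of Ronak Shah's suggestion on the toy data (again assuming `group` is a factor). Setting the levels to `unique(group)` numbers the groups in order of first appearance; base R's `match()` against the unique values gives the same IDs.

```r
library(dplyr)

# Toy data; storing `group` as a factor is an assumption
df <- data.frame(group = factor(c("C", "C", "A", "A", "H", "H", "H")))

# Levels in order of first appearance, so C -> 1, A -> 2, H -> 3
df <- df %>%
  mutate(group2 = as.integer(factor(group, levels = unique(group))))

# Base R equivalent: index of each value among the unique values
df$group3 <- match(df$group, unique(df$group))

print(df)
```

Because both versions look each value up rather than counting rows, they also give a consistent ID when a group's rows are not contiguous.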

tmfmnk · Accepted Answer · score 4 · 2019-05-24T07:14:10.160

One possibility could be:

df %>%
 mutate(group2 = cumsum(!duplicated(group))) 

  group           y group2
1     C -1.55461160      1
2     C  0.34945015      1
3     A  0.57210825      2
4     A -0.88019528      2
5     H  0.03307085      3
6     H  1.13494754      3
7     H -1.65146164      3
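One caveat worth checking: `cumsum(!duplicated(group))` counts the distinct values seen so far, so it is a valid ID only when each group's rows are contiguous, as in the toy data. A small sketch with hypothetical vectors:

```r
# Hypothetical vectors: contiguous runs vs. a value that reappears later
contiguous <- c("C", "C", "A", "A", "H", "H", "H")
scattered  <- c("C", "A", "C", "H")

# cumsum(!duplicated(x)) counts the distinct values seen so far
print(cumsum(!duplicated(contiguous)))  # 1 1 2 2 3 3 3
print(cumsum(!duplicated(scattered)))   # 1 2 2 3: the second "C" gets 2

# Looking each value up among the unique values keeps "C" at 1 throughout
print(as.integer(factor(scattered, levels = unique(scattered))))  # 1 2 1 3
```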

Or you can build an rleid()-like run ID with base R's rle():

df %>%
 # as.character(): rle() does not accept factors
 mutate(group2 = with(rle(as.character(group)), rep(seq_along(lengths), lengths)))
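Why the `rle()` version needs character input when `group` is a factor: `rle()` checks that its argument is an atomic vector, and a factor fails that check. A minimal sketch showing the error being caught and the working character version:

```r
f <- factor(c("C", "C", "A", "A", "H", "H", "H"))

# rle() stops with an error on factor input; capture the message instead
res <- tryCatch(rle(f), error = function(e) conditionMessage(e))
print(res)

# Converting to character first works; each run gets the next integer
runs <- rle(as.character(f))
ids  <- with(runs, rep(seq_along(lengths), lengths))
print(ids)  # 1 1 2 2 3 3 3
```

Note that this numbers runs, not values: if a group reappeared later in the data, its second run would get a new ID.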

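If you just want to assign a unique ID to each value of "group" (numbered in sorted order of the values, not in order of appearance):

# note: passing grouping variables to group_indices() was deprecated in dplyr 1.0.0
df %>%
 mutate(group2 = group_indices(., group))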

  group           y group2
1     C -1.55461160      2
2     C  0.34945015      2
3     A  0.57210825      1
4     A -0.88019528      1
5     H  0.03307085      3
6     H  1.13494754      3
7     H -1.65146164      3
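For newer dplyr versions, a sketch of an equivalent using `group_by()` and `cur_group_id()` (assuming dplyr >= 1.0.0, where `cur_group_id()` is available). Groups are still numbered in sorted order, so C gets 2, as in the `group_indices()` output above.

```r
library(dplyr)

# Toy data (only the column that matters here)
df <- data.frame(group = c("C", "C", "A", "A", "H", "H", "H"))

# One ID per group, numbered by sorted group value: A -> 1, C -> 2, H -> 3
df2 <- df %>%
  group_by(group) %>%
  mutate(group2 = cur_group_id()) %>%
  ungroup()

print(df2)
```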

edited May 24 '19 at 07:14

answered May 24 '19 at 07:01

tmfmnk

Thank you! I did not know about group_indices() :-) – Marco May 24 '19 at 07:14
It's quite a handy function :) – tmfmnk May 24 '19 at 07:14
Nice `rle` solution. Try `rep(seq(ls <- rle(group)$lengths), ls)`. – jay.sf May 24 '19 at 07:32
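One edge case of the `seq(ls <- ...)` shortcut: when there is only one run, `seq()` receives a single number `n` and returns `1:n` instead of `1`. A sketch comparing it with `seq_along()`; the helper names `run_ids_seq()` and `run_ids_seq_along()` are hypothetical, introduced only for this comparison.

```r
# Hypothetical helpers wrapping the two variants
run_ids_seq <- function(g) rep(seq(ls <- rle(g)$lengths), ls)
run_ids_seq_along <- function(g) {
  ls <- rle(g)$lengths
  rep(seq_along(ls), ls)
}

one_group <- c("C", "C")
print(run_ids_seq(one_group))        # 1 2 1 2: seq(2) is 1:2, repeated twice
print(run_ids_seq_along(one_group))  # 1 1

# With several runs both variants agree
print(run_ids_seq(c("C", "C", "A")))  # 1 1 2
```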

score 1 · Answer 2 · answered May 24 '19 at 07:00

This is not a pure dplyr solution, but it is quite nice:

library(data.table)  # provides rleid()
library(dplyr)

df %>%
  mutate(group = rleid(group))

  group           y
1     1 -1.55461160
2     1  0.34945015
3     2  0.57210825
4     2 -0.88019528
5     3  0.03307085
6     3  1.13494754
7     3 -1.65146164
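For the fully dplyr solution the question asks for: dplyr 1.1.0 (released well after this thread) added `consecutive_id()`, which plays the same role as `data.table::rleid()`. A minimal sketch, assuming dplyr >= 1.1.0 is installed:

```r
library(dplyr)  # consecutive_id() requires dplyr >= 1.1.0

# Toy data (only the column that matters here)
df <- data.frame(group = c("C", "C", "A", "A", "H", "H", "H"))

# A new ID starts whenever the value differs from the previous row
df <- df %>%
  mutate(group = consecutive_id(group))

print(df)
```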

answered May 24 '19 at 07:00

Humpelstielzchen

Thank you for your nice solution (+1). But I have to admit I like group_indices a lot! – Marco May 24 '19 at 07:21
It's awesome, I didn't know it either! – Humpelstielzchen May 24 '19 at 07:23
