
I have a data.frame with values for each ActivityID. The IDs are in ascending order but are not consecutive and do not start from 1, because of upstream filtering that I cannot control. Each ActivityID is repeated once for every value it labels.

For example:

df <- data.frame(value = runif(50), ActivityID = rep(c(10, 18, 19, 20, 34), each = 10))

I would like to renumber ActivityID to start from 1, so that the result looks like:

df <- data.frame(value = runif(50), ActivityID = rep(1:5, each = 10))

Note that the number of repetitions per ID can vary, so sometimes it might be ActivityID = rep(1:2, each = 25).

My attempt, which converts to a factor and back to numeric, works but seems clunky. Is there a neater way to do it?

library(dplyr)

data.frame(value = runif(25), ActivityID = as.factor(rep(c(10, 18, 19, 20, 34), each = 5))) %>%
  mutate(ActivityID2 = as.numeric(as.factor(ActivityID)))
        value ActivityID ActivityID2
1  0.17469577         10           1
2  0.74912473         10           1
3  0.47619071         10           1
4  0.65868345         10           1
5  0.46414206         10           1
6  0.95534408         18           2
7  0.05125897         18           2
8  0.72084512         18           2
9  0.13255307         18           2
10 0.34281137         18           2
11 0.82067045         19           3
12 0.08923745         19           3
13 0.47769767         19           3
14 0.11153303         19           3
15 0.98863208         19           3
16 0.03486372         20           4
17 0.37039246         20           4
18 0.01890895         20           4
19 0.58501266         20           4
20 0.10254404         20           4
21 0.78895076         34           5
22 0.85010741         34           5
23 0.37808120         34           5
24 0.70489555         34           5
25 0.83963100         34           5
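The factor round-trip works because factor() builds its levels from the sorted unique values, and as.numeric() on a factor returns the underlying integer codes rather than the labels. Below is a minimal base-R sketch of the same idea, assuming ActivityID is numeric; the ids vector is made-up illustrative data, and the match() line is an alternative technique, not part of the attempt above.

```r
# Illustrative IDs with unequal group sizes (made-up data)
ids <- rep(c(10, 18, 19, 20, 34), times = c(3, 5, 2, 4, 1))

# Factor round-trip: levels are the sorted unique values, so the
# integer codes run 1, 2, 3, ... in ascending order of the IDs
via_factor <- as.integer(factor(ids))

# Base-R alternative: position of each value within the sorted unique set
via_match <- match(ids, sort(unique(ids)))

stopifnot(identical(via_factor, via_match))

# Pitfall: as.numeric() on a factor gives level codes, not the original values.
# To recover the original IDs, go through as.character() first.
f <- factor(ids)
as.numeric(f)[1]                 # 1 (a level code)
as.numeric(as.character(f))[1]   # 10 (the original ID)
```

If ActivityID is stored as character rather than numeric, sort() and factor() order the values as strings, so "10" would come before "9"; converting with as.numeric() first avoids that.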
HCAI

2 Answers

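You can use cur_group_id() (dplyr 1.0.0 or later). group_by() sorts the groups by ActivityID, so each distinct ActivityID gets an integer ID from 1 upward in ascending order.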

library(dplyr)

df %>%
  group_by(ActivityID) %>%
  mutate(ActivityID2 = cur_group_id()) %>%
  ungroup()
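A self-contained sketch of this approach, using made-up data with unequal group sizes to show that the group sizes do not matter. The trailing ungroup() drops the grouping so that later verbs are not applied per group.

```r
library(dplyr)

# Illustrative data: IDs ascending but not consecutive, unequal group sizes
df <- data.frame(
  value = runif(15),
  ActivityID = rep(c(10, 18, 19, 20, 34), times = c(3, 5, 2, 4, 1))
)

df2 <- df %>%
  group_by(ActivityID) %>%
  mutate(ActivityID2 = cur_group_id()) %>%  # one integer per group, 1..n
  ungroup()                                 # remove grouping for later steps

# One row per distinct ActivityID showing its new ID
distinct(df2, ActivityID, ActivityID2)
```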

You may use dplyr::dense_rank() as well.

library(dplyr, warn.conflicts = FALSE)

data.frame(value=runif(25), ActivityID=as.factor(rep(c(10,18,19,20,34),each=5))) %>% 
  mutate(ID2 = dense_rank(ActivityID))
#>         value ActivityID ID2
#> 1  0.04445442         10   1
#> 2  0.07533451         10   1
#> 3  0.46675858         10   1
#> 4  0.35555307         10   1
#> 5  0.99833030         10   1
#> 6  0.48128773         18   2
#> 7  0.51598496         18   2
#> 8  0.10621133         18   2
#> 9  0.18349920         18   2
#> 10 0.29088374         18   2
#> 11 0.03232032         19   3
#> 12 0.56884196         19   3
#> 13 0.56391102         19   3
#> 14 0.68882695         19   3
#> 15 0.44887127         19   3
#> 16 0.53528115         20   4
#> 17 0.67460873         20   4
#> 18 0.75139184         20   4
#> 19 0.66499921         20   4
#> 20 0.98203906         20   4
#> 21 0.46494209         34   5
#> 22 0.34140739         34   5
#> 23 0.99652580         34   5
#> 24 0.31101698         34   5
#> 25 0.84440767         34   5

Created on 2021-07-07 by the reprex package (v2.0.0)
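dense_rank() needs no grouping: equal values share a rank and there are no gaps between ranks, so the distinct sorted values map to 1, 2, 3, and so on. The sketch below uses made-up data whose rows are deliberately out of ActivityID order, and compares it with match(x, unique(x)), a different technique that numbers IDs in order of first appearance instead of sorted order.

```r
library(dplyr, warn.conflicts = FALSE)

# Illustrative data with rows not sorted by ActivityID
df <- data.frame(
  value = runif(6),
  ActivityID = c(19, 19, 10, 34, 10, 19)
)

out <- df %>%
  mutate(
    by_rank  = dense_rank(ActivityID),                # numbered in sorted-value order
    by_first = match(ActivityID, unique(ActivityID))  # numbered in first-appearance order
  )

out$by_rank   # 2 2 1 3 1 2
out$by_first  # 1 1 2 3 2 1
```

When the rows are already sorted by ActivityID, as in the question, both give the same result.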

AnilGoyal