I have a data frame that looks like this (the real data has many more days and uids):

   day   uid
1    1 0zOs6
2    2 0zOs6
3    3 0zOs6
4    4 0zOs6
5    1 3jtMi
6    2 3jtMi
7    3 3jtMi
8    1 5mJSn
9    2 5mJSn
10   3 5mJSn
11   1 dD8ro
12   2 dD8ro

I want to create a new variable based on uid: basically a new id that starts at 1 and increases by 1 every time a new value appears in the column uid, like this:

   day   uid newid
1    1 0zOs6     1
2    2 0zOs6     1
3    3 0zOs6     1
4    4 0zOs6     1
5    1 3jtMi     2
6    2 3jtMi     2
7    3 3jtMi     2
8    1 5mJSn     3
9    2 5mJSn     3
10   3 5mJSn     3
11   1 dD8ro     4
12   2 dD8ro     4

How can I achieve this?

Oliver

2 Answers

In base R, we can use match. It is vectorized, so there is no need for a loop, and it numbers the uids in order of first appearance:

df1$newid <- with(df1, match(uid, unique(uid)))

Or use factor, with levels in order of first appearance, and coerce to integer:

df1$newid <- with(df1, as.integer(factor(uid, levels = unique(uid))))
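
As a quick check, the sketch below rebuilds the sample data from the question and confirms that both approaches produce the same ids. The way df1 is constructed here is an assumption, since the question only shows the printed data frame.

```r
# Rebuild the sample data from the question (this construction is assumed)
df1 <- data.frame(
  day = c(1:4, 1:3, 1:3, 1:2),
  uid = rep(c("0zOs6", "3jtMi", "5mJSn", "dD8ro"), times = c(4, 3, 3, 2)),
  stringsAsFactors = FALSE
)

# Position of each uid within the vector of distinct uids (first-appearance order)
id_match <- match(df1$uid, unique(df1$uid))

# Same idea via factor levels ordered by first appearance
id_factor <- as.integer(factor(df1$uid, levels = unique(df1$uid)))

# Both are integer vectors with the same values
identical(id_match, id_factor)
```

Passing levels = unique(uid) matters for the factor version: without it, levels are sorted, so the ids would follow sort order rather than order of appearance.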
akrun

A data.table option using .GRP

> setDT(df)[, newid := .GRP, uid][]
    day   uid newid
 1:   1 0zOs6     1
 2:   2 0zOs6     1
 3:   3 0zOs6     1
 4:   4 0zOs6     1
 5:   1 3jtMi     2
 6:   2 3jtMi     2
 7:   3 3jtMi     2
 8:   1 5mJSn     3
 9:   2 5mJSn     3
10:   3 5mJSn     3
11:   1 dD8ro     4
12:   2 dD8ro     4

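or rleid, which numbers consecutive runs of uid and so gives the same result when each uid's rows are contiguous, as they are here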

> setDT(df)[, newid := rleid(uid)][]
    day   uid newid
 1:   1 0zOs6     1
 2:   2 0zOs6     1
 3:   3 0zOs6     1
 4:   4 0zOs6     1
 5:   1 3jtMi     2
 6:   2 3jtMi     2
 7:   3 3jtMi     2
 8:   1 5mJSn     3
 9:   2 5mJSn     3
10:   3 5mJSn     3
11:   1 dD8ro     4
12:   2 dD8ro     4
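
The two data.table options can differ when a uid reappears after another one. The sketch below uses a small hypothetical table (not the question's data) to show the difference; it assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical data where uid "a" reappears after "b"
dt <- data.table(uid = c("a", "a", "b", "a"))

# .GRP numbers groups by first appearance, so both runs of "a" share one id
dt[, id_grp := .GRP, by = uid]

# rleid numbers consecutive runs, so the second run of "a" gets a new id
dt[, id_rle := rleid(uid)]

dt$id_grp  # 1 1 2 1
dt$id_rle  # 1 1 2 3
```

If the goal is one id per distinct uid regardless of row order, .GRP is probably the safer choice of the two.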
ThomasIsCoding