I have the following dataframe:

              id      timestamp
0              0  1616152948734
1              0  1616152958727
2              0  1616152968727
3              0  1616152978737
4              0  1616152997360
...            ...
39             0  1616153347348
40             0  1616153357350
41             0  1616153360638
42             1  1618523825696
43             1  1618523831257
44             2  1618566194435
45             2  1618566206091
46             2  1618566216078
47             2  1618566226080
48             3  1618566343202
49             3  1618566346287

To anonymize the timestamps, I want to replace each timestamp with a running count that restarts for every id:

              id  timestamp
0              0  1
1              0  2
2              0  3
3              0  4
4              0  5
...            ...
39             0  40
40             0  41
41             0  42
42             1  1
43             1  2
44             2  1
45             2  2
46             2  3
47             2  4
48             3  1
49             3  2

I looked for similar questions and answers. The closest ones I could find are factorize-a-column-of-strings-in-pandas and change-values-in-pandas-dataframe-according-to-value-counts, but I still don't quite know how to solve my problem.

XueXu

1 Answer

A plain .cumcount() on the grouped dataframe should work. It numbers the rows within each group starting from 0, so add 1 to the result if you want the count to start at 1.

import pandas as pd
df = pd.DataFrame({'id': [0,0,0,0,1,1,1,1,1,1,2,2,2,2,2]})
# Number the rows within each id group, starting at 0
df['timestamp'] = df.groupby(['id']).cumcount()


    id  timestamp
0   0   0
1   0   1
2   0   2
3   0   3
4   1   0
5   1   1
6   1   2
7   1   3
8   1   4
9   1   5
10  2   0
11  2   1
12  2   2
13  2   3
14  2   4
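
To get the 1-based count the question asks for, the same call with 1 added might look like the sketch below. The sample frame is an assumption: it reuses a handful of the id/timestamp pairs shown in the question rather than the full data.

```python
import pandas as pd

# Assumed sample: a subset of the rows shown in the question.
df = pd.DataFrame({
    "id": [0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3],
    "timestamp": [
        1616152948734, 1616152958727, 1616152968727,
        1618523825696, 1618523831257,
        1618566194435, 1618566206091, 1618566216078, 1618566226080,
        1618566343202, 1618566346287,
    ],
})

# cumcount() numbers rows from 0 within each id group, in the order
# the rows appear in the frame; adding 1 makes the count start at 1.
df["timestamp"] = df.groupby("id").cumcount() + 1

print(df)
```

Note that cumcount() follows row order, not timestamp value. If the rows might not already be in chronological order within each id, sorting first with df.sort_values(["id", "timestamp"]) would probably be needed before overwriting the column.
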
Shubham Periwal