
I have a df with a column that looks like this:

id   
11    
22
22
333
33
333

This column contains sensitive data. I want to replace each value with a random number, but identical IDs must always map to the same random number.

For example, I want to mask the data in the column like so:

id   
123   
987
987
456
00
456

Note the same IDs have the same value. How do I achieve this? I have thousands of IDs.

mozway
RustyShackleford
  • You will have to extract the unique IDs and create a mapping dictionary. Either that, or use a hashing function. – Tim Roberts Aug 27 '22 at 23:50
  • A hashing function might not be safe if used without a secret salt. An attacker could still identify targeted IDs by computing the hash of the target and comparing the output. – mozway Aug 28 '22 at 01:41
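
The salted-hash idea from the comments could look roughly like the sketch below. It is only an illustration: `SECRET_KEY`, `mask_id`, and the `digits` parameter are hypothetical names, and the key is generated on the fly here, whereas in practice it would have to be stored securely so the same masking can be reproduced later.

```python
import hashlib
import hmac
import secrets

import pandas as pd

# Hypothetical secret key (assumption): keep it private, never publish it with the data.
SECRET_KEY = secrets.token_bytes(32)


def mask_id(value, key=SECRET_KEY, digits=12):
    """Map an ID to a pseudonymous integer using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).digest()
    # Truncating the digest keeps the number short; collisions become
    # possible but are very unlikely for a few thousand IDs.
    return int.from_bytes(digest[:8], "big") % 10**digits


df = pd.DataFrame({"id": [11, 22, 22, 333, 33, 333]})
df["id_masked"] = df["id"].map(mask_id)
print(df)
```

Because the key is secret, someone who knows a target ID cannot recompute its masked value, which is the concern raised in the second comment. The same ID always yields the same masked value as long as the same key is used.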

2 Answers


Here are two options: generate either an enumerated categorical code (non-random, id2) or a unique random number per original ID (id3). In both cases we can use pandas.factorize (or alternatively unique, or pandas.Categorical).

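import numpy as np
import pandas as pd

# enumerated categorical: each distinct ID gets the code 0, 1, 2, ...
df['id2'] = pd.factorize(df['id'])[0]

# random categorical: one distinct random number per unique ID
# (the pool, here range(1000), must be at least as large as the number
# of unique IDs, otherwise sampling without replacement raises an error)
codes, ids = pd.factorize(df['id'])
d = dict(zip(ids, np.random.choice(range(1000), size=len(ids), replace=False)))
df['id3'] = df['id'].map(d)

# alternative 1: build the same kind of mapping from unique()
ids = df['id'].unique()
d = dict(zip(ids, np.random.choice(range(1000), size=len(ids), replace=False)))
df['id3'] = df['id'].map(d)

# alternative 2: rename the categories of a Categorical
df['id3'] = pd.Categorical(df['id'])
new_ids = np.random.choice(range(1000), size=len(df['id3'].cat.categories), replace=False)
df['id3'] = df['id3'].cat.rename_categories(new_ids)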

Output:

    id  id2  id3
0   11    0  395
1   22    1  428
2   22    1  428
3  333    2  528
4   33    3  783
5  333    2  528
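
Since the question mentions thousands of IDs, a pool of `range(1000)` would be too small for sampling without replacement. The following is a minimal sketch of one way to size the pool from the data; the factor of ten and the name `pool_size` are assumptions chosen for illustration, not part of the approach above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [11, 22, 22, 333, 33, 333]})

rng = np.random.default_rng()  # pass a seed only if reproducibility is wanted
ids = df["id"].unique()

# Assumption: a pool ten times larger than the number of unique IDs,
# and never smaller than 1000, so the draw without replacement cannot fail.
pool_size = max(1000, 10 * len(ids))
new_ids = rng.choice(pool_size, size=len(ids), replace=False)

# One distinct random number per original ID
mapping = dict(zip(ids, new_ids))
df["id3"] = df["id"].map(mapping)
print(df)
```

Keeping `mapping` around (and private) allows the masking to be reversed or reapplied to new data later; discarding it makes the masking one-way.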
mozway

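I would suggest something like this:

from random import randint

# transform() broadcasts one random draw to every row of each group,
# so rows sharing an ID get the same number.
# Note: two different IDs can draw the same number by chance.
df['id_rand'] = df.groupby('id')['id'].transform(lambda x: randint(1, 1000))

>>> df
    id  id_rand
0   11      833
1   22      577
2   22      577
3  333      101
4   33      723
5  333      101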
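
With independent `randint` draws, two different IDs can end up with the same masked value, and that becomes likely once there are thousands of IDs. A possible variation that keeps the groupby idea but draws without replacement is sketched below; `upper` is a hypothetical bound chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [11, 22, 22, 333, 33, 333]})

rng = np.random.default_rng()

# ngroup() numbers the groups 0..n_groups-1, one number per distinct ID
group_numbers = df.groupby("id").ngroup()
n_groups = int(group_numbers.max()) + 1

# Assumption: values are drawn from 1..upper, with upper large enough
# to give every group its own number.
upper = max(1000, 10 * n_groups)
draws = rng.choice(np.arange(1, upper + 1), size=n_groups, replace=False)

# Look up each row's draw by its group number
df["id_rand"] = draws[group_numbers.to_numpy()]
print(df)
```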
SergFSM