
I want to change values from one column in a dataframe to fake data.

Here is a sample of the original table:

import pandas as pd

df = {'Name': ['David', 'David', 'David', 'Kevin', 'Kevin', 'Ann', 'Joan'],
      'Age': [10, 10, 10, 12, 12, 15, 13]}
df = pd.DataFrame(df)
df

Now what I want to do is to change the Name column values to fake values like this:

df = {'Name': ['A', 'A', 'A', 'B', 'B', 'C', 'D'],
      'Age': [10, 10, 10, 12, 12, 15, 13]}
df = pd.DataFrame(df)
df

Notice how I changed the names to distinct combinations of letters. This is sample data, but the real data has many names, so I start with A, B, C, D, and once the sequence reaches Z, the next new name should be AA, then AB, and so on.

Is this viable?

halfer
Yun Tae Hwang

5 Answers


Here is my suggestion. The list fake below has more than 23,000 items; if your df has more unique values, just increase the end of the loop (currently 5) and the list will grow rapidly:

import string
from itertools import combinations_with_replacement

names = df['Name'].unique()

letters = string.ascii_uppercase

fake = []
for length in range(1, 5):  # increase 5 if you need more codes
    fake.extend(''.join(combo) for combo in combinations_with_replacement(letters, length))

d = dict(zip(names, fake))

df['code'] = df.Name.map(d)

Sample of fake:

>>> print(fake[:30])
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'AA', 'AB', 'AC', 'AD']

Output:

>>> print(df)

    Name  Age code
0  David   10    A
1  David   10    A
2  David   10    A
3  Kevin   12    B
4  Kevin   12    B
5    Ann   15    C
6   Joan   13    D
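One caveat: combinations_with_replacement only yields non-decreasing strings ('AB' appears, but 'BA' never does), so the codes are distinct but skip some labels. If you want the exact A, B, …, Z, AA, AB, … sequence from the question, itertools.product generates it directly — a minimal sketch (the excel_labels helper is illustrative, not a library function):

```python
import string
from itertools import product

def excel_labels(n):
    """Return the first n labels in Excel column order: A..Z, AA, AB, ..."""
    labels = []
    width = 1
    while len(labels) < n:
        # product yields every string of the current width, in order
        for combo in product(string.ascii_uppercase, repeat=width):
            labels.append(''.join(combo))
            if len(labels) == n:
                return labels
        width += 1
    return labels

print(excel_labels(28))  # ['A', 'B', ..., 'Z', 'AA', 'AB']
```

This can be zipped against df['Name'].unique() exactly as in the answer above.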
Ravi
IoaTzimas

Use factorize to turn each fake name into an integer, which is easy to store:

df['Fake']=df.Name.factorize()[0]
df
    Name  Age  Fake
0  David   10     0
1  David   10     0
2  David   10     0
3  Kevin   12     1
4  Kevin   12     1
5    Ann   15     2
6   Joan   13     3

If you need mixed alphanumeric strings instead (note that pd.util.testing is deprecated in recent pandas versions):

df.groupby('Name')['Name'].transform(lambda x : pd.util.testing.rands_array(8,1)[0])
0    jNAO9AdJ
1    jNAO9AdJ
2    jNAO9AdJ
3    es0p4Yjx
4    es0p4Yjx
5    x54NNbdF
6    hTMKxoXW
Name: Name, dtype: object
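If letters are preferred over integers, the factorize codes can be converted with a small bijective-base-26 helper — a sketch (to_label is an illustrative helper, not a pandas function):

```python
import pandas as pd

def to_label(i):
    """Convert a 0-based integer to an Excel-style label: 0 -> 'A', 25 -> 'Z', 26 -> 'AA'."""
    label = ''
    i += 1  # bijective base 26 is easier to compute 1-based
    while i > 0:
        i, rem = divmod(i - 1, 26)
        label = chr(ord('A') + rem) + label
    return label

df = pd.DataFrame({'Name': ['David', 'David', 'Kevin', 'Ann'],
                   'Age': [10, 10, 12, 15]})
# factorize()[0] gives the integer code per row; map each through to_label
df['Fake'] = [to_label(c) for c in df['Name'].factorize()[0]]
print(df['Fake'].tolist())  # ['A', 'A', 'B', 'C']
```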
BENY
from string import ascii_lowercase

def excel_names(num_cols):
    """Generate Excel-style column names: a..z, aa, ab, ..."""
    letters = list(ascii_lowercase)
    excel_cols = []
    for i in range(0, num_cols - 1):
        n = i // 26
        m = n // 26
        i -= n * 26
        n -= m * 26
        col = (letters[m-1] + letters[n-1] + letters[i] if m > 0
               else letters[n-1] + letters[i] if n > 0
               else letters[i])
        excel_cols.append(col)
    return excel_cols


unique_names = df['Name'].nunique() + 1
names = excel_names(unique_names)
dictionary = dict(zip(df['Name'].unique(), names))
df['new_Name'] = df['Name'].map(dictionary)
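Here is a self-contained check of this approach on the sample frame. Note the labels come out lowercase, since the helper uses ascii_lowercase, and the two- and three-letter branches must index with letters[n-1] (not letters[n1]):

```python
import pandas as pd
from string import ascii_lowercase

def excel_names(num_cols):
    """Self-contained copy of the helper, with the letters[n-1] index fix."""
    letters = list(ascii_lowercase)
    excel_cols = []
    for i in range(0, num_cols - 1):
        n = i // 26
        m = n // 26
        i -= n * 26
        n -= m * 26
        col = (letters[m-1] + letters[n-1] + letters[i] if m > 0
               else letters[n-1] + letters[i] if n > 0
               else letters[i])
        excel_cols.append(col)
    return excel_cols

df = pd.DataFrame({'Name': ['David', 'David', 'Kevin', 'Ann', 'Joan'],
                   'Age': [10, 10, 12, 15, 13]})
names = excel_names(df['Name'].nunique() + 1)
df['new_Name'] = df['Name'].map(dict(zip(df['Name'].unique(), names)))
print(df['new_Name'].tolist())  # ['a', 'a', 'b', 'c', 'd']
```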
Ravi
  • excel_names reference from https://stackoverflow.com/questions/56452581/continous-alphabetic-list-in-python-and-getting-every-value-of-it – Ravi Dec 16 '20 at 20:01

Get a new integer category for the names using cumsum, then use Python's ord and chr to turn the integers into strings starting from 'A':

df['Name'] = (~(df.Name.shift(1) == df.Name)).cumsum().add(ord('A') - 1).map(chr)
print(df)



   Name  Age
0    A   10
1    A   10
2    A   10
3    B   12
4    B   12
5    C   15
6    D   13
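One caveat: the shift/cumsum trick assigns a fresh letter every time the name changes, so a name that reappears after a different one gets a second code. If identical names may not be grouped consecutively, groupby(...).ngroup() numbers them by first appearance instead — a sketch:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['David', 'Kevin', 'David'], 'Age': [10, 12, 10]})
# ngroup with sort=False numbers each distinct name in order of first appearance
codes = df.groupby('Name', sort=False).ngroup()
df['Name'] = (codes + ord('A')).map(chr)
print(df['Name'].tolist())  # ['A', 'B', 'A'] -- the repeated David keeps its code
```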
wwnde

Let us think about it another way. If you just need a fake symbol, you can map the names to A0, A1, A2, …, An; this is easier.

import pandas as pd

df = {'Name': ['David', 'David', 'David', 'Kevin', 'Kevin', 'Ann', 'Joan'], 'Age': [10, 10, 10, 12, 12, 15, 13]}
df = pd.DataFrame(df)
mapping = pd.DataFrame({'name': df['Name'].unique()})  # avoid shadowing the built-in map
mapping['seq'] = mapping.index
mapping['symbol'] = mapping['seq'].apply(lambda x: 'A' + str(x))
df['code'] = df['Name'].apply(lambda x: mapping.loc[mapping['name'] == x, 'symbol'].values[0])
df

    Name  Age code
0  David   10   A0
1  David   10   A0
2  David   10   A0
3  Kevin   12   A1
4  Kevin   12   A1
5    Ann   15   A2
6   Joan   13   A3
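The per-row .loc lookup above scans the mapping frame once for every row; the same A0, A1, … symbols can be produced with a single dict and Series.map — a sketch:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['David', 'David', 'Kevin', 'Ann', 'Joan'],
                   'Age': [10, 10, 12, 15, 13]})
# one symbol per distinct name, in order of first appearance
symbols = {name: f'A{i}' for i, name in enumerate(df['Name'].unique())}
df['code'] = df['Name'].map(symbols)
print(df['code'].tolist())  # ['A0', 'A0', 'A1', 'A2', 'A3']
```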
Nour-Allah Hussein