My dataframe, data, looks like this:


ID        Query                                 email                    phone
1         hi                                    []                       []
2         email is johnsmith@gmail.com          [johnsmith@gmail.com]    []
3         phone no is 12345678790               []                       [12345678790]

I want to create a column called masked_query, which would look like this:

ID        Query                                 email                    phone               masked_query
1         hi                                    []                       []                  [hi]
2         email is johnsmith@gmail.com          [johnsmith@gmail.com]    []                  [email is XXXXXXXXXXXXXXXXXXX]
3         phone no is 12345678790               []                       [12345678790]       [phone no is XXXXXXXXXX]

I created the email and phone columns using regex functions. I now need a function that creates the masked_query column by masking those values in Query, but I don't know how to proceed with the masking. Any help is appreciated.

1 Answer

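Option 1

from math import ceil

def masking(string, perc=0.6):
    # Replace the first `perc` fraction of the characters with X.
    chars = ceil(len(string) * perc)
    return f'{"X" * chars}{string[chars:]}'

# df is the dataframe from the question (named data there).
df['masked_query'] = df.Query.apply(masking)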
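
To see what the percentage-based function does to one of the question's sample queries, here is a small check. It repeats the function so that it runs on its own; the variable names query and masked are only for illustration.

```python
from math import ceil

def masking(string, perc=0.6):
    # Replace the first `perc` fraction of the characters with X.
    chars = ceil(len(string) * perc)
    return f'{"X" * chars}{string[chars:]}'

query = "email is johnsmith@gmail.com"
masked = masking(query)

# Masking is purely positional: the leading words are replaced as well,
# and the tail of the email address is left readable.
print(masked)
```

Because the mask is positional, this does not target the email or phone number specifically, so the result differs from the format shown in the question.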

Option 2

import re

# Replace each email address or run of digits with the same number of X characters.
df['masked_query'] = df.Query.apply(
    lambda x: re.sub(r'\w+@\w+\.com|\d+', lambda m: 'X' * len(m.group()), x))

Not sure if this is what you are looking for.
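
If the masking should rely on the values already extracted into the email and phone columns, something along these lines might work. This is only a sketch: the helper name mask_values is made up for illustration, it assumes both columns hold Python lists of strings, and it returns a plain string rather than the bracketed form shown in the question.

```python
def mask_values(query, values, mask_char="X"):
    """Replace each string in `values` that occurs in `query` with
    mask characters of the same length (hypothetical helper)."""
    masked = query
    # Mask longer values first so a short value cannot split a longer one.
    for value in sorted(values, key=len, reverse=True):
        if value:
            masked = masked.replace(value, mask_char * len(value))
    return masked

# Sample rows mirroring the question's dataframe: (Query, email, phone).
rows = [
    ("hi", [], []),
    ("email is johnsmith@gmail.com", ["johnsmith@gmail.com"], []),
    ("phone no is 12345678790", [], ["12345678790"]),
]
masked_queries = [mask_values(q, emails + phones) for q, emails, phones in rows]

# With pandas, the same helper could be applied row by row (assumed usage):
# df['masked_query'] = df.apply(
#     lambda row: mask_values(row['Query'], row['email'] + row['phone']), axis=1)
```

Using plain str.replace on the extracted values avoids re-running the regexes, at the cost of depending on the two columns being populated correctly.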

  • Hi, thanks for your answer. I just wanted to understand: why aren't you referencing the two keyword columns, phone and email? The output I am looking for is different; can you help me get the exact output format? – Abishay Mathew Jul 03 '20 at 07:14