
I would like to pseudonymize my data frame, which has a name column and a student ID column. The name should be replaced so that it is no longer recognizable. I found the gocept.pseudonymize library, which does exactly that. However, I also need the reverse direction: when I enter the pseudonymized value, I should get my original string back.

   Student  Studendid         Student  Studendid         Student  Studendid
0  Stud1    1              0  ah274as  1              0  Stud1    1
1  Stud2    2              1  ah474as  2              1  Stud2    2
2  Stud3    3              2  ah454as  3              2  Stud3    3
3  Stud4    4              3  48sdfds  4              3  Stud4    4
4  Stud5    5         ->   4  dash241  5         ->   4  Stud5    5
5  Stud6    6              5  asda212  6              5  Stud6    6
6  Stud7    7              6  askdkj2  7              6  Stud7    7
7  Stud8    8              7  kadhh23  8              7  Stud8    8
8  Stud9    9              8  asdhb27  9              8  Stud9    9

Do you know a library that can do that? Or a method?

import gocept.pseudonymize
gocept.pseudonymize.text('Here is my little text', 'secret')
[OUT] 'u7YJWz RqdYkfNUFgZii2Y'

# What I want (getString is a made-up name for the reverse direction)
gocept.pseudonymize.getString('u7YJWz RqdYkfNUFgZii2Y')
[OUT] 'Here is my little text'
  • Does this do what you want? https://stackoverflow.com/questions/52116171/how-to-encrypt-and-decrypt-pandas-dataframe-with-decryption-key – Mandera Oct 30 '20 at 09:26
  • You could save it in a dictionary, or have a look at the answers to [this](https://stackoverflow.com/q/2490334/9568847) question. – Niklas Mertsch Oct 30 '20 at 09:27
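
The dictionary idea from the comment above could look roughly like the sketch below. It is only an illustration under a few assumptions: the helper name build_pseudonym_map and the variables forward and reverse are made up for this example, the pseudonyms are random hex tokens from Python's standard secrets module, and the reverse dictionary has to be stored somewhere safe, because it is the only way to get the original names back.

```python
import secrets

def build_pseudonym_map(names, token_bytes=4):
    """Hypothetical helper: give each distinct name a random, unique hex token.

    Returns two dicts: name -> token (forward) and token -> name (reverse).
    """
    forward = {}
    used = set()
    for name in names:
        if name in forward:
            continue  # same name always keeps the same pseudonym
        token = secrets.token_hex(token_bytes)
        while token in used:  # re-draw in the unlikely case of a collision
            token = secrets.token_hex(token_bytes)
        forward[name] = token
        used.add(token)
    reverse = {token: name for name, token in forward.items()}
    return forward, reverse

# Example data modelled on the Student column from the question.
students = ["Stud1", "Stud2", "Stud3", "Stud1"]

forward, reverse = build_pseudonym_map(students)
pseudonymized = [forward[name] for name in students]
restored = [reverse[token] for token in pseudonymized]

print(pseudonymized)  # random tokens, different on every run
print(restored)       # the original names again
```

With a pandas data frame, the same two dictionaries could presumably be applied with something like df["Student"].map(forward) and later .map(reverse). Unlike a fixed letter shift, the tokens here carry no information about the names, so the mapping cannot be undone without the stored dictionary.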

1 Answer


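Here is an example of a simple, reversible string encoding (a Caesar-style letter shift). Note that it only obscures the text; it is not secure encryption: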

sentence = 'Here is my little text'
encoded = ""
decoded = ""
shift   = 8 # change this to get different results

# Encode: rotate letters forward by `shift`. Digits and all other
# characters are kept unchanged so the string can be restored exactly.
for c in sentence:
  i = ord(c)
  if (i >= 65 and i < 91):      # 'A'..'Z'
    i = ((i - 65 + shift) % 26) + 65
  elif (i >= 97 and i < 123):   # 'a'..'z'
    i = ((i - 97 + shift) % 26) + 97
  encoded += chr(i)

print(encoded) # => Pmzm qa ug tqbbtm bmfb

# Decode: rotate letters back by the same `shift`.
for c in encoded:
  i = ord(c)
  if (i >= 65 and i < 91):      # 'A'..'Z'
    i = ((i - 65 - shift) % 26) + 65
  elif (i >= 97 and i < 123):   # 'a'..'z'
    i = ((i - 97 - shift) % 26) + 97
  decoded += chr(i)

print(decoded) # => Here is my little text
dspr