
I'm not able to decrypt a CSV file. I run the normal process to encrypt, then export to CSV. Then I reload the CSV and try to decrypt.

If I comment out the CSV import line, the script decrypts properly as expected.

The error I receive says:

InvalidToken: occurred at index Name

I tried various iterations of encoding/decoding to no avail.

import pandas as pd
from cryptography.fernet import Fernet

# import data
data = {'name': ["Joe", "Joe", "Joe","Jane","Jane"],
        'job': ["Analyst","Manager","Director","Analyst","Manager"],
        '#': [1,2,3,4,5],
        'yrs_serv': [1.1, 1.2, 1.3, 1.4, 1.5]}
df = pd.DataFrame(data, columns=['name', 'job', '#', 'yrs_serv'])

# generate key
encrypt_key = Fernet.generate_key()

f = Fernet(encrypt_key)
df_e = df.apply(lambda x: x.astype(str)) # preprocess
token = df_e.applymap(lambda x: f.encrypt(x.encode('utf-8')))

# the file goes out to a vendor, and they join some data and send it back
# (I'll delete the new data and concatenate it back into the df once the data is decrypted)
token.to_csv('encrypted_file.csv', index=False)

token = pd.read_csv('encrypted_file.csv') 

token = token.applymap(lambda x: x.encode('utf-8')) # seems the file import wasn't in utf-8

df_decrp = token.applymap(lambda x: f.decrypt(x))

2 Answers


For the example (corrected):

import pandas as pd
from cryptography.fernet import Fernet

# import data
data = {'name': ["Joe", "Joe", "Joe","Jane","Jane"],
        'job': ["Analyst","Manager","Director","Analyst","Manager"],
        '#': [1,2,3,4,5],
        'yrs_serv': [1.1, 1.2, 1.3, 1.4, 1.5]}
df = pd.DataFrame(data, columns=['name', 'job', '#', 'yrs_serv'])

# generate key
encrypt_key = Fernet.generate_key()

f = Fernet(encrypt_key)
df_e = df.apply(lambda x: x.astype(str)) # preprocess
token = df_e.applymap(lambda x: f.encrypt(x.encode('utf-8')))
token.to_csv('encrypted_file.csv', index=False)

The decoding task is:

token2 = pd.read_csv('encrypted_file.csv') 
token3 = token2.applymap(lambda x: bytes(x[2:-1],'utf-8'))
token4 = token3.applymap(lambda x: f.decrypt(x))
df_decrp = token4.applymap(lambda x: x.decode('utf-8'))
df_decrp

The result is:

    name     job        #   yrs_serv
0   Joe     Analyst     1    1.1
1   Joe     Manager     2    1.2
2   Joe     Director    3    1.3
3   Jane    Analyst     4    1.4
4   Jane    Manager     5    1.5

where every element is a string. Afterwards, you can convert the strings back to numbers.
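
For example, a minimal sketch of that conversion, using the column names from the example above:

# Restore numeric dtypes after decryption (column names taken from the
# example above; adjust to your own data)
df_restored = df_decrp.copy()
df_restored['#'] = pd.to_numeric(df_restored['#'])                # back to integers
df_restored['yrs_serv'] = pd.to_numeric(df_restored['yrs_serv'])  # back to floats
print(df_restored.dtypes)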

EXPLANATION: Let's take the element at column '#', row 0.

token['#'][0] = b'gAAAAAB......' (100 bytes)

When the bytes are written to the CSV file and read back, token2['#'][0] = "b'gAAAAAB......'" (a string of 103 characters).

if you use:

token3 = token2.applymap(lambda x: x.encode('utf-8'))

token3['#'][0] = b"b'gAAAAAB......'" (103 bytes!)

To decrypt the data you need a DataFrame equal to token, but token3 in this case is different from token, so you can't use it.

So before converting the string back to bytes you have to remove the first two characters (the b and the opening quote) and the last one (the closing quote):

token3 = token2.applymap(lambda x: bytes(x[2:-1], 'utf-8'))

token3['#'][0] = b'gAAAAAB......' (100 bytes)
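
To see the same effect outside pandas, here is a small standalone illustration (not part of the answer's pipeline, it only demonstrates why the slice works):

original = f.encrypt(b'1')                    # bytes token, e.g. b'gAAAAAB......'
as_written = str(original)                    # what lands in the CSV cell: "b'gAAAAAB......'"
recovered = bytes(as_written[2:-1], 'utf-8')  # strip the b' prefix and trailing quote, re-encode
assert recovered == original                  # the original token is restored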


Andrea Mannari gave an excellent answer and mostly solved my problem, but I wanted a better explanation of why slicing the byte string was necessary, so after some investigation this is what I found.

Each of the elements in the DataFrame is stored as a byte string, and if you print the DataFrame you can clearly see that the elements look like b'encrypted text', denoting byte strings. So the following code should give the decrypted byte string, but it doesn't; it fails with an InvalidToken error.

f.decrypt(token2['name'][0])

The problem is that when the byte string is written to the CSV file and read back, pandas holds its string representation, as if __str__() had been applied to it, so the byte string comes back wrapped in a normal string.

token2['name'][0]
"b'encrypted text'"

The solution to this problem is to force Python to evaluate the string as a byte-string literal by using the eval() function. The following code produces a byte string that we can pass to f.decrypt() and then decode back to a normal string.

eval(token2['name'][0])
b'encrypted text'
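
As a side note, if you would rather not call eval() on a file that comes back from a vendor, ast.literal_eval from the standard library parses the b'...' literal the same way but only accepts literals; a minimal sketch:

import ast

# Safer drop-in for eval(): only evaluates Python literals such as b'...'
token_bytes = ast.literal_eval(token2['name'][0])  # -> the bytes token
f.decrypt(token_bytes)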

I also added encryption for the column headings. Below is my version of Andrea's code.

import pandas as pd
from cryptography.fernet import Fernet 

data = {'name': ["Joe", "Joe", "Joe","Jane","Jane"],
    'job': ["Analyst","Manager","Director","Analyst","Manager"],
    '#': [1,2,3,4,5],
    'yrs_serv': [1.1, 1.2, 1.3, 1.4, 1.5]}

df = pd.DataFrame(data, columns=['name', 'job', '#', 'yrs_serv'])
print(f'orig df \n{df}\n')

# generate key
# encrypt_key = Fernet.generate_key()
encrypt_key = b'gB07ncUSR2oFkeUGD8_gM_CcBQvfWwslrXg3QZOAqII='

# Use key to encrypt data
f = Fernet(encrypt_key)
df_e = df.apply(lambda x: x.astype(str)) # preprocess
token = df_e.applymap(lambda x: f.encrypt(x.encode('utf-8')))

# Encrypt column headings
token.columns = [f.encrypt(bytes(x,'utf-8')) for x in df_e.columns]

# Save to CSV file
token.to_csv('encrypted_file.csv', index=False)
print(f'encrypted df \n{token}\n')

# Read and decode the CSV
token2 = pd.read_csv('encrypted_file.csv')

# Decrypt column headings first
token2.columns = [f.decrypt(eval(x).decode('utf-8')).decode('utf-8') for x in token2.columns]

# Decrypt the CSV data
df_decrp = token2.applymap(lambda x: f.decrypt(eval(x).decode('utf-8')).decode('utf-8'))
print(f'decrypted df \n{df_decrp}\n', )

Decrypting the CSV data involves the following steps:

Evaluating elements to byte strings

eval(x)

Decoding the byte string to a normal string

eval(x).decode('utf-8')

Decrypting the normal string resulting in a byte string

f.decrypt(eval(x).decode('utf-8'))

Decoding the decrypted byte string to a normal string

f.decrypt(eval(x).decode('utf-8')).decode('utf-8')
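
For a single cell, the intermediate values of that chain look roughly like this (a sketch; the token is abbreviated and the variable names are only for illustration):

x = token2['name'][0]          # "b'gAAAA...'"  - the str read from the CSV
step1 = eval(x)                # b'gAAAA...'    - the bytes literal recovered
step2 = step1.decode('utf-8')  # 'gAAAA...'     - the token as a plain str
step3 = f.decrypt(step2)       # b'Joe'         - the decrypted plaintext as bytes
step4 = step3.decode('utf-8')  # 'Joe'          - the final value as a str

Passing step1 directly to f.decrypt() also works, since Fernet accepts a bytes token.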

See this question for more on decoding and encoding byte strings: Convert bytes to a string in Python 3.