I am planning to do some data analysis in a Jupyter notebook (Python 3). For collaboration I want to store the data in a GitHub repo, but the data set is sensitive.

As such, I would like to store the data (currently .csv) as an encrypted file in the repo and then decrypt it at runtime (with a password prompt, I guess).

What is the best method to do this?

Harvs
  • Possible duplicate of [How to AES encrypt/decrypt files using Python/PyCrypto in an OpenSSL-compatible way?](https://stackoverflow.com/questions/16761458/how-to-aes-encrypt-decrypt-files-using-python-pycrypto-in-an-openssl-compatible) – Gustavo Magalhães Jul 29 '17 at 00:36

1 Answer

In the end, I used Python 3.6 and SimpleCrypt to encrypt the file, and then uploaded it.

I think this is the code I used to encrypt the file:

from simplecrypt import encrypt

# Read the plaintext CSV
with open('file.csv', 'r') as f:
    plaintext = f.read()
# The .encode('utf8') is the part I'm unsure about
ciphertext = encrypt('USERPASSWORD', plaintext.encode('utf8'))
# file.enc doesn't need to exist; Python will create it
with open('file.enc', 'wb') as e:
    e.write(ciphertext)

This is the code I use to decrypt at runtime. I pass getpass("password: ") directly as an argument so that the password is never assigned to a variable of its own:

from io import StringIO
import pandas as pd
from simplecrypt import encrypt, decrypt
from getpass import getpass

# Read the encrypted file as bytes
with open('file.enc', 'rb') as f:
    ciphertext = f.read()

print('Please enter the password and press the Enter key.\nDecryption may take some time.')

# Decrypt the data; the password is requested from the user
CSVplaintext = decrypt(getpass("password: "), ciphertext).decode('utf8')
print('Data have been decrypted')

# Wrap the text in a file-like object to pass to pandas.read_csv()
DATA = StringIO(CSVplaintext)

# Build a pandas DataFrame from the data
df = pd.read_csv(DATA)
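
As a rough illustration of the StringIO step on its own, here is a sketch that uses only the standard library's csv module instead of pandas. The sample text and the rows_from_text helper are made up for the example and are not part of the code I actually ran.

```python
import csv
from io import StringIO

# Hypothetical decrypted text standing in for CSVplaintext
csv_text = "name,score\nalice,1\nbob,2\n"


def rows_from_text(text):
    """Parse CSV text held in memory, without writing it to disk."""
    # StringIO gives the string a file-like interface that CSV readers accept
    return list(csv.DictReader(StringIO(text)))


rows = rows_from_text(csv_text)
print(rows[0]["name"], rows[0]["score"])
```

Wrapping the text in StringIO means the decrypted data can be parsed straight from memory rather than from a plaintext file on disk.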

Note: the UTF-8 encoding behaviour is different in Python 2.7, so the code will be slightly different there.
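
For what it's worth, the str/bytes round trip that the code above relies on can be checked in isolation. This is a minimal sketch assuming Python 3; the sample text is invented, and it skips the encryption step entirely.

```python
# Hypothetical sample text; the non-ASCII character shows why the encoding matters
text = "city,temp\nZ\u00fcrich,21\n"

# str -> bytes, the form passed to encrypt() above
data = text.encode("utf8")

# bytes -> str, as done on the result of decrypt() above
restored = data.decode("utf8")

# The byte string is longer than the text because the u-umlaut takes two bytes
print(type(data).__name__, len(text), len(data))
print(restored == text)
```

As long as the same encoding is used in both directions, the text that comes back is identical to the text that went in.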

Harvs