Python Flask - Uploading, processing and sending files (csv, xlsx, pdf and word) - Best practices

Question

I was wondering if someone can guide me as to what the best practices would be in the following situation: I am writing a Flask application and I want users to send me files on a bi-weekly or monthly basis, where each file can be one of csv, xlsx, pdf or word. These files contain sensitive, client-specific data, so their safety and security is the utmost priority.

I understand that this is a pretty lengthy question, so I apologize in advance. The following are some of the online resources I have used to try to answer my questions. Some of them were helpful; others I found extremely complicated.

1. https://viveksb007.github.io/2018/04/uploading-processing-downloading-files-in-flask
2. Writing a CSV from Flask framework
3. Create and download a CSV file from a Flask view
4. Upload CSV file using Python Flask and process it
5. Read file data without saving it in Flask

After going through multiple Stack Overflow responses, I have come to learn that there are essentially multiple ways of doing this:

A. One would be to save the incoming file from the user in a folder within the same directory as my application.py file, using the following syntax:

UPLOAD_FOLDER = os.path.dirname(os.path.abspath(__file__)) + '/uploads/'

Once the file is in this 'uploads' folder, I will be able to access and process it using the following syntax if it is, let's say, a csv file.

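# UPLOAD_FOLDER is the variable defined above, not a literal path component
with open(os.path.join(UPLOAD_FOLDER, 'filename.csv'), newline='') as csv_file:
    reader = csv.reader(csv_file)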

Once I am done processing the file, I will save the processed file in UPLOAD_FOLDER under a new filename and send it back to the user using the following syntax:

return send_from_directory(UPLOAD_FOLDER, 'processed_filename.csv', as_attachment=True)
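Put together, the read-process-save part of approach A might look like the minimal sketch below. It uses only the standard library so that it runs outside Flask. The function name `process_saved_csv`, the "processed_" prefix and the upper-casing step are hypothetical stand-ins for the real processing. In a Flask view, the returned name would be the second argument to send_from_directory, after UPLOAD_FOLDER.

```python
import csv
import os


def process_saved_csv(upload_folder, filename):
    """Read a CSV saved in upload_folder, write a processed copy, return its name.

    The "processing" here (upper-casing every cell) is only a placeholder.
    """
    source_path = os.path.join(upload_folder, filename)
    processed_name = "processed_" + filename  # hypothetical naming scheme
    processed_path = os.path.join(upload_folder, processed_name)

    # newline="" is what the csv module documentation recommends for file objects.
    with open(source_path, newline="") as src, \
            open(processed_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow([cell.upper() for cell in row])

    return processed_name
```

Note that both the uploaded file and the processed file stay on disk in this approach until something deletes them.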

Can someone please explain to me the following:

i) This process seems to work when I host the application locally. However, what happens when I deploy this web application? What is UPLOAD_FOLDER = os.path.dirname(os.path.abspath(__file__)) + '/uploads/' going to return? Essentially, I am trying to understand what the absolute path of the application would be. Will the user's file be uploaded somewhere on the server of the company I use to deploy my web application?
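One way to see what that expression evaluates to is to compute it for a given module path. This is a minimal sketch; the helper `upload_folder_for` and the example path `/srv/myapp/application.py` are hypothetical. The result depends only on where the module file lives on whichever machine runs the code.

```python
import os


def upload_folder_for(module_file):
    """Return the 'uploads' directory that sits next to the given module file."""
    return os.path.join(os.path.dirname(os.path.abspath(module_file)), "uploads")


# Inside application.py this would be called as upload_folder_for(__file__).
print(upload_folder_for("/srv/myapp/application.py"))
```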

ii) Considering that I am working with sensitive client information, what specifically are the main security risks involved with this particular method?

iii) If I assume that the uploaded and processed files are going to be saved on the server of the company I use to deploy my web application, does this mean that this is going to be a very expensive way of performing this task? That is, would I be taking up a lot of server space simply to save the files uploaded and subsequently processed by my application?

B. The other method I came across was essentially storing the contents of the uploaded file in a stream. I am not entirely sure what streams are in Python (even though I have spent a lot of time reading the documentation). For this method, I will create an HTML form with enctype="multipart/form-data" so that it can accept files. Once the user uploads the file, I will store the contents of the file in a stream using the following syntax:

file = request.files["file"]
stream = codecs.iterdecode(file.stream, 'utf-8')

I will eventually be able to read and process the contents of the stream using the following syntax:

# stream already yields decoded text lines, so it is passed to csv.reader directly
reader = csv.reader(stream)

I will eventually send the processed file back to the user using the following syntax, assuming that I have already created the processed file and it is called "processedfilename.csv":

return send_file("processedfilename.csv", as_attachment=True)
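Approach B can be sketched without touching the filesystem at all. The following is a minimal, standard-library-only sketch; `process_csv_stream` and the upper-casing step are hypothetical placeholders. In a Flask view, `request.files["file"].stream` would be passed in, and the returned in-memory buffer would be handed to send_file together with a download name (the keyword for that name differs between Flask versions).

```python
import codecs
import csv
import io


def process_csv_stream(binary_stream):
    """Read CSV rows from a binary stream and return the processed CSV as BytesIO.

    Nothing is written to disk; the upper-casing step is only a placeholder.
    """
    # Decode the incoming bytes line by line, as in the snippet above.
    text_lines = codecs.iterdecode(binary_stream, "utf-8")

    out_text = io.StringIO(newline="")
    writer = csv.writer(out_text)
    for row in csv.reader(text_lines):
        writer.writerow([cell.upper() for cell in row])

    # Re-encode so the result is a binary file-like object.
    return io.BytesIO(out_text.getvalue().encode("utf-8"))
```

The whole processed file is held in memory here, so this sketch assumes files small enough to fit comfortably in RAM.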

Can someone please explain:

i) I prefer this method because I don't have to save either the file that is uploaded by the user or the processed file anywhere physically. Is my understanding of this method correct (i.e. using this method, I am not going to be storing any files in either the client's filesystem or on a server)?

ii) I read in one of the Stack Overflow responses that a stream has a limit on how much it can read, and that if the file is too large this method might not work. I believe the limit indicated in that response was 16KB or something. My client will definitely be sending files larger than this.
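Whether that figure is a cap on the total size or only the size of a single read is not clear from the response. Under the assumption that it is a per-read chunk size, a minimal sketch of reading a larger stream to the end looks like this; `count_bytes` and `CHUNK_SIZE` are hypothetical names.

```python
import io

CHUNK_SIZE = 16 * 1024  # assumed per-read size, not a cap on the total


def count_bytes(binary_stream, chunk_size=CHUNK_SIZE):
    """Read a stream to the end in fixed-size chunks and return the total bytes read."""
    total = 0
    while True:
        chunk = binary_stream.read(chunk_size)
        if not chunk:  # an empty result means the stream is exhausted
            break
        total += len(chunk)
    return total


# A stream larger than one chunk is still read completely.
big = io.BytesIO(b"x" * (100 * 1024))
print(count_bytes(big))
```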
