1

I am creating a tool where either

  1. A new XLSX file is generated for the user to download
  2. The user can upload an XLSX file they have, I will read the contents of that file, aand use them to generate a new file for the user to download.

I would like to make use of Pandas to read the XLSX file into a dataframe, so I can work with it easily. However, I can't get it working. Can you help me?

Example extract from CGI file:

import pandas as pd
import cgi
from mako.template import Template
from mako.lookup import TemplateLookup
import http.cookies as Cookie
import os
import tempfile
import shutil
import sys

cookie = Cookie.SimpleCookie(os.environ.get("HTTP_COOKIE"))

method = os.environ.get("REQUEST_METHOD", "GET")

templates = TemplateLookup(directories = ['templates'], output_encoding='utf-8')

if method == "GET": # This is for getting the page
    
    template = templates.get_template("my.html")
    sys.stdout.flush()
    sys.stdout.buffer.write(b"Content-Type: text/html\n\n")
    sys.stdout.buffer.write(
        template.render())

if method == "POST":

    form = cgi.FieldStorage()
    print("Content-Type: application/vnd.ms-excel")
    print("Content-Disposition: attachment; filename=NewFile.xlsx\n")
    
    output_path = "/tmp/" + next(tempfile._get_candidate_names()) + '.xlsx'
    
    data = *some pandas dataframe previously created*

    if "editfile" in form:
        myfilename = form['myfile'].filename
        with open(myfilename, 'wb') as f:
            f.write(form['myfile'].file.read())                
        data = pd.read_excel(myfilename)

    data.to_excel(output_path)

    with open(path, "rb") as f:
        sys.stdout.flush()
        shutil.copyfileobj(f, sys.stdout.buffer)

Example extract from HTML file:

<p>Press the button below to generate a new version of the xlsx file</p> 
<form method=post>
<p><input type=submit value='Generate new version of file' name='newfile'>
<div class="wrapper">
</div>
</form>
<br>
<p>Or upload a file.</p>
<p>In this case, a new file will be created using the contents of this file.</p>
<form method="post" enctype="multipart/form-data">
    <input id="fileupload" name="myfile" type="file" />
    <input value="Upload and create new file" name='editfile' type="submit" />
</form>

This works without the if "editfile" in form: bit so I know something is going wrong when I am trying to access the file that the user has uploaded.

The problem is that whilst a file is created, the created file has a file size of 0 KB and will not open in Excel. Crucially, the file that the user has uploaded can not be found in the location that I have written it out.

lhmarsden
  • 177
  • 11
  • For your updated question, when you say *something is going wrong*, could you describe what is actually happening and paste any errors you get. Is it an internal server error? If so, best to check the logs, e.g. in if your using apache2 it would be something like this: `tail -20 /var/log/apache2/error.log`. – costaparas Dec 12 '20 at 02:35
  • Thanks @costaparas, I have added a few sentences to the end of the question to clarify. – lhmarsden Dec 14 '20 at 14:55
  • Nothing is added to the error log when I run this. However, I get a 408 message in the access log - timeout I think. – lhmarsden Dec 15 '20 at 14:29
  • I can't fully reproduce this with the current code. Is this exactly what you're running? You may need to switch to XLS format only, since XLSX may not be supported depending on your version. See details [here](https://stackoverflow.com/questions/12705527/reading-excel-files-with-xlrd/65254855#65254855). – costaparas Dec 16 '20 at 06:57
  • 1
    I have enabled the 'cgitb' module ```import cgitb```, ```cgitb.enable``` at the beginning of the script. Now, the XLSX file output has a file size of 23 KB. When I open this in a text editor I can see an error log that includes a description of the error from Python - as you would see if you were running a .py file. This makes it clear that the error is to do with something unrelated which I can fix. I am happy to accept your answer which fixed this specific problem. Thanks for your help @costaparas. – lhmarsden Dec 16 '20 at 12:28

1 Answers1

2

You've passed myfilename to pandas; however that file doesn't exist on the server yet. You'll have to save the file somewhere locally first before using it.

The following will download the file to the current directory (same directory as the CGI script). Of course, you're welcome to save it to some more suitable directory, depending on your setup.

form = cgi.FieldStorage()
myfilename = form['myfile'].filename
with open(myfilename, 'wb') as f:  # Save the file locally
    f.write(form['myfile'].file.read())
data = pd.read_excel(myfilename)
costaparas
  • 5,047
  • 11
  • 16
  • 26
  • Thanks for this explanation. For some reason this didn't work for me. I cannot see the file in the same directory as my CGI script after running this. I'll extend my question. Maybe I missed something relevant in the original post. – lhmarsden Dec 11 '20 at 12:38