I am running a Python 2.7 script as CGI on Apache 2.4 on Windows 10; the script sends a ZIP archive as a download in the HTTP response. I followed the thread "How to deploy zip files (or other binaries) trough cgi in Python?", but I keep getting a broken ZIP file. I hope someone can help, as I've been trying to resolve this for 2 days and cannot find any information on this behavior.

Demo Script:

import cgi, cgitb, os, sys
import shutil

cgitb.enable()

out_path = os.path.dirname(__file__) + "\\tmp_uploads\\test2.zip"

# send output zip as download
print "Content-Disposition: attachment; filename=\"test2.zip\""
print "Content-Type: application/zip"
print

##sys.stdout.flush()

with open(out_path,'rb') as zf:
    shutil.copyfileobj(zf, sys.stdout)
##    print zf.read()

Enabling sys.stdout.flush() or using print zf.read() instead of shutil.copyfileobj(zf, sys.stdout) makes no difference.

The original ZIP file is intact, but the downloaded archive is broken. [Screenshots of both not available.]

B-and-P

2 Answers

I have the same problem: a Python 2.7.18 script as CGI on a lighttpd web server on Windows 10. I compared the downloaded ZIP file with the original and found the cause: Python automatically converts every \n to \r\n on stdout. The only way I found to prevent this (see "Prevent Python print()'s automatic newline conversion to CRLF on Windows") is to use sys.stdout.buffer, which is not available in Python 2.7.
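To see why a single LF-to-CRLF translation is enough to break the archive, here is a minimal sketch. It is written for Python 3 because it only simulates the effect: text_mode_translate stands in for a Windows text-mode stdout by turning every LF byte into CR LF, and the helper names make_zip, text_mode_translate and zip_is_readable are hypothetical. The sample archive uses ZIP_STORED so its data is guaranteed to contain LF bytes.

```python
import io
import zipfile


def make_zip():
    """Build a small ZIP in memory whose stored data contains LF bytes."""
    buf = io.BytesIO()
    # ZIP_STORED keeps the text uncompressed, so the "\n" bytes end up in the archive.
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("hello.txt", "line one\nline two\n")
    return buf.getvalue()


def text_mode_translate(data):
    """Simulate a Windows text-mode stream: every LF byte is written as CR LF."""
    return data.replace(b"\n", b"\r\n")


def zip_is_readable(data):
    """Return True if zipfile can open the archive and verify every member."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return zf.testzip() is None
    except zipfile.BadZipFile:
        return False


original = make_zip()
translated = text_mode_translate(original)
print(zip_is_readable(original), zip_is_readable(translated))  # prints: True False
```

The translated copy is longer than the original, so the offsets recorded in the ZIP's central directory no longer point at the right places and zipfile rejects the archive.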

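Update: Dalen's answer (running Python with -u) is the key; in Python 2 on Windows, -u also puts stdout in binary mode, which stops the \n to \r\n conversion. If you don't want to set a system-specific shebang, you can switch stdout to binary mode directly: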

import msvcrt, os, sys  # msvcrt exists only on Windows
msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
oXYo
    It's not the only way; see my answer. In Python 2 on Windows the problem is that STDOUT is treated as text, not binary. Apache's mod_cgi gets the binary data through intact if the pipe is unbuffered (or Windows stops being a PITA); I have no idea why, but it works. Python 3 puts STDOUT in binary mode but overlays it with a Unicode text file-like object. To get around that, you write to sys.stdout.buffer, as you said. So in Python 3 the problem is actually fixed. – Dalen Jun 29 '21 at 00:40
  • It works, thanks. I cannot set a shebang but I found another way to turn off the buffering. See my update. – oXYo Jun 30 '21 at 15:30
  • Excellent! Not that it is important, but does this break the print statement? It shouldn't, but Windows is odd. I would use sys.stdout.write() for the headers too anyway, just in case. – Dalen Jul 01 '21 at 17:43
  • I use print for the headers and sys.stdout.write() for the content, and I haven't had any problems in my tests so far. I wrapped msvcrt.setmode and its import in a try-except, so the same code can run on Unix and Windows. I'm not sure whether I should set the mode to binary on Unix as well, but it works so far. – oXYo Jul 01 '21 at 20:22
  • You don't have to. Unix-like systems have the fcntl and ioctl syscalls to manipulate file descriptors, but any pipe is opened as binary by default. I never had any problems on Linux or Mac with IPC over pipes, nor with Apache or Nginx forking and executing CGI scripts written in any language. Anyway, file-access libraries on *nix don't even have a native text mode: in Py2 there is no difference between "r" and "rb" when using the built-in open() on a *nix machine. In Python 3, text mode is simulated on both *nix and Windows, so everything works smoothly everywhere. – Dalen Jul 02 '21 at 18:14
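oXYo describes wrapping msvcrt.setmode and its import in a try-except so the same script runs on Unix and Windows. A minimal sketch of that idea follows; the function name set_binary_stdout is hypothetical, and it assumes, as Dalen says, that nothing needs to change on Unix.

```python
import os
import sys


def set_binary_stdout():
    """Switch stdout to binary mode on Windows; do nothing elsewhere.

    Returns True if the mode was changed, False on platforms without msvcrt.
    """
    try:
        import msvcrt  # only available on Windows
    except ImportError:
        # Unix pipes have no text mode, so there is nothing to change.
        return False
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
    return True
```

Call it once at the top of the CGI script, before any headers or body bytes are written, so that both go through the same binary-mode stream.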

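First of all, if you want Windows to transfer longer content, especially binary content, over pipes (in this case STDOUT, layered under Apache's CGI pipes), you have to turn off buffering completely and take control of it yourself. In Python 2 the -u switch does this, and on Windows it also puts stdin, stdout and stderr in binary mode, which stops the \n to \r\n translation that corrupts the ZIP. So your script's shebang line should look like: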

#!C:\Python27\python.exe -u

Secondly, send the content in chunks and call flush() after each one. Otherwise, if the file is big, either the HTTP server will consider your script frozen and kill it, sending a timeout or internal server error response to the client, or the client will close the connection after waiting too long for the response. The STDOUT pipe has to stay lively.

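import sys

with open(out_path, 'rb') as zf:
    # send the file in 8 KB chunks, flushing after each one
    data = zf.read(8192)
    while data:
        sys.stdout.write(data)
        sys.stdout.flush()
        data = zf.read(8192)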

And lastly, declare the Content-Transfer-Encoding header as "binary". (HTTP itself does not use this MIME header, so treat it as harmless rather than required.) It is also advisable to provide a Content-Length header: the script's output is a continuous stream, so Apache has no idea how many bytes it will contain until the script ends, and by then all headers have already been sent.
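Putting these points together, a minimal sketch of the download part might look like the following. Assumptions: stdout has already been put in binary mode (via -u or msvcrt.setmode), send_zip and CHUNK_SIZE are hypothetical names, and the out parameter exists only so the function can be tested against an in-memory stream. It sends a Content-Length taken from os.path.getsize and writes the file in 8192-byte chunks with a flush after each.

```python
import os
import sys

CHUNK_SIZE = 8192  # bytes per write, as in the loop above


def send_zip(path, download_name, out=None):
    """Write CGI headers plus the ZIP file at `path` to a binary stream.

    Assumes stdout is already in binary mode (python -u or msvcrt.setmode).
    Returns the number of body bytes announced in Content-Length.
    """
    if out is None:
        # Python 3 exposes the raw byte stream as sys.stdout.buffer;
        # Python 2's sys.stdout accepts byte strings directly.
        out = getattr(sys.stdout, "buffer", sys.stdout)
    size = os.path.getsize(path)
    headers = (
        "Content-Type: application/zip\n"
        "Content-Disposition: attachment; filename=\"%s\"\n"
        "Content-Length: %d\n"
        "\n" % (download_name, size)
    )
    out.write(headers.encode("ascii"))
    with open(path, "rb") as zf:
        while True:
            data = zf.read(CHUNK_SIZE)
            if not data:
                break
            out.write(data)
            out.flush()  # keep the pipe lively for the web server
    return size
```

In the question's script this would replace the three print lines and the copyfileobj call, e.g. send_zip(out_path, "test2.zip").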

Dalen