0

Here is a simple program for testing CGI with an Apache server:

#!C:/Python311/python.exe

html = """<!doctype html />
<html>
<head>
</head>
<body>
    <h1>Hello CGI World</h1>
</body>
</html>"""
print( "Content-Type: text/html" ) 
print( f"Content-Length: {len(html)}" )
print( "" )                         
print( html )                       

The problem is that len(html) returns less than the actual length. In the editor (fig. 1), the selection contains 98 characters.

But the browser reports 91 characters.

And the response body is cut off at that length.

I tried printing the string character by character in the Python console and found that each line break is a single '\n', while in the editor and the browser it is '\r\n' (my guess). In any case, a single-line string has no problem.

I tried replacing '\n' with '\r\n' (.replace('\n','\r\n')), but that does not solve the problem: the browser shows extra 'CR' characters and the body is still cut off.

Thanks in advance for any ideas.

DNS

2 Answers

1

If I replace \n with \r\n I get exactly 98.

html = """<!doctype html />
<html>
<head>
</head>
<body>
    <h1>Hello CGI World</h1>
</body>
</html>"""
print("Content-Type: text/html")
html_length = len(html.replace('\n', '\r\n'))
print(f"Content-Length: {html_length}")
print("") 

Result:

Content-Type: text/html
Content-Length: 98

In Python interpreter:

Python 3.9.10 (tags/v3.9.10:f2f3f53, Jan 17 2022, 15:14:21) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> html = """<!doctype html />
... <html>
... <head>
... </head>
... <body>
...     <h1>Hello CGI World</h1>
... </body>
... </html>"""
>>> print("Content-Type: text/html")
Content-Type: text/html
>>> html_length = len(html.replace('\n', '\r\n'))
>>> print(f"Content-Length: {html_length}")
Content-Length: 98
Cow
  • Thanks, it works. But this way we make a copy of `html` just to compute its length. There should be a more efficient algorithm – DNS Dec 16 '22 at 08:38
  • This also gives the right result: `extra = html.count('\n')` `print( f"Content-Length: {len(html) + extra}" )` But I don't like this solution. Python is a great language, it should have a prettier solution :) – DNS Dec 16 '22 at 08:50
1

The Content-Length header is supposed to give the size of the message body in bytes. That is not the same as the length of the html string, because you're on Windows, and the \n characters get translated to Windows \r\n line breaks when you print them. Each line break becomes two characters.
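The translation can be reproduced on any platform with a small sketch. It assumes that an `io.TextIOWrapper` created with `newline="\r\n"` is a fair stand-in for text-mode stdout on Windows; the helper name `bytes_written` is made up for this illustration.

```python
import io

html = """<!doctype html />
<html>
<head>
</head>
<body>
    <h1>Hello CGI World</h1>
</body>
</html>"""


def bytes_written(text: str, newline: str) -> int:
    """Return how many bytes reach the underlying stream when `text` is written."""
    raw = io.BytesIO()
    # newline="\r\n" translates every "\n" on output (Windows-like behaviour);
    # newline="\n" writes line breaks unchanged.
    stream = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    stream.write(text)
    stream.flush()
    return len(raw.getvalue())


chars = len(html)                         # characters in the str
translated = bytes_written(html, "\r\n")  # bytes after "\n" -> "\r\n"
untranslated = bytes_written(html, "\n")  # bytes with translation disabled
print(chars, translated, untranslated)    # 91 98 91
```

The string has 7 line breaks, which accounts for the difference between 91 and 98.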

Additionally, any characters that get encoded to more than 1 byte in the encoding specified by sys.stdout.encoding will also cause a length mismatch (and if sys.stdout.encoding is something weird, you might not be able to print some characters, or the browser might not understand what it's looking at).
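A short illustration of the encoding effect, using a made-up string and assuming UTF-8 output:

```python
text = "<p>Grüße, 世界</p>"

char_count = len(text)                  # counts code points
byte_count = len(text.encode("utf-8"))  # counts encoded bytes

# "ü" and "ß" take 2 bytes each in UTF-8, "世" and "界" take 3 bytes each.
print(char_count, byte_count)           # 16 22
```

With a different encoding the byte count changes, and an encoding that cannot represent a character raises UnicodeEncodeError instead.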

You don't need to provide a Content-Length header in a CGI script - the web server will handle it for you. If you really want to compute Content-Length yourself, though, you can perform newline translation and encoding and check the length of the resulting bytestring:

import sys

temp = html
if sys.platform == 'win32':
    temp = temp.replace('\n', '\r\n')
temp = temp.encode(sys.stdout.encoding)
content_length = len(temp)

You can also explicitly set sys.stdout.encoding with sys.stdout.reconfigure, or change line break translation behavior:

# Sets sys.stdout.encoding to 'utf-8'
sys.stdout.reconfigure(encoding='utf-8')

# Disables \n -> \r\n translation
sys.stdout.reconfigure(newline='\n')

or write arbitrary bytes directly to sys.stdout.buffer if you want more control.
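As a rough sketch of the byte-oriented approach: encode the body once, take the length of the resulting bytes, and write headers and body to a binary stream so that no newline translation happens. The function name `write_response` and the sample body are hypothetical; in a real CGI script the stream would be `sys.stdout.buffer`, while the demo below writes to an `io.BytesIO` so it can run anywhere.

```python
import io


def write_response(out, html: str, encoding: str = "utf-8") -> int:
    """Write CGI headers and body as bytes to `out`; return the body length in bytes."""
    body = html.encode(encoding)
    headers = (
        f"Content-Type: text/html; charset={encoding}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # blank line separating headers from body
    ).encode("ascii")
    out.write(headers + body)
    out.flush()
    return len(body)


# In a CGI script (assumption): write_response(sys.stdout.buffer, html)
demo = io.BytesIO()
body_length = write_response(demo, "<p>line one\nline two</p>")
print(body_length)  # 24
```

Because the body is written exactly as encoded, the declared Content-Length matches what the server receives, and the only extra copy is the encoded bytes that have to be produced for output anyway.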

user2357112
  • Thanks! But I suppose there is an algorithm without `replace` that avoids copying the string. In my example the html is a short string, but real html is much longer. In that case it really is better to omit the Content-Length header. But in other cases the problem may arise again... – DNS Dec 16 '22 at 09:09