0

A user enters HTML code in a textarea. Example:

<td class="cell-content">
      Contact Resistance Phase A
</td>

I'm trying to save this input on my server in a file like this:

html = request.POST['htmltext']
f = open("tracker/templates/jobs/forms/new_file.html", "w")
f.write(html)
f.close()

Python appears to add blank lines when writing the file. The saved file looks like:


<td class="cell-content">

    Contact Resistance Phase A

</td>

Why is this happening and how might one rectify this?

albertrw
  • Can you do `print(repr(html))` to echo out the contents of the string so we know what you're retrieving? Also, please specify Python version, and OS; my suspicion is a bug in universal line ending translation, and that would be very sensitive to the raw data and the OS line ending rules. Also, what editor are you loading the file in that shows the extra newlines? – ShadowRanger Apr 16 '23 at 23:59

2 Answers

0

I couldn't figure out the reason for the behavior, but I found a solution that works. I open the file in binary mode and encode the HTML like so:

html = request.POST['htmltext']
# Binary mode ("wb") writes the encoded bytes as-is.
with open("tracker/templates/jobs/forms/new_file.html", "wb") as f:
    f.write(html.encode('utf-8'))

Found the answer here: Can I encode a string, save it to a file, read it back, and decode it using Python 3?

albertrw
-1

Edit

Hopefully this can clear up any confusion on why this solution works.

html = request.POST['htmltext']
# Binary mode ("wb") writes the encoded bytes as-is.
with open("tracker/templates/jobs/forms/new_file.html", "wb") as f:
    f.write(html.encode('utf-8'))

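The mode passed to open() decides how write() behaves. In text mode ("w"), Python translates each '\n' it writes into the platform's line separator ('\r\n' on Windows). In binary mode ("wb"), the data is written in its raw binary form, with no translation.

Calling encode('utf-8') converts the string into bytes, which is what a file opened in binary mode expects. Newline characters are plain ASCII and encode to the same bytes regardless; the difference is that binary mode skips newline translation. If the submitted text already contains '\r\n' line endings and is written in text mode on Windows, each '\r\n' becomes '\r\r\n', which many editors display as an extra blank line.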
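To see the newline translation in isolation, here is a minimal sketch. It assumes the textarea content arrived with '\r\n' line endings, which the question does not confirm. The helper name write_and_read_raw and the temporary file are made up for illustration; newline="\r\n" is passed explicitly to mimic the Windows text-mode default on any platform.

```python
import os
import tempfile


def write_and_read_raw(text, **open_kwargs):
    """Write text to a temp file in text mode, then return the raw bytes on disk."""
    fd, path = tempfile.mkstemp(suffix=".html")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8", **open_kwargs) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# Assumed input: the HTML from the question with CRLF line endings.
submitted = '<td class="cell-content">\r\n      Contact Resistance Phase A\r\n</td>'

# newline="\r\n" translates every '\n' written into '\r\n',
# so an existing '\r\n' ends up as '\r\r\n' on disk.
translated = write_and_read_raw(submitted, newline="\r\n")

# newline="" disables translation; the text is written unchanged.
untouched = write_and_read_raw(submitted, newline="")

print(repr(translated))
print(repr(untouched))
```

If the first repr shows '\r\r\n' where the second shows '\r\n', that matches the doubled lines in the question. Opening the file with open(path, "w", newline="") would then be an alternative to binary mode that keeps the file in text mode.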

Previous Answer

Python's write() method writes exactly what it is given. So if you write() a string that contains a newline character, that newline ends the current line in the file and starts a new one. This is why you are essentially seeing two newlines.

To fix this, strip() the string to remove extra newline characters from the output file.

f.write(html.strip())
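Note that strip() only removes leading and trailing whitespace, so it cannot change line endings inside the string. A minimal sketch of the difference, assuming the submitted text uses '\r\n' line endings (the sample string and the helper name normalize_newlines are made up for illustration):

```python
def normalize_newlines(text):
    # Convert CRLF pairs first, then any remaining lone CR, to LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Assumed input: the HTML from the question with CRLF line endings.
sample = '\r\n<td class="cell-content">\r\n      Contact Resistance Phase A\r\n</td>\r\n'

stripped = sample.strip()
normalized = normalize_newlines(sample)

print(repr(stripped))    # interior '\r\n' sequences are still present
print(repr(normalized))  # only '\n' line endings remain
```

Normalizing before writing is one way to keep text mode ("w") while avoiding doubled line endings.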

  • The code does not contain newline characters. I tried the strip method but it doesn't work. Same result. – albertrw Apr 16 '23 at 23:01
  • "This means that python will add a newline character at the end of each line that is written to the file, hence the extra lines." That is not at all what happens. The lines already end with newlines, Python just writes them. The only possible issue I can see would be in a weird situation, the universal newline translation might convert `\r\n` to a *pair* of newlines (treating the first `\r` as one newline, separate from the `\n` which follows), but I'm not sure how you'd actually trigger this (universal line-ending translation *should* recognize them as a pair). – ShadowRanger Apr 16 '23 at 23:57
  • I tend to agree with ShadowRanger that this answer may be incorrect. Do you have any documentation that supports the claim that Python's "write()" function adds newline characters? However, I've also printed out the content before writing it and there are no newline characters in the code. – albertrw Apr 17 '23 at 01:10
  • My mistake, ShadowRanger is correct, write() does not add any newline characters before writing. I've updated my answer to remove that statement. Could you explain the output you got resulting in no newline characters? When printing the html albertrw provided using repr() function, there are \n characters, however no '\r' characters as ShadowRanger suggests. – Grewal_Creator Apr 17 '23 at 04:14