1

I am parsing this XML data, created by/for another application:

<Data>
  <Description>Hello\nWorld<Description>
</Data>

If I find the string, it converts the \n to \\n, so \n appears in the output instead of the intended newline:

import xml.etree.ElementTree as ET
...
root = ET.parse(xmldata).getroot()
print(root.find('Description').text)

Output is:

Hello\nWorld

My immediate workaround is to unescape the backslash:

print(root.find('Description').text.replace('\\n','\n'))

Output is:

Hello
World

But is there a more correct way? I have the sense there is a more correct way to do this.

Jack Deeth
  • 3,062
  • 3
  • 24
  • 39
  • XML will keep all whitespace inside text, so just put an actual line break in your XML. – Thomas Feb 08 '21 at 13:30
  • Thanks @Thomas but in this instance I don't own the XML data, I'm just parsing it. – Jack Deeth Feb 08 '21 at 13:38
  • In that case the creator of the data should have provided some documentation on how the data is encoded/escaped. If you're OK with just guessing, see https://stackoverflow.com/questions/1885181/how-to-un-escape-a-backslash-escaped-string – Thomas Feb 08 '21 at 13:39
  • Thanks again. Have clarified further - the user enters text in a GUI in an application, which saves the user's work in an XML file with the text formatted as shown. There's no documentation and I'm completely guessing. I've never worked with XML data before today and didn't realise it's unusual to use the `\n` sequence instead of the newline character in XML! – Jack Deeth Feb 08 '21 at 14:06

0 Answers0