Using the following code snippet, I am not sure how to resolve the bad element/attribute. The value seems to be quoted correctly and is valid UTF-8 (I believe), but the \x07 is tripping up xmltodict.parse:

Exception: not well-formed (invalid token): line 6, column 15

Any ideas how to strip these codepoints so it doesn't throw exceptions?

import requests
import xmltodict

response = requests.get(dp_url, params=dp_params)

try:
    dict_response = xmltodict.parse(response.text)
except Exception as e:  # not well-formed (invalid token): line 6, column 15
    print(e)

The XML:

<result><record><field name='donor_id' id='donor_id' value='40362'/><field name='first_name' id='first_name' value='John'/><field name='org_rec' id='org_rec' value='N'/><field name='donor_type' id='donor_type' value='IN'/><field name='nomail' id='nomail' value='N'/><field name='nomail_reason' id='nomail_reason' value=''/><field name='narrative' id='narrative' value='2/26/2021 - TD: added Louise to record. Check only has her name and didn&apos;t return the reply device.\r\n3/17/2015 - MS: Removed an extra sopace between Spring and St in Address field. \r\n\r\n8/26/2014 - MS: Moved initial to Middle Name field.\r\n\r\n11/14/2012 TD: \x07 telephone number added per telephone campaign 2012'/><field name='tag_date' id='tag_date' value=''/><field name='quickbooks_customer_id' id='quickbooks_customer_id' value=''/></record></result>
  • Where is "the `\x07`"? Please read [mre] and try to create a shorter example XML that demonstrates the problem. If it doesn't have to be all on the same line to cause the problem, then please line-wrap it sensibly, so that we can easily see the contents of a problematic input. – Karl Knechtel Jun 02 '22 at 05:31
  • Also: are you sure it will be fine to treat the XML as if the problematic code point weren't there? – Karl Knechtel Jun 02 '22 at 05:32
  • The \x07 is in "The XML" section I shared above; you just have to scroll to the right a bit. So yes, I can remove the offending code point, but I would like to know how to properly handle things like this. – Sm00thSailn Jun 02 '22 at 11:59

1 Answer

Have you tried just replacing the problematic character?

xmltodict.parse(response.text.replace('\x07', ''))
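If other control characters might turn up as well, a more general option is to strip everything that XML 1.0 does not allow before parsing. The sketch below assumes the response contains real control characters (such as U+0007) rather than the literal text `\x07`. The helper name `strip_invalid_xml_chars` is hypothetical, and the standard library's `xml.etree.ElementTree` stands in for `xmltodict` so the example runs without extra packages.

```python
import re
import xml.etree.ElementTree as ET

# Matches any character outside the XML 1.0 "Char" range:
# tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Hypothetical helper: drop characters that XML 1.0 forbids."""
    return _INVALID_XML_CHARS.sub("", text)


# Shortened sample based on the 'narrative' field from the question.
raw = "<field name='narrative' value='TD: \x07 telephone number added'/>"

cleaned = strip_invalid_xml_chars(raw)
element = ET.fromstring(cleaned)  # parsing `raw` directly raises ParseError
print(element.get("value"))
```

With `xmltodict` the call would presumably look like `xmltodict.parse(strip_invalid_xml_chars(response.text))`. Note that this silently discards data, so it is only appropriate when the stripped characters carry no meaning.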
Snackerino