I'm trying to parse some poorly formatted XML.

I say poorly formatted - because everyone knows that you're not supposed to have un-escaped ampersands in an XML file.

The problem is that I need to collect some Unicode-formatted phrases (numeric character references such as &#xE2;) from an XML file, and I need the format to stay as close to the original as possible. You can reproduce the issue in your browser console:

console.log($("<test>&#xE2;</test>").text())
// Outputs 'â' instead of desired '&#xE2;'

I've tried every combination of escape(), unescape(), encodeURI(), and decodeURI() I can think of.

I've tried both settings for jQuery's ajax({processData: bool}) flag. All answers I've found point to these solutions - and it seems like none of them work...

How can I modify the above code to output the original XML content?

1owk3y
  • Possible duplicate of [Unescape HTML entities in Javascript?](https://stackoverflow.com/questions/1912501/unescape-html-entities-in-javascript) – Joshua K Sep 25 '17 at 01:27
  • It isn't. This isn't a generic escape/unescape question - it's specific to unicode. If you're gonna flag, at least READ the question. Thanks. – 1owk3y Sep 25 '17 at 01:32
  • I READ the question. I just misunderstood it. Next time, try to explain your problem properly, with some sample XML and the JavaScript you use to read out the phrases. Your title is misleading: you don't want to `parse` it. You want to retrieve it from the XML without the character references being automatically replaced by the Unicode characters. That's something different. – Joshua K Sep 25 '17 at 01:50
  • In your sample it would work to replace all `&` with `&amp;`. You don't need the overhead of creating HTMLNodes: `console.log($('<test>'+'&#xE2;'.replace('&', '&amp;')+'</test>').text())` [Here is a little fiddle](https://jsfiddle.net/Kasalop/wyz9yfko/) – Joshua K Sep 25 '17 at 01:52

1 Answer

Use new Option(yourUnescapedXml).innerHTML. So to answer your question directly,

console.log($(`<test>${new Option('&#xE2;').innerHTML}</test>`).text())

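This creates an HTMLOptionElement whose text is the raw string, then immediately reads its innerHTML, which serializes that text with the ampersand escaped as `&amp;`. When jQuery parses the result, it decodes `&amp;` back to `&`, so .text() returns the original `&#xE2;`.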

Will
  • No idea how that works - but I can't argue with the results! Thanks man! (edit: Need to wait 6 more minutes before accepting answer - but I'm choosing this if anyone else posts. It's the most direct answer) – 1owk3y Sep 25 '17 at 01:04
  • I've updated the answer to explain how it works. If you want to accept this answer, please mark it as such. – Will Sep 25 '17 at 01:07
  • Done. Thanks again - and for the additional info. Would have marked it sooner but there is a 10 minute cooldown from asking a question and selecting an answer. – 1owk3y Sep 25 '17 at 01:11