Background: this is in an article editor powered by TinyMCE, part of an enterprise in-house CMS behind one or more large media sites.
HTML
<p>non-breaking-space: &nbsp; pound: &pound; copyright: &copy;</p>
JS
console.log($('p').html());
console.log(document.getElementsByTagName('p').item(0).innerHTML);
Both return
non-breaking-space: &nbsp; pound: £ copyright: ©
when I'm expecting
non-breaking-space: &nbsp; pound: &pound; copyright: &copy;
Some entities are decoded back to raw characters (pound and copyright), while others are preserved (non-breaking space). I need a way to get the original inner HTML with every entity preserved, not a version that has been re-serialized by the browser. Is that possible?
This is for a TinyMCE plugin that processes the content using jQuery and puts it back. The content is loaded from a database; the plugin only processes image tags and should not modify the text content at all. The automatic conversion of some entities back to raw characters wouldn't be too much of a problem, except that:
- We cannot modify editorial's input, even in minor ways
- We enforce that these must be entities before they save due to some browser compatibility issues on our sites
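To make the second constraint concrete, here is a minimal sketch of the kind of re-encoding pass that could run on whatever `innerHTML` hands back. The helper name `encodeNonAscii` and the small `NAMED` map are hypothetical, made up for illustration, and are not part of TinyMCE or jQuery. It also normalises every non-ASCII character, so it cannot tell which ones were originally typed as entities.

```javascript
// Hypothetical map of characters to named entities (hand-picked for this example).
// Anything not listed falls back to a numeric character reference.
const NAMED = { '\u00A0': '&nbsp;', '\u00A3': '&pound;', '\u00A9': '&copy;' };

// Hypothetical helper: re-encode every character above U+007F as an entity.
function encodeNonAscii(html) {
  // The `u` flag makes the regex match whole code points, so astral
  // characters become one numeric reference instead of two surrogates.
  return html.replace(/[\u0080-\u{10FFFF}]/gu, (ch) =>
    NAMED[ch] || '&#' + ch.codePointAt(0) + ';'
  );
}

// Roughly what the browser returns for the example paragraph:
const serialized = 'non-breaking-space: &nbsp; pound: \u00A3 copyright: \u00A9';
console.log(encodeNonAscii(serialized));
// non-breaking-space: &nbsp; pound: &pound; copyright: &copy;
```

This only restores the "must be entities" rule after the fact; it does not recover the original markup, which is what I would really like to have.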
I would use this answer (https://stackoverflow.com/a/4404544/830171), but I can't, because my HTML is inside a textarea that the user needs to edit and that I need to run jQuery DOM manipulation on (via the plugin).
One way I can think of is to not use jQuery/DOM to process the image tags I need to change, and to use regex instead, as a lot of TinyMCE plugins do. But since I was shot down in "regex to pull all attributes out of all meta tags" for attempting any regex on HTML, I was hoping for a better way!
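For reference, the regex route would look roughly like the sketch below. The helper name `mapImgTags` and the example transform are hypothetical, and the pattern assumes that no attribute value contains a literal `>`, which is exactly the kind of fragility that gets regex-on-HTML shot down.

```javascript
// Hypothetical helper: run `transform` on each <img ...> tag in the raw string
// and leave every other character (entities included) exactly as stored.
function mapImgTags(html, transform) {
  // Assumption: no attribute value contains a literal ">".
  return html.replace(/<img\b[^>]*>/gi, (tag) => transform(tag));
}

const stored = '<p>pound: &pound; <img src="a.png"> copyright: &copy;</p>';

// Illustrative transform only: add a class attribute to each image tag.
const result = mapImgTags(stored, (tag) =>
  tag.replace(/^<img\b/i, '<img class="inline"')
);

console.log(result);
// <p>pound: &pound; <img class="inline" src="a.png"> copyright: &copy;</p>
```

Because the text never passes through the DOM, the entities survive untouched, but each image tag has to be handled as a string instead of with jQuery.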