
Background: this is an article editor, powered by TinyMCE, in an enterprise in-house CMS behind one or more large media sites.

HTML

<p>non-breaking-space: &nbsp; pound: &pound; copyright: &copy;</p>

JS

console.log($('p').html());
console.log(document.getElementsByTagName('p').item(0).innerHTML);

Both return:

non-breaking-space: &nbsp; pound: £ copyright: ©

whereas I expect:

non-breaking-space: &nbsp; pound: &pound; copyright: &copy;

Some entities are converted back to raw characters (pound and copyright) while others are preserved (non-breaking space). I need a way to get the original inner HTML with every entity preserved, not the version re-serialized by the browser. Is that possible?
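For context, a sketch of why only some entities survive: `innerHTML` (and jQuery's `.html()`, which reads it) does not return the source text. The browser re-serializes the parsed DOM, and per the WHATWG HTML serialization algorithm, text content is escaped only for `&`, U+00A0 (no-break space), `<` and `>`. Every other character, including `£` and `©`, comes out raw. The function below is a hypothetical, simplified re-implementation of that text escaping for illustration, not a browser API.

```javascript
// Hypothetical, simplified version of the text-node escaping a browser applies
// when it serializes the DOM back into innerHTML (WHATWG "escaping a string",
// text mode only). Attribute values are escaped slightly differently.
function escapeTextLikeSerializer(text) {
  return text
    .replace(/&/g, '&amp;')       // runs first so later entities are not re-escaped
    .replace(/\u00A0/g, '&nbsp;') // the only non-ASCII character that gets an entity
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// After parsing, the DOM stores characters, not entity names:
// &nbsp; -> U+00A0, &pound; -> U+00A3, &copy; -> U+00A9.
const parsedText = 'non-breaking-space: \u00A0 pound: \u00A3 copyright: \u00A9';

console.log(escapeTextLikeSerializer(parsedText));
// prints: non-breaking-space: &nbsp; pound: £ copyright: ©
```

So once the markup has been parsed, the entity names are gone; only the characters remain, and the serializer decides which ones to escape.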

This is for a TinyMCE plugin that processes the input with jQuery and puts it back. The content is loaded from a database; the plugin processes image tags and must not modify the text content at all. The automatic conversion of some entities back to raw characters wouldn't be much of a problem on its own, but:

  • We cannot modify editorial's input, even in minor ways
  • We require these characters to be stored as entities when content is saved, because of browser compatibility issues on our sites
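Since the saved markup must use entities, one possible workaround (an assumption; the question does not say editorial would accept it) is to let the DOM round-trip happen and then re-encode non-ASCII characters just before saving. TinyMCE's own `entity_encoding` setting may already cover this on save; check the documentation for your version. A minimal sketch follows; `NAMED_ENTITIES` and `encodeEntities` are hypothetical names, and the map only lists the three characters from the example. It cannot tell whether an editor originally typed `&pound;` or `&#163;`, so it restores a canonical form rather than the exact original text, and it should not be run over `<script>` or `<style>` contents.

```javascript
// Hypothetical map: the named entities this site requires in saved markup.
// Extend it with any other characters your compatibility rules cover.
const NAMED_ENTITIES = {
  '\u00A0': '&nbsp;',
  '\u00A3': '&pound;',
  '\u00A9': '&copy;',
};

// Re-encode every non-ASCII code point in an HTML string as an entity:
// a named one if the map has it, otherwise a hexadecimal numeric reference.
// Entities already present (e.g. "&pound;") are plain ASCII and pass through.
function encodeEntities(html) {
  return html.replace(/[^\x00-\x7F]/gu, (ch) =>
    NAMED_ENTITIES[ch] ?? `&#x${ch.codePointAt(0).toString(16).toUpperCase()};`
  );
}

// e.g. applied to the innerHTML produced after jQuery processing
const serialized = '<p>pound: \u00A3 copyright: \u00A9 euro: \u20AC</p>';
console.log(encodeEntities(serialized));
// prints: <p>pound: &pound; copyright: &copy; euro: &#x20AC;</p>
```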

I would use this answer - https://stackoverflow.com/a/4404544/830171 - but I can't, because my HTML is inside a textarea that the user needs to edit and that I need to run jQuery DOM manipulation on (via the plugin).

One option I can think of is to avoid jQuery/the DOM for changing the image tags and use regex instead, as a lot of TinyMCE plugins do; but since I was shot down in "regex to pull all attributes out of all meta tags" for attempting any regex on HTML, I was hoping for a better way!
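The regex route can be kept narrow: match only `<img>` tags and copy everything else through untouched, so text entities are never decoded. A minimal sketch, assuming the plugin only needs to rewrite whole `<img ...>` tags; `rewriteImgTags`, `addClass` and the `class="resized"` change are hypothetical. The simple `[^>]*` pattern will misfire on a `>` inside a quoted attribute value or on `<img` text inside comments, which is the usual objection to regex on HTML.

```javascript
// Rewrite only <img ...> tags in an HTML string; every other character,
// including entities such as &pound; and &copy;, is copied through unchanged
// because the string is never parsed into a DOM.
// Assumption: no ">" appears inside quoted attribute values.
function rewriteImgTags(html, transformTag) {
  return html.replace(/<img\b[^>]*>/gi, (tag) => transformTag(tag));
}

// Hypothetical transform: add a class attribute to each image tag.
const addClass = (tag) => tag.replace(/^<img\b/i, '<img class="resized"');

const input = '<p>pound: &pound; <img src="a.jpg" alt="x"> copyright: &copy;</p>';
console.log(rewriteImgTags(input, addClass));
// prints: <p>pound: &pound; <img class="resized" src="a.jpg" alt="x"> copyright: &copy;</p>
```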

Asked by gingerCodeNinja (edited by Community)
    A `console.dir` of an element with such text doesn't show any properties with the entities preserved. Even the debugger (in Chrome) shows all elements' HTML without entities preserved, so I guess you're out of luck. – pimvdb Jan 16 '13 at 19:18

1 Answer


TinyMCE uses a contenteditable iframe to edit the content. That's the reason why `console.log($('p').html());` logs something else.

Use the following code to get the pure editor content:

tinymce.get('your_editor_id').getBody().innerHTML
Answered by Thariama (edited by Gogol)
  • I wouldn't focus too much on the TinyMCE part of the question; it's really about how to get back the original HTML in general. Here is the same problem specific to the TinyMCE plugin - `ed.onPostProcess.add( function(ed, o) { console.log(o.content); // outputs £ console.log($('' + o.content + '').html()); // outputs £` – gingerCodeNinja Jan 16 '13 at 17:19