7

Is it possible to extract the plaintext from a contenteditable div, including newlines? The jQuery $.text() method strips out newlines, which I need. The solution can use jQuery.

penguinrob
  • I don't know how to do this, but I wouldn't be surprised if using `<pre>` or searching for answers that mention _pre-formatted text_ will help you find a solution.
    – Wex Jul 29 '11 at 02:57
  • Have you tried jQuery's $.html()? – MAK Ripon Jul 29 '11 at 03:02
  • @Mak, that returns the formatted text. I don't want any HTML tags in it, but I do want newlines (which aren't in $.html) – penguinrob Jul 29 '11 at 03:08

3 Answers

9

Without using extra plugins or writing your own implementation, you can use the innerText property and fall back to textContent (the two are similar, though not strictly equivalent). textContent is supported in all major browsers except IE 6-8, which support innerText instead.

var text = x.innerText || x.textContent

http://www.quirksmode.org/dom/w3c_html.html#t07

EDIT: as AgentME points out, this won't preserve the user's whitespace in Firefox. To do that with contenteditable, you'd have to polyfill innerText on Firefox. There are a couple of options I could come up with:

  1. Start with x.innerHTML, strip out the extra markup whitespace/tags and convert the user's whitespace from <br> and &nbsp; to \n and spaces.
  2. Use Firefox's Selection and Range support. Firefox supports Selection.toString which has a similar behavior to innerText, but it only works on the currently-selected text. So what you can do is record the user's current selection, change the selection to be the contents of your contenteditable div, call Selection.toString, then restore the user's initial selection.
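A rough sketch of what option 1 could look like, purely as an illustration. `htmlToPlainText` is a made-up helper name, and the set of tags and entities it handles is an assumption about what a contenteditable div typically produces, not an exhaustive list. Browsers differ in the markup they emit for line breaks, so expect off-by-one newlines in some cases.

```javascript
// Quick-and-dirty sketch of option 1 (hypothetical helper, not a browser API).
// Assumes the markup only contains <br>, <div>/<p> wrappers, inline tags
// and a handful of common entities.
function htmlToPlainText(html) {
  return html
    .replace(/\r?\n/g, "")           // newlines in the markup itself are not user input
    .replace(/<br\s*\/?>/gi, "\n")   // <br> is a line break typed by the user
    .replace(/<\/(div|p)>/gi, "\n")  // a closing block tag ends a line
    .replace(/<[^>]+>/g, "")         // strip every remaining tag
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&")          // decode &amp; last to avoid double-decoding
    .replace(/\n$/, "");             // drop the single trailing newline left by the last block
}
```

You would call it as `htmlToPlainText(x.innerHTML)`.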

Out of the two I personally think the second option would be better in most cases; with the first option you'll either get a quick-and-dirty regex solution or you devolve into implementing a full-blown HTML parser with CSS layout logic. The downside to option 2 is that it's relatively slow, so you may run into issues if you trigger it onchange or something like that. Demo of option 2 is here.
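For reference, a minimal sketch of the save / select / read / restore sequence described in option 2. `getTextViaSelection` is a hypothetical name, and the optional `win` parameter is only there so the function can be exercised outside a browser; in a page you would omit it. It relies only on the standard Selection and Range methods (`getSelection`, `getRangeAt`, `removeAllRanges`, `addRange`, `createRange`, `selectNodeContents`).

```javascript
// Sketch of option 2 (hypothetical helper): read an element's text through
// the Selection API, then put the user's selection back.
function getTextViaSelection(element, win) {
  win = win || window; // assumption: falls back to the global window in a browser
  var selection = win.getSelection();

  // Remember the user's current ranges so they can be restored afterwards.
  var saved = [];
  for (var i = 0; i < selection.rangeCount; i++) {
    saved.push(selection.getRangeAt(i));
  }

  // Select the whole contents of the element and read it back as text.
  var range = win.document.createRange();
  range.selectNodeContents(element);
  selection.removeAllRanges();
  selection.addRange(range);
  var text = selection.toString();

  // Restore the original selection.
  selection.removeAllRanges();
  saved.forEach(function (r) {
    selection.addRange(r);
  });
  return text;
}
```

Usage would be something like `var text = getTextViaSelection(x);`. Because it touches the live selection on every call, it is probably best not to run it on every keystroke.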

mmitchell
  • `innerText` and `textContent` are not completely equivalent. See http://stackoverflow.com/a/1359822/96100 – Tim Down Mar 11 '13 at 23:46
  • Well I guess that's one more reason to be glad I don't have to support old versions of IE! Still, even for those who do, I don't think it's ideal to add yet another library to do something this simple. For any nontrivial project that's a road to maintenance hell. If I were in OP's shoes, I would probably step back and consider whether this problem could be solved another way. It's possible that OP's problem would have been served by a well-styled textarea or some server-side processing. – mmitchell Mar 12 '13 at 00:56
  • This is the real answer to the OP's question. Not using a plugin, it solves the issue using native HTML functions/solutions. – Steve Jan 22 '14 at 15:57
  • .textContent does not include newlines. OP specifically asked for an answer that kept newlines. – Macil Apr 29 '15 at 18:58
  • You're right, I missed that one first time around. Thanks. – mmitchell May 02 '15 at 13:51
6

With a bit of tweaking, https://github.com/vorushin/jsHtmlToText was just what I needed.

Martin Delille
penguinrob
  • Honestly, it doesn't deserve upvotes because the "solution" is a plugin. While it is nice that the question has been resolved with a satisfactory solution, it is not a true answer to the question, which is about how to get plain text out of a content-edited HTML element, using HTML/JavaScript natively available to the browser. The OP didn't ask for a plugin, and honestly, "use a plugin" doesn't help anybody learn anything. The real answer is the one below this. – Steve Jan 22 '14 at 15:58
  • I disagree that "use this library which does exactly what you need" is a bad answer, but I will agree that it's a bad answer now that the link is a 404! – Macil Apr 29 '15 at 18:56
0

There's no easy, cross-browser way. innerText does what you want in some (but not all) browsers. Setting the selection to encompass the editable element and calling toString() does what you want in a different collection of browsers. In short, there's no easy way: you need to traverse the DOM and add line breaks in as appropriate for <br> and block-level elements. I certainly wouldn't recommend using any regex-based solution, such as the one you seem to have settled on, because it can never work for all possible HTML.
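To give an idea of what such a traversal involves, here is a simplified sketch. It is not the Rangy implementation: `getPlainText`, `walk` and `BLOCK_TAGS` are names invented for this example, the list of block-level tags is an illustrative subset, and it ignores CSS entirely (`display`, `white-space`, hidden elements), which a complete solution would need to consider.

```javascript
// Illustrative subset of block-level tag names; an assumption, not a complete list.
var BLOCK_TAGS = ["DIV", "P", "LI", "UL", "OL", "BLOCKQUOTE", "PRE",
                  "H1", "H2", "H3", "H4", "H5", "H6"];

function isBlock(node) {
  return node.nodeType === 1 &&
         BLOCK_TAGS.indexOf(node.nodeName.toUpperCase()) !== -1;
}

// Recursively collect text, turning <br> and block boundaries into "\n".
function walk(node) {
  if (node.nodeType === 3) return node.nodeValue;        // text node
  if (node.nodeType !== 1) return "";                    // skip comments etc.
  if (node.nodeName.toUpperCase() === "BR") return "\n";

  var text = "";
  for (var i = 0; i < node.childNodes.length; i++) {
    var child = node.childNodes[i];
    var childText = walk(child);
    if (isBlock(child)) {
      // A block starts on its own line...
      if (text && text.slice(-1) !== "\n") text += "\n";
      text += childText;
      // ...and ends its line, unless it already ends with a break.
      if (childText.slice(-1) !== "\n") text += "\n";
    } else {
      text += childText;
    }
  }
  return text;
}

// Hypothetical entry point: pass the contenteditable element itself.
function getPlainText(root) {
  return walk(root).replace(/\n$/, ""); // drop the newline closing the last block
}
```

It only reads `nodeType`, `nodeName`, `nodeValue` and `childNodes`, so it should work on any DOM element, for example `getPlainText(document.getElementById("editor"))` where `"editor"` is a placeholder id.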

Self-promotion: I will be adding this to my Rangy library in the reasonably near future.

Tim Down
  • Has this been added to Rangy? I don't see it, but I would love if it were there. – Bridger Maxwell Mar 16 '12 at 21:17
  • @Bridgeyman: Not yet, but after a few months of putting it off I'm now actively working on it and intend to release something in the next month or so. – Tim Down Mar 19 '12 at 00:06