How can I truncate the text contents of an Element while preserving HTML?

Question

I realize that there are several similar questions here but none of the answers solve my case.

I need to be able to take the innerHTML of an element and truncate it to a given character length with the text contents of any inner HTML element taken into account and all HTML tags preserved.

I have found several answers that cover this portion of the question fine as well as several plugins which all do exactly this.

However, in all cases the solution will truncate directly in the middle of any inner elements and then close the tag.

In my case I need the contents of all inner tags to remain intact, essentially allowing any "would be" truncated inner tags to exceed the given character limit.

Any help would be greatly appreciated.

EDIT:

For example:

This is an example <a href="link">of a link</a> inside another element

The above is 51 characters long including spaces. If I wanted to truncate this to 23 characters, we would have to shorten the text inside the <a> tag, which is exactly what most solutions out there do.

This would give me the following:

This is an example <a href="link">of a</a>

However, for my use case I need to keep any remaining visible tags completely intact and not truncated in any way.

So given the above example, the final output I would like, when attempting to truncate to 23 characters is the following:

This is an example <a href="link">of a link</a>

So essentially we are checking where the truncation takes place. If it is outside of an element we can split the HTML string to exactly that length. If on the other hand it is inside an element, we move to the closing tag of that element, repeating for any parent elements until we get back to the root string and split it there instead.

*"I realize that there are several similar questions here"* Which ones did you look at and find wanting? Links are usually helpful. *"...but none of the answers solve my case."* In what way, specifically? — T.J. Crowder, Sep 03 '15 at 17:18
I'm curious: It's really a length limit *in characters*? Most HTML pages are presented in variable-width fonts, often with kerning. `i` and `M` are dramatically different widths. You're really doing this by character count? (I'm sure there are use cases, just checking.) — T.J. Crowder, Sep 03 '15 at 17:22
I have to say I'm not finding any good question that addresses this. [There's this one](http://stackoverflow.com/questions/17458410/whats-the-quickest-way-to-truncate-paragraph-text-that-may-or-may-not-include-h), but its only upvoted answer (which does make it a valid "close as duplicate of this" target) is unsatisfying. — T.J. Crowder, Sep 03 '15 at 17:26
@T.J.Crowder Yes that is correct. I know it's quite an unusual use case. — Gordo, Sep 04 '15 at 06:03

score 1 · Answer 1 · edited May 23 '17 at 12:00

You've tagged your question regex, but you cannot reliably do this with regular expressions. Obligatory link. So innerHTML is out.

If you're really talking characters, I don't see a way to do it other than to loop through the nodes within the element, recursing into descendant elements, totalling up the lengths of the text nodes you find as you go. When you find the point where you need to truncate, you truncate that text node and then remove all following ones — or probably better, you split that text node into two parts (using splitText) and move the second half into a display: none span (using insertBefore), and then move all subsequent text nodes into display: none spans. (This makes it much easier to undo it.)
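
A minimal sketch of the split-and-hide idea described above. `collectTextNodes` and `hideAfter` are hypothetical names, not part of any library; only standard DOM members (`childNodes`, `nodeType`, `splitText`, `insertBefore`, `appendChild`) are used. Note that this cuts exactly at the limit, even inside an inner element, so it does not yet apply the "keep inner elements whole" rule from the question.

```javascript
// Collect the text nodes under `node` in document order
// (nodeType 3 = text node, nodeType 1 = element).
function collectTextNodes(node, out) {
  for (var i = 0; i < node.childNodes.length; i++) {
    var child = node.childNodes[i];
    if (child.nodeType === 3) {
      out.push(child);
    } else if (child.nodeType === 1) {
      collectTextNodes(child, out);
    }
  }
  return out;
}

// Hypothetical helper: hide everything after the first `max` text characters.
function hideAfter(root, max) {
  var doc = root.ownerDocument;
  var seen = 0;
  // Snapshot the text nodes first so the mutations below don't disturb the walk.
  collectTextNodes(root, []).forEach(function (node) {
    var start = seen;
    seen += node.nodeValue.length;
    if (seen <= max) { return; } // entirely within the limit, leave as-is

    // If the limit falls inside this node, split it and hide only the tail;
    // otherwise the whole node is past the limit.
    var overflow = start < max ? node.splitText(max - start) : node;
    var span = doc.createElement('span');
    span.style.display = 'none';
    overflow.parentNode.insertBefore(span, overflow);
    span.appendChild(overflow);
  });
}
```

Because nothing is deleted, undoing the truncation only requires unwrapping the hidden spans.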

I think you're correct. The more I've thought about the problem, the more I have come to the same conclusion. — Gordo, Sep 04 '15 at 05:51

Anthony Blackshaw · Accepted Answer · 2015-09-04T08:24:58.860

It sounds like you'd like to truncate your HTML string by its text length rather than its markup length. For example, consider the following HTML:

'<b>foo</b> bar'

In this case the HTML is 14 characters in length and the text is 7. You would like to be able to truncate it to X text characters (for example 2) so that the new HTML is now:

'<b>fo</b>'

Disclosure: My answer uses a library I developed.

You could use the HTMLString library - Docs : GitHub.

The library makes this task pretty simple. To truncate the HTML as we've outlined above (e.g. to 2 text characters) using HTMLString you'd use the following code:

var myString = new HTMLString.String('<b>foo</b> bar');
var truncatedString = myString.slice(0, 2);
console.log(truncatedString.html());

EDIT: After additional information from the OP.

The following truncate function truncates to the last full tag and caters for nested tags.

function truncate(str, len) {
    // Convert the string to a HTMLString
    var htmlStr = new HTMLString.String(str);

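    // Check the string needs truncating; return an HTMLString either way
    // so callers can always call .html() on the result
    if (htmlStr.length() <= len) {
        return htmlStr;
    }

    // Find the closing tag for the character we are truncating to
    var tags = htmlStr.characters[len - 1].tags();
    if (tags.length === 0) {
        // The cut falls outside any tag, so truncate exactly at len
        return htmlStr.slice(0, len);
    }
    var closingTag = tags[tags.length - 1];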

    // Find the last character to contain this tag
    for (var index = len; index < htmlStr.length(); index++) {
        if (!htmlStr.characters[index].hasTags(closingTag)) {
            break;
        }
    }

    return htmlStr.slice(0, index);
}

var myString = 'This is an <b>example ' +
    '<a href="link">of a link</a> ' +
    'inside</b> another element';

console.log(truncate(myString, 23).html());
console.log(truncate(myString, 18).html());

This will output:

This is an <b>example <a href="link">of a link</a></b>
This is an <b>example <a href="link">of a link</a> inside</b>

While your library looks excellent, it doesn't quite solve my problem, I'm afraid. In my case, if the HTML string gets truncated "inside" an HTML element, I would like the split element to be returned in full and not truncated itself. So in essence I want to be able to truncate as close as possible to the given "maxlength" but leaving all HTML tags fully intact and unmodified. — Gordo, Sep 04 '15 at 05:49
Thanks @Gordo I've updated my answer based on this with an example that solves nested tags. — Anthony Blackshaw, Sep 04 '15 at 08:25
Thanks... I've awarded you the answer since, although my quick and dirty function solves my problem entirely, your library and your answer will likely be far more useful for the majority. Cheers! — Gordo, Sep 04 '15 at 09:47

rfong · Answer 3 · 2016-10-21T03:41:51.240

Although HTML is notorious for being terribly formed and has edge cases which are impervious to regex, here is a super light way you could hackily handle HTML with nested tags in vanilla JS.

(function(s, approxNumChars) {
  var taggish = /<[^>]+>/g;
  s = s.slice(0, approxNumChars); // ignores tag lengths for solution brevity
  s = s.replace(/<[^>]*$/, '');  // rm any trailing partial tags
  var tags = s.match(taggish) || [];  // match() returns null when there are no tags

  // find out which tags are unmatched
  var openTagsSeen = [];
  for (var i = 0; i < tags.length; i++) {
    var tag = tags[i];
    if (tag.charAt(1) !== '/') {  // anything that is not a closing tag counts as an opening tag
      openTagsSeen.push(tag);
    }
    else {
      // quick version that assumes your HTML is correctly formatted (alas) -- else we would have to check the content inside for matches and loop through the opentags
      openTagsSeen.pop();
    }
  }

  // reverse and close unmatched tags
  openTagsSeen.reverse();
  for (var j = 0; j < openTagsSeen.length; j++) {
    s += ('</' + openTagsSeen[j].match(/\w+/)[0] + '>');
  }
  return s + '...';
})

In a nutshell: truncate it (ignores that some chars will be invisible), regex match the tags, push open tags onto a stack, and pop off the stack as you encounter closing tags (again, assumes well-formed); then close any still-open tags at the end.

(If you want to actually get a certain number of visible characters, you can keep a running counter of how many non-tag chars you've seen so far, and stop the truncation when you fill your quota.)
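
A rough sketch of that running-counter variant, extended with a nesting depth so that the cut only happens at the top level (the behaviour the question asks for). `truncateVisible` and `VOID_TAG` are hypothetical names. It is still purely lexical: it assumes well-formed markup, no `>` inside attribute values, and counts an entity such as `&amp;` as five characters.

```javascript
// Tags that never get a closing tag, so they must not increase the depth.
var VOID_TAG = /^<(?:area|base|br|col|embed|hr|img|input|link|meta|source|track|wbr)\b/i;

// Hypothetical helper: keep roughly `limit` visible characters, but never
// cut inside an element; an element that straddles the limit is kept whole.
function truncateVisible(html, limit) {
  var tokens = html.match(/<[^>]+>|[^<]+/g) || []; // alternating tags and text runs
  var out = '';
  var visible = 0; // non-tag characters emitted so far
  var depth = 0;   // number of currently open elements

  for (var i = 0; i < tokens.length; i++) {
    var token = tokens[i];
    // Stop only once the quota is filled AND we are back at the top level.
    if (visible >= limit && depth === 0) { break; }

    if (token.charAt(0) === '<') {
      if (token.charAt(1) === '/') {
        depth--;
      } else if (token.charAt(1) !== '!' && !VOID_TAG.test(token) && !/\/>$/.test(token)) {
        depth++;
      }
      out += token;
    } else if (depth === 0) {
      // Top-level text may be cut exactly at the limit.
      var piece = token.slice(0, limit - visible);
      out += piece;
      visible += piece.length;
    } else {
      // Text inside an element is kept whole, even past the limit.
      out += token;
      visible += token.length;
    }
  }
  return out;
}

console.log(truncateVisible('This is an example <a href="link">of a link</a> inside another element', 23));
// → This is an example <a href="link">of a link</a>
```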

DISCLAIMER: You shouldn't use this as a production solution, but if you want a super light, personal, hacky solution, this will get basic well-formed HTML.

Since it's blind and lexical, this solution misses a lot of edge cases, including tags that should not be closed, like <img>, but you can hardcode those edge cases or, you know, include a lib for a real HTML parser if you want. Fortunately, since HTML is poorly formed, you won't see it ;)

For an approximate truncation this is totally sufficient... and super light weight. Nice. — BananaNeil, Oct 21 '16 at 03:42

Gordo · Answer 4 · 2015-09-04T07:28:00.030

Thanks to T.J. Crowder I soon came to the realization that the only way to do this with any kind of efficiency is to use the native DOM methods and iterate through the elements.

I've knocked up a quick, reasonably elegant function which does the trick.

function truncate(rootNode, max){
    //Text nodes have no innerText property, so use textContent for both the root and the text nodes
    var text = 'textContent';

    //If total length of characters is less than or equal to the limit, short circuit
    if(rootNode[text].length <= max){ return; }

    var cloneNode = rootNode.cloneNode(true),
        currentNode = cloneNode,
        //Create DOM iterator to loop only through text nodes
        ni = document.createNodeIterator(currentNode, NodeFilter.SHOW_TEXT),
        frag = document.createDocumentFragment(),
        len = 0;

    //loop through text nodes
    while (currentNode = ni.nextNode()) {
        //if nodes parent is the rootNode, then we are okay to truncate
        if (currentNode.parentNode === cloneNode) {
            //if we are in the root textNode and the character length exceeds the maximum, truncate the text, add to the fragment and break out of the loop
            if (len + currentNode[text].length > max){
                currentNode[text] = currentNode[text].substring(0, max - len);
                frag.appendChild(currentNode);
                break;
            }
            else{
                frag.appendChild(currentNode);
            }
        }
        //If not, simply add the node to the fragment
        else{
            frag.appendChild(currentNode.parentNode);
        }
        //Track current character length
        len += currentNode[text].length;
    }

    rootNode.innerHTML = '';
    rootNode.appendChild(frag);
}

This could probably be improved, but in my initial testing it is very quick, probably because it uses the native DOM methods, and it does the job perfectly for me. I hope this helps anyone else with similar requirements.

DISCLAIMER: The above code only deals with HTML tags one level deep; it does not handle tags inside tags. It could easily be modified to do so by keeping track of each node's parent and appending the nodes to the correct place in the fragment. As it stands, this is fine for my requirements but may not be useful to others.
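
One possible way to cover nested tags, sketched under the assumption that only the root's direct children matter: a direct text child may be cut, while a direct element child is counted by its full `textContent` and kept whole, however deeply it nests. `truncateWholeChildren` is a hypothetical name, and unlike the function above it edits `rootNode` in place instead of rebuilding it from a clone.

```javascript
// Hypothetical variant: truncate rootNode to about `max` text characters,
// cutting only in top-level text nodes and keeping child elements intact.
function truncateWholeChildren(rootNode, max) {
  var len = 0;
  var child = rootNode.firstChild;

  while (child) {
    var next = child.nextSibling; // remember the sibling before a possible removal

    if (len >= max) {
      // The limit has already been reached: drop everything that follows.
      rootNode.removeChild(child);
    } else if (child.nodeType === 3) {
      // Top-level text node: safe to cut exactly at the limit.
      child.nodeValue = child.nodeValue.substring(0, max - len);
      len += child.nodeValue.length;
    } else if (child.nodeType === 1) {
      // Element (possibly with nested tags): keep it whole, even if it
      // pushes the total past the limit.
      len += child.textContent.length;
    }
    child = next;
  }
}
```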
