2

I wrote the following function to linearize the text content of an arbitrary string of HTML:

html2text(html) {
    function _html2text(element, accum) {
        return Array.prototype.slice.call(element.childNodes).reduce((accum, node) => {
            return (node.nodeType === 3)
                ? `${accum} ${node.textContent}`
                : _html2text(node, accum);
        }, accum);
    }
    const div = document.createElement('div');
    div.innerHTML = html;
    return _html2text(div, '');
}

But now I am not able to transform it into a trail-recusive style so that it can be optimized. My problem is that the recursion recurs within the reduction. I will write this out as loops, but it's been killing me that I've not been able tail recurse this.


Here is my loop version:

function html2text(html) {
    var div = document.createElement('div');
    div.innerHTML = html;
    var accum = '';
    var stack = [];
    stack.push([div, 0]);
    while (stack.length !== 0) {
        var frame = stack.pop();
        var el = frame[0];
        var i = frame[1];
        for (; i < el.childNodes.length; i++) {
            var node = el.childNodes[i];
            if (node.nodeType === Node.ELEMENT_NODE) {
                stack.push([el, i+1]);
                stack.push([node, 0]);
                break;
            } else if (node.nodeType === Node.TEXT_NODE) {
                accum += ' ' + node.textContent;
            }
        }
    }
    return accum;
}

Here's the function above written with a recursive tail call, passing the stack:

function html2text(html) {                                                                                                                                                                              

    function recurse(stack, accum) {                                                                                                                                                                    
        if (!stack.length) {                                                                                                                                                                            
            return accum;                                                                                                                                                                               
        }                                                                                                                                                                                               
        var frame = stack.pop();                                                                                                                                                                        
        var el = frame[0];                                                                                                                                                                              
        var i = frame[1];                                                                                                                                                                               
        for (; i < el.childNodes.length; i++) {                                                                                                                                                         
            var node = el.childNodes[i];                                                                                                                                                                
            if (node.nodeType === Node.ELEMENT_NODE) {                                                                                                                                                  
                stack.push([el, i+1]);                                                                                                                                                                  
                stack.push([node, 0]);                                                                                                                                                                  
                break;                                                                                                                                                                                  
            } else if (node.nodeType === Node.TEXT_NODE) {                                                                                                                                              
                accum += ' ' + node.textContent;                                                                                                                                                        
            }                                                                                                                                                                                           
        }                                                                                                                                                                                               
        return recurse(stack, accum)                                                                                                                                                                    
    }                                                                                                                                                                                                   

    var div = document.createElement('div');                                                                                                                                                            
    div.innerHTML = html;                                                                                                                                                                               
    var stack = [];                                                                                                                                                                                     
    stack.push([div, 0]);                                                                                                                                                                               
    return recurse(stack, '');                                                                                                                                                                          
} 

The inner loop can be converted to another recursion, but without using immutable datastructures none of this really makes too much sense for me.

Dmitry Minkovsky
  • 36,185
  • 26
  • 116
  • 160
  • Is there an es6 tag? Since fat arrow and tail recursion are new features, it seems appropriate. – sqykly Oct 16 '15 at 00:34
  • There's no trail recursion optimization in ES6 as far as I know; that's why I'm trampolining. The fat arrow is just syntax. – Dmitry Minkovsky Oct 16 '15 at 00:35
  • 1
    Where `reduction` comes from? – plalx Oct 16 '15 at 00:41
  • @plalx the `Array#reduce()` – Dmitry Minkovsky Oct 16 '15 at 00:41
  • Section 14.6 of es6 explains the requirements and applicability of tail calls. It isn't applicable here because you aren't in strict mode (as far as I know). – sqykly Oct 16 '15 at 00:46
  • Just use an explicit `stack` like in your loop version and you can easily use tail recursion. Otherwise, you cannot – Bergi Oct 16 '15 at 00:47
  • What exactly is the problem you are trying to solve here? Is the code not performant enough? How do you think a tail-recursive "style" improves anything? Why do you think tail recursion is applicable here at all? – Bergi Oct 16 '15 at 00:48
  • Also plalx has a point, `reduction` is not defined. – sqykly Oct 16 '15 at 00:49
  • @plalx @sqykly sorry I didn't see that: it was a typo (should have been `accum`)—fixed it. – Dmitry Minkovsky Oct 16 '15 at 02:33
  • @Bergi trying to use recursion instead of a loop, for learnings' sake. And because for me going through a tree is easier to think about in terms of recursion than loops. – Dmitry Minkovsky Oct 16 '15 at 02:36
  • Your loop is going to be _much_ faster than a trampolined version. If this is just for learning, I would question if it is worth learning. ES6 has TCO in strict mode. Just use that. If you need to develop something before ES6 TCO is implemented in most browsers, you will be better off using spaghetti loops and redefining the function to use TCO after checking for compliant browsers (Firefox right now, I think). – okovko Oct 16 '15 at 02:46
  • @dimadima: Your first version already uses recursion (and is indeed much simpler than the loop). It's totally fine. Of course, it's not tail-recursion, because the recursive call is in a loop, [not a tail position](http://stackoverflow.com/a/30743541/1048572). And it won't get out of there, unless you jump through the same hoops as you did for your spaghetti loop version. – Bergi Oct 16 '15 at 03:00
  • You can always loop over the elements using the DOM instead of the stack, if you like trees. It still won't be tailcalling because you're concatenating strings - no way out of that. – sqykly Oct 16 '15 at 06:18
  • @sqykly Oh you're right—looks like TCO is part of the spec. I didn't know that. Why can't I re-write this as recursive tail calls because I am concatenating strings? `recurse(existingString + more)`? – Dmitry Minkovsky Oct 16 '15 at 13:58
  • @okovko yes, just for learning. Why question whether its worth learning? Anyway, I updated my looped version. It's not so bad. But I still want to know how to write this in recursive tail called form because as you and others have pointed out, ES6 and Babel do, indeed, provide TCO! – Dmitry Minkovsky Oct 16 '15 at 14:00
  • @Bergi updated my loop version to an actually-correct implementation. Do I really need to jump through similar hoops in a recursive version that tail calls? – Dmitry Minkovsky Oct 16 '15 at 14:02
  • 1
    Search for tail-recursive tree traversal. – plalx Oct 16 '15 at 14:04
  • @dimadima: Maybe similar, maybe not, but quite unnecessary in any case. Follow plalx' advice and read [this answer](http://stackoverflow.com/a/21205572/1048572) for your curiosity :-) – Bergi Oct 16 '15 at 14:16
  • Just one more thing since this is an exercise: I'm finding it pretty easy to learn ES6 by revising my old code to use the new features. That way, I don't have to solve new problems like this, I can concentrate on learning the spec. – sqykly Oct 16 '15 at 15:43

1 Answers1

2
function html2text(current, text = "") {
    "use strict";
    if (current === null) return text;
    var nextNode = current.nextSibling || current.parentNode.nextSibling,
        more = current.nodeType === 3 ? current.textContent : "";
    return html2text(nextNode, text + more);
}

You were right, you can do the concatenation before the tail call. I think this fits all the criteria, but ES6 is new, so please do double check.

sqykly
  • 1,586
  • 10
  • 16