What you want to do is known as a (variant of a) document outline, i.e. creating a nested list from the headings of a document that honors their hierarchy.
A simple implementation for the browser using the DOM and DOMParser APIs goes as follows (placed in an HTML page and written in ES5 for easy testing):
<!DOCTYPE html>
<html>
<head>
<title>Document outline</title>
</head>
<body>
<div id="outline"></div>
<script>
// test string wrapped in a document (and body) element
var str = "<html><body><h1>h1-1</h1><h2>h2-1</h2><h3>h3-1</h3><p>something</p><h1>h1-2</h1><h2>h2-2</h2><h3>h3-2</h3></body></html>";
// utility that traverses a DOM tree and emits SAX-like
// startElement/endElement events
function emitSAXLikeEvents(node, handler) {
  handler.startElement(node)
  for (var i = 0; i < node.children.length; i++)
    emitSAXLikeEvents(node.children.item(i), handler)
  handler.endElement(node)
}
var outline = document.getElementById('outline')
// rank of the most recently added heading (0 = none yet)
var rank = 0
// element that receives the next ol/li
var context = outline
emitSAXLikeEvents(
  (new DOMParser()).parseFromString(str, "text/html").body,
  {
    startElement: function(node) {
      // anchored so that only h1..h6 match
      if (/^h[1-6]$/.test(node.localName)) {
        var newRank = +node.localName.charAt(1)
        // walk up to the li (or the outline root) that
        // should contain the new item
        while (newRank <= rank--)
          context = context.parentNode.parentNode
        rank = newRank
        // create (if 1st li) or
        // get (if 2nd or subsequent li) ol element
        var ol
        if (context.children.length > 0)
          ol = context.children[0]
        else {
          ol = document.createElement('ol')
          context.appendChild(ol)
        }
        // create and append li with text from
        // heading element (textContent does not depend
        // on the parsed document being rendered)
        var li = document.createElement('li')
        li.appendChild(
          document.createTextNode(node.textContent))
        ol.appendChild(li)
        context = li
      }
    },
    // nothing to do when an element ends
    endElement: function(node) {}
  })
</script>
</body>
</html>
I first parse your fragment into a Document, then traverse it to generate SAX-like startElement() calls. In the startElement() function, the rank of a heading element is checked against the rank of the most recently created list item (if any). Then a new list item is appended at the correct hierarchy level, and an ol element is created as its container if one doesn't exist yet. Note that the algorithm as written won't work when the hierarchy "jumps" from h1 to h3 (a later h2 then ends up at the wrong level), but it can be easily adapted.
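One possible adaptation for skipped levels is to keep a stack of open list items together with their ranks, instead of assuming that each nesting level corresponds to exactly one rank step. The following is a minimal sketch of that idea on plain data rather than DOM nodes; the function name buildOutline and the { rank, text } input shape are assumptions made for this illustration, not part of the code above.

```javascript
// Sketch: build a nested outline from headings given in document order.
// `headings` is an array of { rank: 1..6, text: string } objects
// (a hypothetical input shape chosen for this example).
function buildOutline(headings) {
  // artificial root with rank 0; it is never popped because
  // real heading ranks start at 1
  var root = { rank: 0, text: null, children: [] }
  // open ancestors, with strictly increasing ranks from bottom to top
  var stack = [root]
  for (var i = 0; i < headings.length; i++) {
    var h = headings[i]
    // close every open item whose rank is >= the new heading's rank
    while (stack[stack.length - 1].rank >= h.rank)
      stack.pop()
    var item = { rank: h.rank, text: h.text, children: [] }
    // the new item becomes a child of the nearest lower-ranked item
    stack[stack.length - 1].children.push(item)
    stack.push(item)
  }
  return root.children
}

// usage: h1 -> h3 -> h2 puts both the h3 and the h2 below the h1
var tree = buildOutline([
  { rank: 1, text: 'h1-1' },
  { rank: 3, text: 'h3-1' },
  { rank: 2, text: 'h2-1' }
])
console.log(JSON.stringify(tree, null, 2))
```

Because the parent is found by comparing ranks rather than by counting levels, an h2 that follows an h1/h3 pair is attached to the h1 instead of the top level. Turning the resulting tree into nested ol/li elements is then a straightforward recursive walk.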
If you want to create an outline/table of contents on node.js, the code could be made to run server-side, but it requires a decent HTML parsing lib (a DOMParser polyfill for node.js, so to speak). There are also the https://github.com/h5o/h5o-js and https://github.com/hoyois/html5outliner packages for creating outlines, though I haven't tested those. These packages can supposedly also deal with corner cases such as heading elements in iframe and quote elements, which you generally don't want in the outline of your document.
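If pulling in a parser is not an option for a quick experiment, the list of headings can be approximated with a regular expression. This is only a rough sketch: extractHeadings is a hypothetical helper, and it assumes simple, well-formed markup. It will mis-handle headings that appear inside comments, scripts, or malformed tags, so a real parser remains the better choice for anything beyond testing.

```javascript
// Sketch: pull h1..h6 headings out of an HTML string without a parser.
// Assumes simple, well-formed markup; not a substitute for real parsing.
function extractHeadings(html) {
  // opening tag h1..h6 with optional attributes, lazily matched
  // content, and a closing tag of the same rank
  var re = /<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1\s*>/gi
  var headings = []
  var m
  while ((m = re.exec(html)) !== null) {
    headings.push({
      rank: +m[1],
      // drop inline markup inside the heading and trim whitespace
      text: m[2].replace(/<[^>]*>/g, '').trim()
    })
  }
  return headings
}

// usage
var found = extractHeadings(
  '<h1>h1-1</h1><h2 class="x">h2-<em>1</em></h2><p>something</p>')
console.log(found)
```

The result is a flat list of rank/text pairs in document order, which is the same information the startElement() handler works from in the browser version.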
The topic of creating an HTML5 outline has a long history; see e.g. http://html5doctor.com/computer-says-no-to-html5-document-outline/. HTML4's practice of using no wrapper elements for sectioning (sectioning roots, in HTML5 parlance) and instead placing headings and content at the same hierarchy level is known as "flat-earth markup". SGML has the RANK feature for dealing with ranked elements such as H1, H2, etc., and can be made to infer omitted section elements, and thus automatically create an outline, from HTML4-like "flat-earth markup" in simple cases (e.g. where only section or another single element is allowed as sectioning root).