1

TLDR

I want to convert the below code

<nav id="TableOfContents">
  <ul>
    <li><a href="#js">JS</a>
      <ul>
        <li><a href="#h3-1">H3-1</a></li>
        <li><a href="#h3-2">H3-2</a></li>
      </ul>
    </li>
    <li><a href="#python">Python</a>
      <ul>
        <li><a href="#h3-1-1">H3-1</a></li>
        <li><a href="#h3-2-1">H3-2</a></li>
      </ul>
    </li>
  </ul>
</nav>

to the following JSON format with Javascript(or Hugo)

{"t": "root", "d": 0, "v": "", "c": [
    {"t": "h", "d": 1, "v": "<a href=\"#js\">JS</a>", "c": [
            {"t": "h", "d": 2, "v": "<a href=\"#h3-1\">H3-1</a>"}, 
            {"t": "h", "d": 2, "v": "<a href=\"#h3-2\">H3-2</a>"}
        ]
    }, 
    {"t": "h", "d": 1, "v": "<a href=\"#python\">Python</a>", "c": [
            {"t": "h", "d": 2, "v": "<a href=\"#h3-1-1\">H3-1</a>"}, 
            {"t": "h", "d": 2, "v": "<a href=\"#h3-2-1\">H3-2</a>"}
        ]
     }
]}

The above is just an example, the function you provide should be able to convert any similar structure to a JSON string like the one above without any problems (In fact, the only requirement is not to use hard coding to complete)

You can output your code like below

<!DOCTYPE html>
<html>
<head>
  <script>
    (()=>{
      var input_text = `<nav id="TableOfContents">...`;
      var output = my_func(input_text);
      console.log(output); /* expected result: {"t": "root", "d": 0, "v": "", "c": [ ... */
    }
    )()

    function my_func(text) {
      /* ... */
    }
  </script>
</head>
</html>

Long Story

What I want to do?

I wanted to use a mindmap(markmap, markmap-github) on Hugo and applying that to the article (single.html) as the table of contents

Hugo already provides the TOC by TableOfContents

i.e. My.md

## JS

### H3-1

### H3-2

## Python

### H3-1

### H3-2

and then {{ .TableOfContents }} will output

<nav id="TableOfContents">
  <ul>
    <li><a href="#js">JS</a>
      <ul>
        <li><a href="#h3-1">H3-1</a></li>
        <li><a href="#h3-2">H3-2</a></li>
      </ul>
    </li>
    <li><a href="#python">Python</a>
      <ul>
        <li><a href="#h3-1-1">H3-1</a></li>
        <li><a href="#h3-2-1">H3-2</a></li>
      </ul>
    </li>
  </ul>
</nav>

however, If I use the markmap then I must provide a JSON string to it, like below,

{"t": "root", "d": 0, "v": "", "c": [
    {"t": "h", "d": 1, "v": "<a href=\"#js\">JS</a>", "c": [
            {"t": "h", "d": 2, "v": "<a href=\"#h3-1\">H3-1</a>"}, 
            {"t": "h", "d": 2, "v": "<a href=\"#h3-2\">H3-2</a>"}
        ]
    }, 
    {"t": "h", "d": 1, "v": "<a href=\"#python\">Python</a>", "c": [
            {"t": "h", "d": 2, "v": "<a href=\"#h3-1-1\">H3-1</a>"}, 
            {"t": "h", "d": 2, "v": "<a href=\"#h3-2-1\">H3-2</a>"}
        ]
     }
]}

What have I done?

I tried to do it with just Hugo's functions but failed (logically possible, but syntactically difficult to implement)

So I put my hope in javascript, but all along, my JS is to find someone else's code to modify it, and this example is a brand new example to me; I can't start (to be frank, I am a layman in JS).

The following is I implement it with Python.

from bs4 import BeautifulSoup, Tag
from typing import Union
import os
from pathlib import Path
import json


def main():
    data = """<nav id="TableOfContents">
      <ul>
        <li><a href="#js">JS</a>
          <ul>
            <li><a href="#h3-1">H3-1</a></li>
            <li><a href="#h3-2">H3-2</a></li>
          </ul>
        </li>
        <li><a href="#python">Python</a>
          <ul>
            <li><a href="#h3-1-1">H3-1</a></li>
            <li><a href="#h3-2-1">H3-2</a></li>
          </ul>
        </li>
      </ul>
    </nav>"""
    soup = BeautifulSoup(data, "lxml")

    sub_list = []
    cur_level = 0
    dict_tree = dict(t='root', d=cur_level, v='', c=sub_list)
    root_ul: Tag = soup.find('ul')
    
    toc2json(root_ul, cur_level + 1, sub_list)  # <-- core
    
    toc_json: str = json.dumps(dict_tree)
    print(toc_json)
    output_file = Path('test.temp.html')
    create_markmap_html(toc_json, output_file)
    os.startfile(output_file)


def toc2json(tag: Tag, cur_level: int, sub_list):
    for li in tag:
        if not isinstance(li, Tag):
            continue
        a: Tag = li.find('a')
        ul: Union[Tag, None] = li.find('ul')
        if ul:
            new_sub_list = []
            toc2json(ul, cur_level + 1, new_sub_list)
            cur_obj = dict(t='h', d=cur_level, v=str(a), c=new_sub_list)
        else:
            cur_obj = dict(t='h', d=cur_level, v=str(a))
        sub_list.append(cur_obj)


def create_markmap_html(json_string: str, output_file: Path):
    import jinja2
    markmap_template = jinja2.Template("""
    <!DOCTYPE html>
    <html lang=en>
    <head>
      <script src="https://d3js.org/d3.v6.min.js"></script>
      <script src="https://cdn.jsdelivr.net/npm/markmap-view@0.2.0"></script>
      <style>      
      .mindmap {
        width: 100vw;
        height: 100vh;
      }
      </style>
    </head>
    <body>
      <main>
        <svg id="mindmap-test" class="mindmap"></svg>
      </main>
      
      <script>
        (
          (e, json_data)=>{
            const{Markmap:r}=e();
            window.mm=r.create("svg#mindmap-test",null,json_data);
          }
        )(
          ()=>window.markmap,
          {{ input_json }}
         );
      </script>
    </body>
    </html>
    """)

    html_string = markmap_template.render(dict(input_json=json_string))

    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(html_string)


if __name__ == '__main__':
    main()

For the sake of this little effort, please help me out.

Carson
  • 6,105
  • 2
  • 37
  • 45
  • Dо you want convert python code to js code? – Daniil Loban Jan 05 '21 at 04:06
  • Hi @DaniilLoban, It would be nice if you could do that, but it's not necessary. The minimum requirement is to be able to convert the unordered list to JSON. (Not necessarily limited to JS, any front-end technology (even if some browsers do not support it does not matter) I can accept) – Carson Jan 05 '21 at 05:31
  • So what we have on input (a variable or a file ) and what we expect on output (i guess you want to get a file) *.json or *.htm ? Unordered list to JSON - can you explain it more? Also how do you want to run this convertor by hand on local computer as python script or on the web? – Daniil Loban Jan 05 '21 at 05:43
  • Set a variable as the input (hard coding is fine) is ok, and output can use a string. Under normal circumstances, the input is provided by Hugo, and then the code is automatically executed when the user opens the page, but that does not matter; you can write the input using hard-coded. If you are interested in this finished product and someone helps me write this answer in the future, I will release a finished page for your reference :) – Carson Jan 05 '21 at 06:46

2 Answers2

1

I will be happy to continue improving the code if necessary :)

class Parser {
  constructor(htmlExpr){
    this.result = {t: 'root', d: 0, v: "", c:[]};
    this.data = htmlExpr.split('\n').map(row => row.trim()) // to lines ?
    this.open = RegExp('^<(?<main>[^\/>]*)>')           // some open tag
    this.close = RegExp('^<\/(?<main>[^>]*)>')          // some close tag
    this.target = RegExp('(?<main><a[^>]*>[^<]*<\/a>)') // what we looking for
  }
  test(str){
    let [o, c, v] = [ // all matches
      this.open.exec(str),
      this.close.exec(str),
      this.target.exec(str)
    ];
    // extract tagNames and value
    if (o) o = o.groups.main
    if (c) c = c.groups.main
    if (v) v = v.groups.main
    return [o, c, v];
  }
  parse(){
    const parents = [];
    let level = 0;
    let lastNode = this.result;
    let length = this.data.length;
    for(let i = 0; i < length; i++){
      const [o, c, v] = this.test(this.data[i])
      if (o === 'ul') {
        level +=1;
        lastNode.c=[]
        parents.push(lastNode.c)
      } else if (c === 'ul') {
        level -=1;
        parents.pop()
      }
      if (v) {
        lastNode = {t: 'h', d: level, v}; // template
        parents[level - 1].push(lastNode) // insert to result
      }
    }
    return this.result;
  }
}

let htmlExpr = `<nav id="TableOfContents">
      <ul>
        <li><a href="#js">JS</a>
          <ul>
            <li><a href="#h3-1">H3-1</a></li>
            <li><a href="#h3-2">H3-2</a></li>
          </ul>
        </li>
        <li><a href="#python">Python</a>
          <ul>
            <li><a href="#h3-1-1">H3-1</a></li>
            <li><a href="#h3-2-1">H3-2</a></li>
          </ul>
        </li>
      </ul>
    </nav>`
const parser = new Parser(htmlExpr);
const test = parser.parse();
console.log(JSON.stringify(test, null, 2))
Carson
  • 6,105
  • 2
  • 37
  • 45
Daniil Loban
  • 4,165
  • 1
  • 14
  • 20
  • Hi @Daniil Loban, thank you very much for your kind help, your answer was cool. Unfortunately, it didn't satisfy all my needs (now I finally know why you asked me to be more clear about input and output). Still, I will choose you as the best answer because, strictly speaking, your answer did solve the problem, and you were the one who answered it first. – Carson Jan 07 '21 at 09:47
0

I found that my problem's description may not be precise enough,

causing some people to use special solutions to solve the problem.

Strictly speaking, they don't have a problem, but it doesn't apply to some of my other examples (see id=toc-compress and id=toc-double_link),

and I found the answer by myself and can satisfy all my needs,

please see as follows.

class Toc {
  constructor(node_nav){
    this.data = node_nav;
  }
  create_mind_map(svg_id, dict_data){
    let e = ()=>window.markmap
    const {Markmap:r} = e();
    window.mm = r.create("svg#"+svg_id, null, dict_data)
  }

  _get_element(ul_node, c, cur_level){

    let li_list = Array.prototype.slice.call(ul_node.childNodes).filter(node => node.nodeName === 'LI' )
    li_list.forEach(li => {
      const inner_a = li.firstElementChild;
      const value = (()=>{
        // If it contains two links (one is an internal link and the other is an external link, then the internal link is used as the primary link)
        const inner_a_copy = inner_a.cloneNode(true);  // avoid modifying the original innerText
        const outer_a = ((RegExp('<a[^>]*>[^<]*<\/a><a[^>]*>[^<]*<\/a>').exec(li.innerHTML)) != null ?
         Array.prototype.slice.call(li.childNodes).filter(node => node.nodeName === 'A' )[1] :
         undefined
         );
        if (outer_a !== undefined) {
          inner_a_copy.innerText = outer_a.innerText
        }
        return inner_a_copy.outerHTML;
      })();

      let ul = Array.prototype.slice.call(li.childNodes).filter(node => node.nodeName === 'UL' )

      if (ul.length > 0){
        let sub_list = [];
        this._get_element(ul[0], sub_list, cur_level+1)
        c.push({t: 'h', d: cur_level, v: value, c:sub_list})
      }
      else {
        c.push({t: 'h', d: cur_level, v: value})
      }
    });
  }

  convert2dict(){
    let root_ul = Array.prototype.slice.call(this.data.childNodes).filter(node => node instanceof HTMLUListElement)[0]
    const sub_c = []
    const result_dict = {t: 'root', d: 0, v: "", c:sub_c};
    const level = 1
    this._get_element(root_ul, sub_c, level);
    // console.log(result_dict)
    // console.log(JSON.stringify(result_dict, null, 2))
    return result_dict
  }
};

function getNode(n, v) {
  /* https://stackoverflow.com/a/37411738/9935654 */
  n = document.createElementNS("http://www.w3.org/2000/svg", n);
  for (var p in v)
    n.setAttributeNS(null, p.replace(/[A-Z]/g, function(m, p, o, s) { return "-" + m.toLowerCase(); }), v[p]);
  return n
};

(
  ()=>{
    let nav_html_collection = document.getElementsByTagName('nav');
    let idx = 0;
    for (let node_nav of nav_html_collection){
      const toc = new Toc(node_nav);
      const dict_data = toc.convert2dict();
      const id_name = 'mindmap' + idx.toString();
      let node_svg = getNode("svg", {id: id_name});
      node_nav.appendChild(node_svg)
      toc.create_mind_map(id_name, dict_data)
      idx += 1;
    };
  }
)();
<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <script src="https://d3js.org/d3.v6.min.js"></script>
  <script src="https://cdn.jsdelivr.net/npm/markmap-view@0.2.0"></script>
  <style>
    .mindmap {
      /*
      width: 100vw;
      height: 100vh;
      */
    }
  </style>
</head>
<body>
  <nav id="toc-normal" class="mindmap">
    <ul>
      <li><a href="#js">JS</a>
        <ul>
          <li><a href="#h3-1">H3-1</a></li>
          <li><a href="#h3-2">H3-2</a></li>
        </ul>
      </li>
      <li><a href="#python">Python</a>
        <ul>
          <li><a href="#h3-1-1">H3-1</a></li>
          <li><a href="#h3-2-1">H3-2</a></li>
        </ul>
      </li>
    </ul>
  </nav>

  <nav id="toc-compress"><ul><li><a href="#js">JS</a><ul><li><a href="#h3-1">H3-1</a></li><li><a href="#h3-2">H3-2</a></li></ul></li></ul></nav>

  <nav id="toc-double_link">
    <ul>
      <li><a href="#js"><a href="https://www.w3schools.com/js/DEFAULT.asp">JS</a></a>
        <ul>
          <li><a href="#h3-1">H3-1</a></li>
          <li><a href="#h3-2">H3-2</a></li>
        </ul>
      </li>
    </ul>
  </nav>
</body>
</html>
Carson
  • 6,105
  • 2
  • 37
  • 45