Using JavaScript DOMParser to format single string HTML to multiline tabbed HTML

Question

For homework in Automata, we need to use DOMParser to format this single-line string of HTML:

<div class="the-best-css-class-like-ever"><div class="youtube-embed" data-oembed="{'version': '1.0', 'type': 'video', 'title': 'Amazing Nintendo Facts', 'html': '<object width=\'425\'><param name=\'movie\' value=\'http://www.youtube.com/v/M3r2XDceM6A&fs=1\'></param>}"><img src="https://www.youtube.com/yt/brand/media/image/YouTube-logo-full_color.png"><img src="https://www.youtube.com/yt/brand/media/image/YouTube-logo-full_color.png"></div><!-- asdf <img> -> --><p>Automata Rules!</p></div>

into an indented ("tabbed"), multi-line HTML string:

<div class="the-best-css-class-like-ever">
    <div class="youtube-embed" data-oembed="{'version': '1.0', 'type': 'video', 'title': 'Amazing Nintendo Facts', 'html': '<object width=\'425\'><param name=\'movie\' value=\'http://www.youtube.com/v/M3r2XDceM6A&fs=1\'></param>}">
        <img src="https://www.youtube.com/yt/brand/media/image/YouTube-logo-full_color.png">
        <img src="https://www.youtube.com/yt/brand/media/image/YouTube-logo-full_color.png">
    </div>
    <!-- asdf <img> -> -->
    <p>
        Automata Rules!
    </p>
</div>

I have never used JavaScript, so how can I use DOMParser to achieve this? From what I understand, DOMParser takes HTML and parses it into a tree structure with child elements. However, when I tried to step through the tree, all I got were null and undefined values.

[EDIT] Somebody in class gave me a hint to use:

var parser  = new DOMParser();
var htmlDoc = parser.parseFromString(text, "text/html");
var elements = htmlDoc.body.childNodes;
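
One likely source of the null and undefined values is that childNodes mixes element, text, and comment nodes, and they expose different fields: nodeValue is null on elements, while outerHTML is not defined on text or comment nodes. A small sketch to inspect what each node offers (describe is a hypothetical helper, not part of the DOM API; the browser part is guarded so the snippet also loads outside a browser):

```javascript
// Hypothetical helper: summarises the fields that differ between node kinds.
function describe(node) {
  return node.nodeName +
    ": nodeType=" + node.nodeType +
    ", nodeValue=" + JSON.stringify(node.nodeValue) +
    ", outerHTML=" + typeof node.outerHTML;
}

// Only runs where DOMParser exists (i.e. in a browser).
if (typeof DOMParser !== "undefined") {
  var doc = new DOMParser().parseFromString("<p>Hi</p><!-- c -->", "text/html");
  for (var node of doc.body.childNodes) {
    console.log(describe(node));
  }
}
```

Checking nodeType (1 for elements, 3 for text, 8 for comments) before reading a field avoids most of these surprises.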

[EDIT 2] I solved this by stepping down the DOM tree and, for each element, taking its outerHTML and removing each child's outerHTML from it, which leaves only that element's own tags:

element.outerHTML.replace(child.outerHTML, "");
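
A minimal sketch of that idea on plain strings, to show how replace behaves here. With a string pattern, replace removes only the first occurrence, so it has to be applied once per child (which also handles two identical children). ownTags is a hypothetical name, and the inputs stand in for element.outerHTML and each child.outerHTML:

```javascript
// Hypothetical helper: strips each child's markup once from the parent's markup.
// `outer` stands in for element.outerHTML, `childOuters` for the outerHTML
// of each child element, in document order.
function ownTags(outer, childOuters) {
  return childOuters.reduce(function (html, child) {
    // A string pattern is matched literally and replaced once.
    return html.replace(child, "");
  }, outer);
}

var outer = '<div class="e"><img src="a.png"><img src="a.png"></div>';
console.log(ownTags(outer, ['<img src="a.png">', '<img src="a.png">']));
```

One caveat with this approach: if a child's markup also happens to appear earlier in the parent's string (for example inside an attribute value), the wrong occurrence is removed. In a browser, element.cloneNode(false).outerHTML should give the element's tags without its children and sidesteps the string matching.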

I couldn't find another, easier way to do this. Alan's answer helped greatly, especially the Firefox dev console and debugger.

You have JavaScript homework, but you've never used JavaScript? — Scott Marcus, Nov 17 '16 at 18:49
We mostly code in java, so the first part of the homework was regex parsing in java — Lightfire228, Nov 17 '16 at 18:52
Someone should tell your instructor that Java != JavaScript. — Scott Marcus, Nov 17 '16 at 18:53
That wasn't the point. Our instructor wanted us to use Java regex as the first part because it is an automata class. He wanted us to use javascript in the second part, because he thought using the DOMParser would be "easier", because we don't have to implement "traditional string parsing." — Lightfire228, Nov 17 '16 at 18:56
i think your instructor needs to read [this post](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). — I wrestled a bear once., Nov 17 '16 at 18:59
Again, Java and JavaScript are completely unrelated. How in the world would someone expect a Java student to just inherently know how a JavaScript DOM Parser works?! That's crazy. — Scott Marcus, Nov 17 '16 at 19:00
That's not the point; he wants us to learn javascript on our own. Similar to how a company would ask you to learn their main language if you didn't already know it. **He understands the difference between Java and Javascript** — Lightfire228, Nov 17 '16 at 19:01
Have you undertaken any research, for example reading the documentation available at [MDN](https://developer.mozilla.org/en-US/), and the [`DomParser()`](https://developer.mozilla.org/en-US/docs/Web/API/DOMParser) page? — David Thomas, Nov 17 '16 at 19:05
I don't understand what using some parsing library and manipulating the results has to do with automata. BTW, how did you try to step through the tree? Also, what does "regex parsing" mean? You are writing a regex engine? Anyway, if you know Java, and the problem is to parse/format HTML, then just use a Java library for that purpose. — , Nov 17 '16 at 19:06
*we don't have to implement "traditional string parsing"* Writing an HTML parser using "traditional string parsing", whatever that means, is not a homework assignment, it's a semester project. — , Nov 17 '16 at 19:09
The project was split into two parts. "Part 1 was: using regex and other parsing techniques, format the HTML as above; but you can't just regex because HTML is not a regular language. You can use any programming language you like". I used Java and it took 2 and a half hours to work out. "Part 2 was, since regex parsing HTML is difficult and prone to bugs, use Javascript and the DOMParser to achieve the same task. This should be easier". Part 2 was an afterthought, but I found it the most difficult — Lightfire228, Nov 18 '16 at 05:02

score 0 · Accepted Answer · answered Nov 17 '16 at 19:21

I think DOMParser will work for your task. In your place, I would start with a simpler example and then move on to something harder.

For example:

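<html>
  <head>
    <title>DOM Parser</title>
    <script>
      // Parse a small HTML string into a Document object.
      var parser = new DOMParser();
      var simpleHTML = "<div class='tst'> <p> hello </p> </div>";
      var htmlObj = parser.parseFromString(simpleHTML, "text/html");
      // The parsed markup ends up under htmlObj.body.
      console.log(htmlObj.body.children);
    </script>
  </head>
  <body></body>
</html>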

If you load that code snippet in the browser (open the console to debug), you'll see that htmlObj.body.children gives you an array-like HTMLCollection of the parsed elements (htmlObj.body.childNodes also includes text and comment nodes). If you then walk the tree recursively, you'll be able to solve your homework.
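
The recursive walk could look roughly like the sketch below. It is one possible approach, not the only one: formatNode, openingTag, escapeText, and formatHtml are hypothetical helper names, the four-space indent is an assumption taken from the desired output in the question, and the list of void tags (elements without a closing tag) is the standard HTML set. It rebuilds each opening tag from the element's attributes instead of using outerHTML, and it does not special-case script, style, or pre content.

```javascript
// Node-type values as defined by the DOM (Node.ELEMENT_NODE and so on).
var ELEMENT_NODE = 1, TEXT_NODE = 3, COMMENT_NODE = 8;

// Elements that never have a closing tag.
var VOID_TAGS = new Set(["area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr"]);

function escapeText(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Rebuilds "<tag attr="value">" from an element's name and attributes.
function openingTag(el) {
  var attrs = Array.from(el.attributes).map(function (a) {
    var value = a.value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
    return " " + a.name + '="' + value + '"';
  }).join("");
  return "<" + el.localName + attrs + ">";
}

// Returns an array of indented lines for `node` and its descendants.
function formatNode(node, depth, indent) {
  depth = depth || 0;
  indent = indent || "    ";
  var pad = indent.repeat(depth);
  if (node.nodeType === TEXT_NODE) {
    var text = node.nodeValue.trim();
    return text ? [pad + escapeText(text)] : []; // skip whitespace-only text
  }
  if (node.nodeType === COMMENT_NODE) {
    return [pad + "<!--" + node.nodeValue + "-->"];
  }
  if (node.nodeType !== ELEMENT_NODE) {
    return [];
  }
  var lines = [pad + openingTag(node)];
  if (VOID_TAGS.has(node.localName)) {
    return lines; // e.g. <img>: no children, no closing tag
  }
  Array.from(node.childNodes).forEach(function (child) {
    lines = lines.concat(formatNode(child, depth + 1, indent));
  });
  lines.push(pad + "</" + node.localName + ">");
  return lines;
}

// Browser-only entry point: parse the string, then format each top-level node.
function formatHtml(html) {
  var doc = new DOMParser().parseFromString(html, "text/html");
  var lines = [];
  Array.from(doc.body.childNodes).forEach(function (node) {
    lines = lines.concat(formatNode(node));
  });
  return lines.join("\n");
}
```

In a browser console, formatHtml(text) with the single-line string from the question should produce the indented layout, with one difference to expect: the & inside attribute values comes back as &amp;, which is also what outerHTML would give.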

Here you can find more info about DOMParser
