I am attempting to start a project where I can easily edit the DocDefinitions for pdfmake. I have the initial code shared on GitHub if anyone is interested in having a look: https://github.com/unluckynelson/pdfmake-generator

Here is a demo of what I have: http://powerprop.co.za/pdfmake-generator/

The idea is basically to give the user the ability to edit a TinyMCE textarea and display the results of the generated pdf on the same page, thus making visual edits very easy to do and update.

My question is: is there any way to parse the HTML (generated by TinyMCE) into a JavaScript object? For example, a simple table would look like this:

HTML text

<html>
<table class="table table-condensed">
    <tr>
        <td>Some text
            <div>Nested Div</div>
        </td>
        <td></td>
    </tr>
</table>
</html>

Parsed as a JS object:

var obj = {
    html: {
        table: {
            classes: ["table", "table-condensed"],
            styles: [],
            tr: [
                {
                    td: {
                        classes: [],
                        styles: [],
                        text: [{"Some text "}, {
                            div: {
                                classes: [],
                                styles: [],
                                text: "Nested Div"
                            }
                        }]
                    }
                },
                {
                    td: {
                        classes: [],
                        styles: [],
                        text: []
                    }
                }
            ]
        }
    }
}
johan
  • Talking about conversion: you can implement a DOM-to-object converter. But as it stands, your example object is incorrect – Alex Slipknot May 22 '17 at 08:50
  • I have been googling dom to object converter, html to object converter, etc... nothing comes up of any use. Do you have something specific you are referring to? link? The object is just my example of how I think it should look like... – johan May 22 '17 at 09:38
  • I can write an example for your case, but I still can't do it because of the incorrect *obj* format. Can you provide a correct object? – Alex Slipknot May 22 '17 at 09:40
  • It's just an example of what I think it could look like, any object in fact could be useful as long as most of the DOM's info is inside the object in a logical manner. Then I can access what I need with loops and such.... – johan May 22 '17 at 09:43
  • Sure, but I want to see the object structure that you need – Alex Slipknot May 22 '17 at 10:01
  • well the final structure pdfmake needs looks like this: https://github.com/unluckynelson/pdfmake-generator/blob/master/invoice.docdef.json ... This generates an invoice that looks like this: http://powerprop.co.za/pdfmake-generator/print.pdf.php – johan May 22 '17 at 10:11
  • OK, as I see it, the format is very specific, so you have to write a converter to those specifications. There is no way to automatically convert the DOM to an object without an object specification – Alex Slipknot May 22 '17 at 10:41
  • I agree, but surely there is some javascript way of accessing the DOM via object methods? I just need a javascript object representation of the DOM from there I can write the code to build the final obj the way I want – johan May 22 '17 at 11:03
  • Of course you can. See answer here: http://stackoverflow.com/questions/6280814/parsing-through-dom-get-all-children-and-values – Alex Slipknot May 22 '17 at 11:06

2 Answers

You can use jQuery's $.parseHTML(htmlString) method to parse the HTML string into an array of DOM nodes, which expose a comprehensive set of properties to work with: classes, styles, text, child nodes, etc. Of course it's not the exact JS object that you want, but why not use a universal structure? You can follow the nodes recursively and read class/style info and more.

Here is a bare-minimum example:

var text = '<table class="table table-condensed" border ="1">' +
           '  <tr>' +
           '    <td>Some text' +
           '      <div>Nested Div</div>' +
           '    </td>' +
           '    <td>Another text</td>' +
           '  </tr>' +
           '</table>' +
           '<span>Heeey' +
           '</span>';
           
var $log = $("#log"),
  $output = $("#output"),
  html = $.parseHTML(text);
 
// Append the parsed HTML
$output.append(html);
 
// Gather the parsed HTML's node names etc.
$.each(html, function(i, el) {
  $log.append("<li>nodeName: " + el.nodeName + "</li>");
  $log.append("<ul><li>childNodes count: " + el.childNodes.length + "</li></ul>");
  $log.append("<ul><li>classNames: " + el.className + "</li></ul>");
  $log.append("<ul><li>textContent: " + el.textContent + "</li></ul>");
  console.log(el);
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

<div id="output">Nodes will be here:</div>
<hr />
<ul id="log">
  <li>Initial log text.</li>
</ul>

When you run the snippet, you can inspect the actual node properties in the Firefox console (F12) by clicking each logged node.
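To go one step further toward the object the question asks for, the recursive walk could look roughly like the sketch below. The function name nodeToObject and the output shape (tag, classes, children) are my own assumptions for illustration; they are not part of jQuery or pdfmake, and the shape would need adapting to whatever the final docDefinition requires.

```javascript
// Hypothetical helper (not a jQuery or pdfmake API): converts a DOM node
// into a plain object of the assumed shape { tag, classes, children }.
function nodeToObject(node) {
  // Text nodes (nodeType 3) become plain strings.
  if (node.nodeType === 3) {
    return node.nodeValue;
  }
  // Anything that is not an element (comments etc.) is dropped.
  if (node.nodeType !== 1) {
    return null;
  }
  var children = [];
  for (var i = 0; i < node.childNodes.length; i++) {
    var child = nodeToObject(node.childNodes[i]);
    // Skip dropped nodes and whitespace-only text between tags.
    if (child === null) continue;
    if (typeof child === "string" && child.trim() === "") continue;
    children.push(child);
  }
  return {
    tag: node.nodeName.toLowerCase(),
    // className is a space-separated string on HTML elements.
    classes: node.className ? node.className.trim().split(/\s+/).filter(Boolean) : [],
    children: children
  };
}
```

With the snippet above, something like $.parseHTML(text).map(nodeToObject) should give one entry per top-level node, which can then be looped over to build the structure pdfmake expects.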

Emre Piskin
  • If you look at the OP's repo, you'll see that he's using Angular, so bringing jQuery in there is rather pointless. Also, TinyMCE itself has API methods to parse or serialize the DOM. – kWeglinski May 27 '17 at 18:59

I believe there should be some API call in TinyMCE for this (you could check the docs), but I couldn't find it right now.

There is certainly an API serializer that converts the whole DOM structure created inside your textarea to a string. Then you can either use the html-to-json library (https://www.npmjs.com/package/html-to-json) or write your own parser, which should be quite easy with a few regexes: a tag starts with a < sign and ends with > or />; if it does not end with />, look for the matching closing tag </[...]>. The main issue would be excluding angle brackets typed by the user (TinyMCE should escape them).

Edit: here is the serializer: https://www.tinymce.com/docs-3x//api/dom/class_tinymce.dom.Serializer.html/ (the method serialize(node:DOMNode, args:Object):void serializes the specified browser DOM node into an HTML string).
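If you go the "write your own parser" route, a minimal sketch of the regex idea might look like this. Everything here is an assumption for illustration: parseHtmlString and the { tag, attrs, children } shape are made up, it only understands double-quoted attributes, it knows only a handful of void tags, and it relies on the editor having already escaped any < typed by the user. It is not a full HTML parser.

```javascript
// Hypothetical helper (not a TinyMCE or html-to-json API): turns a
// serialized HTML string into nested { tag, attrs, children } objects.
function parseHtmlString(html) {
  var root = { tag: "root", attrs: {}, children: [] };
  var stack = [root];
  // Tags that never get a closing tag (short, incomplete list).
  var voidTags = { br: true, hr: true, img: true, input: true };
  // Matches a closing tag, an opening or self-closing tag, or a run of text.
  var tokenRe = /<\/([a-zA-Z][\w-]*)\s*>|<([a-zA-Z][\w-]*)((?:\s+[\w-]+(?:\s*=\s*"[^"]*")?)*)\s*(\/?)>|([^<]+)/g;
  var attrRe = /([\w-]+)(?:\s*=\s*"([^"]*)")?/g;
  var m;
  while ((m = tokenRe.exec(html)) !== null) {
    var parent = stack[stack.length - 1];
    if (m[1]) {
      // Closing tag: unwind the stack to the nearest matching open tag.
      var name = m[1].toLowerCase();
      for (var i = stack.length - 1; i > 0; i--) {
        if (stack[i].tag === name) { stack.length = i; break; }
      }
    } else if (m[2]) {
      // Opening or self-closing tag: collect attributes, attach to parent.
      var node = { tag: m[2].toLowerCase(), attrs: {}, children: [] };
      var a;
      attrRe.lastIndex = 0;
      while ((a = attrRe.exec(m[3])) !== null) {
        node.attrs[a[1]] = a[2] === undefined ? "" : a[2];
      }
      parent.children.push(node);
      if (!m[4] && !voidTags[node.tag]) stack.push(node);
    } else if (m[5] && m[5].trim() !== "") {
      // Text run: keep it, ignoring whitespace-only gaps between tags.
      parent.children.push(m[5].trim());
    }
  }
  return root.children;
}
```

You would feed it the string that comes out of the serializer and then walk the result to build the pdfmake structure. For anything beyond simple editor output, a real parser (the browser DOM or a library) is the safer choice.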

kWeglinski