2

I have made a framework that generates a HTML "DOM" tree on the server, as a tree of python objects, and then spits it out as a string to be sent to the client. The way it does this is via a recursive depth-first traversal of the tree: for example a div would spit out the opening "div", spit out all it's children's html and then spit out the closing "/div".

This tree is broken down into conceptual components, as shown below:

graph http://lhy.mit.edu/media/Flow_Chart.png

This only shows the first two levels of hierarchy; the actual site has many more: for example each comment in the comment bar is a self contained component, each button on the menu bar is a self contained component. As you can see, the various components do not need to be on the same depth in the tree. What constitutes a "component" is decided by me.

What I want is the complete html string for each component (everything from the root node of that component downwards), as well as the partial HTML string for every component (The HTML of that component, minus the HTML of its children). The partial HTML of main section, for example, would be the html, head and two div tags only. The complete html of main section, on the other hand, would be every node on the page.

How would i do this? I could just find the complete HTML string of every component and sub-component, mark the boundaries of each sub-component with some string and do Regex-Removals in order to find the partial HTML string for every component, but that feels clunky and inefficient.

I could do an iterative-deepening DFS, halting at the boundary between a component and its sub-components until every node in that component has been explored. I would then have the partial HTML for every component but then i would need to do a similarly hacky Regex-Inserts to later build up the complete HTML for every component.

I could do both, but that would take two passes and would be expensive, though maybe not as expensive as the above Regex gymnastics.

I could do a priority-queue Dijkstra's, having each component be strictly higher priority than its children. It would traverse the tree in the correct order, finishing each component before moving on to its children, but i have no idea how i would get the final well-formed HTML string out of it.

The purpose of all this is so the server can intelligently and completely autonomously determine the minimal set of components on the client's page that need to change on a page-transition between two arbitrary pages.

If i create a new page on my site, I should need no more than Zero extra lines of code to have it ajax smoothly with any existing page.

But first i need to get my graph-traversing html-spewing algorithms in order. Any ideas?

Li Haoyi
  • 15,330
  • 17
  • 80
  • 137
  • Please clarify a few things. 1. How do you generate HTML from your aforementioned Python objects? Do you have a function that generates the complete HTML subtree at some node? 2. Do the abstract components you mentioned exist in code? How do they relate to your desire to send strictly minimal changes to the client? 3. What do you mean by an "encoded signature"? A hashsum? A unique ID? –  Aug 02 '11 at 03:47
  • 1) yes. right now, it's called recursively on a node-by-node basis: Each node writes its own opening tag, calls all its childrens toHTML(), writes its closing tag and sticks them together to return 2) Yes they do; The root nodes of each Component is marked out as special, so when traversing i can do whatever i want when i hit one of them. 3) By "Encoded Signature" I mean a tree, one node per Component, which contains a Hashsum of that Component's Subtree's HTML. This is used to compare two Components for equality, rather than comparing every single HTML node. – Li Haoyi Aug 02 '11 at 04:09

1 Answers1

0

I am presuming your client is Javscript code as you didn't specify anything.

Don't do anything too complicated. In particular, for the love of god don't try using regexes to work with HTML.

Is your server sending you a fully funciton HTML string? In this case, you can convert this into an actual DOM you can work with (there are many ways to do so) and then use the .innerHTML of an element to get your "complete html"s and use the .tagName to get a tag's name.

I still don't really get why you need all this complication. If you already went through the trouble of downloading the whole "new page" there isn't too much of a reason to try to change as few parts as possible - just replace averything and forget about it (the calls to the server should be the most expensive thing anyway).

If you really want to use less brute force, than you should find a way to request/be notified of only the interesting changes without having to look at everything. Then, given the part that is to be changed and the text, you just need to do something like

document.getElementById('mainCommentArea').innerHTML = newHTML;
Community
  • 1
  • 1
hugomg
  • 68,213
  • 24
  • 160
  • 246
  • You completely misunderstood. When you first load the page, the server stores an encoded signature for the component-tree that generated that page on the client. On Ajax, the client sends it back, and the server uses it to check which parts of the page changed and need to be replaced. The server sends back a map of IDs and HTML fragments. The javascript in question is just about 40 lines which loops through these and puts the fragments in the right place. What i want to do is intelligently (and automatically) generate these fragments on the server, based on the old and new signatures. – Li Haoyi Aug 01 '11 at 20:42
  • Well, I still don't undertand. This thing is too complicated :/ – hugomg Aug 01 '11 at 20:52
  • The entire problem is **how** does the server know which interesting parts have changed, without the programmer having to say so, while keeping the changes as minimal as possible? I am fully aware of the limits of regexes, how to find tags by name, and how to replace innerHTML =D – Li Haoyi Aug 01 '11 at 20:53