How to get the static, original HTML source via JavaScript?

Question

While developing a tool (whose details I don't think matter for this question, since I was able to reproduce the problem with the MCVEs below), I noticed that, at least in the Chrome and Firefox versions I have on my desktop, the string I get from the innerHTML property is not equal to the source code I wrote statically in the HTML file.

console.log(document.querySelector("div").innerHTML);
/*
  <table>
    <tbody><tr>
      <td>Hello</td>
      <td>World</td>
    </tr>
  </tbody></table>
*/

<div>
  <table>
    <tr>
      <td>Hello</td>
      <td>World</td>
    </tr>
  </table>
</div>

As you may have noticed, a <tbody> tag (which I did not add to my HTML source!) appeared out of nowhere, apparently due to some processing between the page download and the page's onload event. In this particular case, the modification doesn't cause an error in my application and can thus be ignored.

It turns out that, in certain cases, this sort of alteration can be catastrophic, especially when all the markup is removed, as in the example below.

console.log(document.querySelector("div").innerHTML);
/*
  Hello
  World
*/

<div>
  <td>Hello</td>
  <td>World</td>
</div>

Obviously, in this case the original markup has issues, but in my application, "misuses" (like a <td> inside a <div>) are accepted. What is not accepted is the innerHTML being left with no HTML markup at all, which leads to the main question: how can I get the original, statically coded HTML markup for the <div> element?

Also, if possible, it would be nice to know why and how this phenomenon occurs, because I'm curious :D

You may want to look here, though it may not be the answer to your question: http://stackoverflow.com/questions/938083/why-do-browsers-insert-tbody-element-into-table-elements — deepakborania, Nov 26 '14 at 19:34
The link is related but still doesn't solve my problem... Anyways, this was informative :D thank you! — Rui Pimentel, Nov 26 '14 at 19:41
The attempted misuse of `td` as a child of `div` does *not* work. You cannot style the `td` elements or access them in a script, simply because they do not exist: the `<td>` and `</td>` tags are just ignored. — Jukka K. Korpela, Nov 26 '14 at 19:44
You ask a valid question, but I wonder if you have a valid use case where this could actually cause a problem in your code. When you do, I expect the fix will be simple and obvious -- but trying to anticipate all the possible problems with a flexible syntax prototyping tool will be neither simple nor obvious, and likely a huge waste of time. As Jukka pointed out, your second example is not exactly a valid use case. — wwwmarty, Nov 26 '14 at 19:51
Yes... you're both right, it's not valid in vanilla HTML. But what my tool does is fill exactly this gap left by the missing table and row tags (in this example) by inserting them at runtime, in an attempt to reduce the markup complexity. It's an internal solution for page prototyping, and it is already working really well, actually, but I want to improve it by removing this barrier. — Rui Pimentel, Nov 26 '14 at 19:53

score 6 · Accepted Answer · answered Nov 26 '14 at 19:38

The browser downloads the HTML source and parses it into a DOM (Document Object Model). Any issues are fixed as well as possible, and elements whose tags can be omitted in the source (such as <tbody>) may be added to the DOM.

From that moment on, this in-memory structure is used to render the page, and it is also this structure that you access from JavaScript. So when you request the innerHTML of an element, you get a piece of HTML source code that is serialized from the DOM. The original source is not available at all in JavaScript.

So that's why it happens, and there is not much you can do about it. I think the only workaround is to re-load the entire page as a string using AJAX and extract the required piece of source yourself.
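
A minimal sketch of the second half of that workaround, assuming the page source is already available as a string. The helper name extractRawElement is hypothetical, and it is a naive tag counter rather than a real HTML parser: comments, <script> contents, or attribute values that contain the tag name will confuse it. It returns the element's own start and end tags along with its content.

```javascript
// Hypothetical helper: returns the raw source of the first <tagName> element
// in an HTML string, including its start and end tags, exactly as written.
// Naive sketch: counts matching open/close tags; it does not understand
// comments, <script> contents, or self-closing forms like <div/>.
function extractRawElement(html, tagName) {
  const name = tagName.toLowerCase();
  // Matches <div ...> or </div> case-insensitively, but not e.g. <divider>.
  const tagRe = new RegExp(`<(/?)${name}(?=[\\s>/])[^>]*>`, "gi");
  let depth = 0;
  let start = -1;
  let match;
  while ((match = tagRe.exec(html)) !== null) {
    const isClosing = match[1] === "/";
    if (!isClosing) {
      if (depth === 0) start = match.index; // outermost element begins here
      depth++;
    } else if (depth > 0) {
      depth--;
      if (depth === 0) {
        // Outermost element closed: slice out everything it spans.
        return html.slice(start, match.index + match[0].length);
      }
    }
  }
  return null; // no complete element found
}
```

Where fetching is allowed (pages served over HTTP rather than opened from disk, since browsers may block such requests for file:// URLs), it could be called as `fetch(location.href).then((res) => res.text()).then((html) => console.log(extractRawElement(html, "div")))`. The `<td>` tags inside come back exactly as written, because the string never passes through the HTML parser.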

But a better solution, obviously, would be to remove those "misuses" and make your HTML source valid. If you just need to enclose some information in the page to be used by JavaScript alone, you might choose to embed a script tag that initializes a couple of variables with those values, rather than generating some invalid HTML.
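
A minimal sketch of that suggestion, assuming the values live in a plain script tag. The variable prototypeCells and both helpers are hypothetical names. Script code builds a complete table, so the parser has no misuses to repair.

```javascript
// Data the page would embed in a plain <script> tag instead of invalid markup,
// e.g. <script>var prototypeCells = ["Hello", "World"];</script>
// (prototypeCells and the helpers below are hypothetical names.)
const prototypeCells = ["Hello", "World"];

// Escape the characters that are special inside HTML text content.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build a complete, valid table from the data, so the parser has nothing to fix.
function cellsToTableHtml(cells) {
  const tds = cells.map((cell) => `<td>${escapeHtml(cell)}</td>`).join("");
  return `<table><tr>${tds}</tr></table>`;
}
```

In the page, the result could be assigned with `document.querySelector("div").innerHTML = cellsToTableHtml(prototypeCells);`. The browser will still add a `<tbody>`, but the `<td>` elements survive because they arrive inside a `<tr>` and a `<table>`.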

GolezTrol

Thanks for the response! Mmm, well, I definitely agree with you, but the background information I chose to omit makes things a little more complicated: this is an internal (workplace-only) tool to allow for faster HTML prototyping. The idea is to allow a second possible syntax, more flexible than HTML's default syntax, to mark elements on the page. It sounds stupid, but it was going pretty well (I already have 10+ page prototypes built with this tool) until I got stuck on this barrier. One more thing: it's meant to run offline, so no AJAX will work, unfortunately :/ – Rui Pimentel Nov 26 '14 at 19:46
You can do your own pre-rendering server-side, so the HTML that is sent to the browser is actually valid. Think of it as a markup language like `[bbcode]` on forums, or even the backticks and asterisks you can use on StackOverflow to mark text as code or make it bold or italic. In the same way, you can correct actual HTML on the server, or expand your own fantasy tags into actual HTML. Just make sure you send valid HTML to the browser, because the browser just isn't that forgiving and will remove or ignore everything it doesn't understand. – GolezTrol Nov 26 '14 at 19:53
Well... this was actually inspiring. As an alternative solution to my original problem, I could do almost what you say, only still client-side: use BB-code wherever HTML content is intended. For example, serve a `[td]` inside a `[div]` (which, as far as the browser is concerned, means *nothing* but plain, simple text). JavaScript could then convert the BB-like code to HTML, filling in the necessary omitted gaps :D **thank you**! If no simpler solution comes up in a couple of days, yours will be chosen :) – Rui Pimentel Nov 26 '14 at 20:01
"The original [HTML] source is not available at all in JavaScript" isn't quite right. You can access *some* of the original HTML, but only the attributes. https://javascript.info/dom-attributes-and-properties#html-attributes – Bennett Brown Jan 19 '19 at 18:24
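
The BB-code idea from the comments above (expanding custom tags into real HTML, done client-side) can be sketched as follows. This is a minimal sketch under assumptions: the tag whitelist, the rule that a bare run of `[td]` cells gets `[table][tr]` wrappers, and the name bbToHtml are all hypothetical, and it uses regular expressions rather than a real parser, so it only handles flat, non-nested cells.

```javascript
// Hypothetical whitelist of BB-like tags the prototyping dialect accepts.
const ALLOWED_TAGS = new Set(["div", "table", "tr", "td", "b", "i"]);

// Converts BB-like source (plain text as far as the browser is concerned)
// into HTML, filling in the <table>/<tr> wrappers the dialect lets authors omit.
function bbToHtml(src) {
  // Step 1: wrap each run of [td]...[/td] cells that is not directly inside
  // a [tr] in [table][tr]...[/tr][/table]. Nested [td]s are not supported.
  const cellRun = /(?:\s*\[td\][\s\S]*?\[\/td\])+/gi;
  const filled = src.replace(cellRun, (run, offset) => {
    if (/\[tr\]$/i.test(src.slice(0, offset).trimEnd())) return run; // already in a row
    const lead = run.match(/^\s*/)[0]; // keep leading whitespace outside the wrapper
    return `${lead}[table][tr]${run.slice(lead.length)}[/tr][/table]`;
  });
  // Step 2: translate whitelisted [tag] / [/tag] into real HTML tags and
  // leave any other bracketed text alone.
  return filled.replace(/\[(\/?)([a-z]+)\]/gi, (whole, slash, name) => {
    const tag = name.toLowerCase();
    return ALLOWED_TAGS.has(tag) ? `<${slash}${tag}>` : whole;
  });
}
```

In the page, the BB source could sit as text inside a container, and a script could run `el.innerHTML = bbToHtml(el.textContent)`. Because the browser treats the brackets as plain text, nothing is dropped before the script sees it.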

score 1 · Answer 2 · answered Nov 26 '14 at 20:02

I've tried to do something like this at work before. In some of my solutions I've structured a table, with table rows around the table data elements I want to use, just so I can use the table data cells. If you want to do a little more processing on the JavaScript side of things, you could potentially do something like this:

<div>
    <div class="td">Hello</div>
    <div class="td">World</div>
</div>

And then you could process this with JavaScript to turn the div.td elements into actual td elements. Just an idea.
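
One way that processing could look, as a minimal sketch: rewrite the container's serialized innerHTML so each `div.td` becomes a real `<td>`, and supply the `<table>`/`<tr>` wrappers the parser needs. The helper name divCellsToTableHtml is hypothetical, and it assumes every cell is serialized exactly as `<div class="td">` (browsers serialize attribute values with double quotes), that cells contain no nested `<div>`s, and that the container holds nothing but cells.

```javascript
// Hypothetical helper: turns the serialized contents of a container such as
//   <div><div class="td">Hello</div><div class="td">World</div></div>
// into a real table. Assumes the exact <div class="td"> form, no nested
// <div>s inside cells, and no other content in the container.
function divCellsToTableHtml(innerHtml) {
  const cellRe = /<div class="td">([\s\S]*?)<\/div>/g;
  const tds = [];
  let match;
  while ((match = cellRe.exec(innerHtml)) !== null) {
    tds.push(`<td>${match[1]}</td>`); // keep the cell content as written
  }
  // Nothing to convert: leave the markup untouched.
  if (tds.length === 0) return innerHtml;
  // Supply the <table>/<tr> wrappers the parser needs to keep the <td>s.
  return `<table><tr>${tds.join("")}</tr></table>`;
}
```

In a browser this could be applied as `const box = document.querySelector("div"); box.innerHTML = divCellsToTableHtml(box.innerHTML);`. Reading innerHTML is safe here because a `<div class="td">` is valid where it stands, so the parser keeps it intact.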

The Real Diel

That works too :) thanks for your answer! The only problem I can think of is the heavier syntax needed, which kind of defeats the purpose of my tool, which is supposed to use a very light, easy "dialect" with custom "elements" (actually, just templates with lots of regular HTML elements inside). – Rui Pimentel Nov 26 '14 at 20:11