Get full (and original) text of an HTML element

Question

Edit: It looks like the comments identified a solution to this problem: read the value of the .outerHTML property. However, at least Firefox and Chrome still appear to "normalize" the original source code when outerHTML is used. For example, the outerHTML of

<div id = "a">    <!-- string is 14 characters long //-->

still returns

<div id="a">      <!-- string is 12 characters long //-->

The problem would be solved if the formatting of the resulting string matched that of the original HTML source code. Why must outerHTML adjust the original value?

--- Having said this: ---

I'm looking for a way to get the full text of a clicked HTML tag.

Starting point examples (note intentional, legal but mangled formatting):

<div id = "a" style= "color: blue ;">text</div>

// Returns: "DIV" (tagName is uppercase for elements in HTML documents)
var tagName = document.getElementById("a").tagName;

// Returns: a NamedNodeMap of attribute name/value pairs (without = or ")
var attrs = document.getElementById("a").attributes;

How would we go about generating the following text string, when element #a is clicked:

<div id = "a" style= "color: blue ;">

I have not found a solution for this yet.

What's this for?

Ultimately, the goal is to determine the length in characters of the arbitrary contents of a tag, assuming the tag can be edited in any way that still produces acceptable HTML. For example, the two cases below should return:

<div id=a style="color:blue">            // 29
<div id = "a" style= "color: blue ;">    // 37

Counting is the easy part. It's getting the actual string of that tag, just as it appears in the source code, that is the problem.

Regarding "normalisation" with outerHTML, it's a necessary result of how the property works, which is by serialising the related DOM fragment according to the [*HTML fragment serialization algorithm*](http://dev.w3.org/html5/spec-LC/the-end.html#html-fragment-serialization-algorithm). It may bear little resemblance to the source markup. Incidentally, "full and original" should probably not be separated by "and". — RobG, Sep 09 '15 at 00:46
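
A rough illustration of why the serialised string differs from the source (a sketch, not the browser's actual implementation): the parser keeps only the tag name and the attribute name/value pairs, and outerHTML rebuilds markup from those. `serializeOpenTag` and `escapeAttrValue` are hypothetical helper names, and the escaping covers only the basic cases, so treat it as an approximation of the serialization algorithm.

```javascript
// Hypothetical helper: escape an attribute value the way a serializer
// roughly would (basic cases only; real browsers may escape more).
function escapeAttrValue(value) {
    return value
        .replace(/&/g, "&amp;")
        .replace(/\u00A0/g, "&nbsp;")
        .replace(/"/g, "&quot;");
}

// Hypothetical helper: rebuild an opening tag from parsed data.
// Source whitespace and quote style are gone by this point, so the
// output always uses one space between attributes and double quotes.
function serializeOpenTag(tagName, attrs) {
    var out = "<" + tagName.toLowerCase();
    attrs.forEach(function (attr) {
        out += " " + attr.name + '="' + escapeAttrValue(attr.value) + '"';
    });
    return out + ">";
}

// What survives parsing of: <div id = "a" style= "color: blue ;">
var parsed = [
    { name: "id", value: "a" },
    { name: "style", value: "color: blue ;" }
];
var tag = serializeOpenTag("DIV", parsed);
console.log(tag);        // <div id="a" style="color: blue ;">
console.log(tag.length); // 34
```

The spaces around = are not stored anywhere in the DOM, which is why no DOM property can give them back. Spaces inside the attribute value are part of the value and survive.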

score 3 · Answer 1 · answered Sep 08 '15 at 23:38

Have you tried this?

document.getElementById('a').outerHTML

But I guess this doesn't work in every browser.

Marcus Abrahão

After you use outerHTML as in any of the examples below, the result is a string, so you can use its length property to get the character count. That should answer the second part of your question. – user2242618 Sep 08 '15 at 23:56
@MarcusAbrahão Good guess. Firefox normalizes the source file before displaying the page. Viewing page source in Firefox, or saving page, does not (always) reflect how the source file was originally formatted. Both examples in the question normalise to `id="a"` but differences between quoted attribute values are preserved. – traktor Sep 09 '15 at 00:08
@Traktor53, indeed! Thanks – Marcus Abrahão Sep 09 '15 at 00:10
@Traktor53 that's true. It appears Chrome normalizes the source too. Is there any solution at all that doesn't? Or is the only way to acquire non-normalized results by writing your own parser from scratch that works with the original source string? Theoretically it could be possible. Just read each tag into a linear array. And then loop and re-build the tree yourself. Thoughts? – InfiniteStack Sep 09 '15 at 00:36
Using *outerHTML*, the source **must** be "normalised" since it's [*generated from the DOM*](http://dev.w3.org/html5/spec-LC/apis-in-html-documents.html#outerhtml) (specifically: "*… the result of running the HTML fragment serialization algorithm…*"), not the source markup. – RobG Sep 09 '15 at 00:41
@RobG, very true. Unfortunately, that doesn't solve the problem. – InfiniteStack Sep 09 '15 at 00:44
I'm reasonably sure that, knowing the clicked element (from the Event data), you can walk the DOM in some manner to calculate the linear position of its tag in the source. Not sure how much this helps, and it will fail if the DOM tree structure has been manipulated by scripts. – traktor Sep 09 '15 at 01:43
I looked at the normalize method again, and it doesn't seem to change much more than the spaces around the equals sign: id = "" becomes id="". It doesn't remove line-break characters, which is good. So I think the solution is to normalize the HTML first, and then use outerHTML on it. Thanks to everyone who helped. @Traktor53 – InfiniteStack Sep 09 '15 at 02:54
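
The idea raised in the comments above, working from the original source string instead of the DOM, could be sketched roughly as below. This assumes you already have the raw source as a string (for example by requesting the page again, which is not shown here). `findOpenTags` is a hypothetical name, and the regular expression is a simplification: it skips comments and closing tags but would be fooled by tag-like text inside script or style blocks.

```javascript
// Hypothetical helper: list every opening tag in a raw HTML string,
// exactly as written, with its offset and length in characters.
// Quoted attribute values may contain ">" without ending the tag.
function findOpenTags(source) {
    var pattern = /<([A-Za-z][^\s\/>]*)(?:"[^"]*"|'[^']*'|[^'">])*>/g;
    var tags = [];
    var match;
    while ((match = pattern.exec(source)) !== null) {
        tags.push({
            name: match[1],
            offset: match.index,
            text: match[0],
            length: match[0].length
        });
    }
    return tags;
}

var source = '<div id = "a" style= "color: blue ;">text</div>\n' +
             '<p title="a > b">more</p>';

findOpenTags(source).forEach(function (t) {
    console.log(t.offset, t.length, t.text);
});
```

Matching a clicked element to the right entry in that list is a separate problem. Counting elements in document order may work for static pages, but as noted above it breaks once scripts have changed the tree.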

score 1 · Answer 2 · answered Sep 08 '15 at 23:50

Use outerHTML to get the full tag and then strip out everything after the open tag.

var openTag = document.getElementById("a").outerHTML.split(">")[0] + ">";

JeredM

I like this solution. +1 for recognizing that I only need the text up to the closing > of the opening tag (and not the full innerHTML content). Thanks! – InfiniteStack Sep 08 '15 at 23:59
This answer relies on the element not having an attribute value with a ">" character. – RobG Sep 09 '15 at 00:44
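
One possible way around the limitation in the comment above is to scan the string and ignore any ">" that sits inside a quoted attribute value. This is only a sketch: `extractOpenTag` is a hypothetical name, and it assumes the input starts with an opening tag, as an outerHTML string does.

```javascript
// Hypothetical helper: return the opening tag at the start of an HTML
// string, skipping ">" characters inside quoted attribute values.
// Returns null if no unquoted ">" is found.
function extractOpenTag(html) {
    var quote = null; // the quote character we are inside, if any
    for (var i = 0; i < html.length; i++) {
        var ch = html.charAt(i);
        if (quote) {
            if (ch === quote) {
                quote = null; // closing quote ends the attribute value
            }
        } else if (ch === '"' || ch === "'") {
            quote = ch;
        } else if (ch === ">") {
            return html.slice(0, i + 1);
        }
    }
    return null;
}

var outer = '<div id="a" title="1 > 0">text</div>';
console.log(outer.split(">")[0] + ">"); // <div id="a" title="1 >
console.log(extractOpenTag(outer));     // <div id="a" title="1 > 0">
```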

score 0 · Answer 3 · edited May 23 '17 at 12:29

This seems to do what you want:

http://jsfiddle.net/abalter/c3eqnLrc/

html:

<div id="a" class="find-my-length" style="color:blue">First One</div>
<div   id="a  " class="find-my-length"   style= "color: blue ; " > Second     One </div        >

JavaScript:

$('.find-my-length').on('click', function () {
    var htmlString = $(this).prop('outerHTML');
    alert(htmlString + " has " + htmlString.length + " characters.");
});

Note: The one thing that doesn't get counted is spaces between attributes. Spaces within attributes are counted.

From: Get selected element's outer HTML

What about: prop('outerHTML')?

var outerHTML_text = $('#item-to-be-selected').prop('outerHTML');

And to set:

$('#item-to-be-selected').prop('outerHTML', outerHTML_text);

answered Sep 08 '15 at 23:48

abalter

This should be a comment, not an answer. jQuery isn't tagged or asked for. – RobG Sep 09 '15 at 00:43
jQuery is just JavaScript. So unless JavaScript is precluded in the answer, jQuery shouldn't be. jQuery is pretty much the standard way to use JavaScript these days. – abalter Sep 09 '15 at 18:23
It's considered bad etiquette to answer with a library that isn't asked for or tagged; otherwise there would be a plethora of library-based answers, and it presumes that the OP has some idea of how to distinguish between them. Not telling the OP which library is being used puts them at an even greater disadvantage. – RobG Sep 10 '15 at 01:05
