
I am trying to write a simple search-and-highlight function in JavaScript that searches for a piece of text. The XHTML tag in which that piece of text occurs is also passed as an argument, to help locate the text.

The XHTML that I am testing this function out on:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
 "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta content="application/xml+xhtml;charset=UTF-8" />
<script src="searcher.js" type="text/javascript"></script>
<script src="jquery-2.0.2.min.js"> </script>
<title>Building your resume</title>
</head>
<body id="highlightbegin">
<h1>Building your resume</h1>

<div> <input name="input" type="button" value="Highlight3" onclick="javascript:searcher('&lt;h1&gt;','Building your resume', '&lt;h1&gt;Building your resume', 'resume');" /> </div>

</body>
</html>  

The function searcher in searcher.js:

function searcher(tag, text, tagText, word) {

    //simple search.
    console.info(word + " to be searched for in " + text + " with tag text = " + tagText);

    //get old html.
    var oldHTML = document.getElementById("highlightbegin").innerHTML;

    //get regexp.
    var regexp = new RegExp(tagText, 'g');

    var match = oldHTML.match(regexp);
    console.info(text + " found " + match.length + " times.");
}  

However, when I execute the RegExp, match returns null. Further investigation reveals that, in the string returned by innerHTML, the tag <h1>Building your resume</h1> becomes <h1 xmlns="http://www.w3.org/1999/xhtml">Building your resume</h1>, so the pattern no longer matches. My questions:

  1. Why is the xmlns attribute added automatically?
  2. Is there a way to prevent the attribute from being added?
  3. What tags will that attribute be added to? Is it safe to assume that it will be added to every tag?
  4. Is this a browser-specific issue or can this behavior be expected in all browsers?

EDIT:
An observation:
1. If I add the xmlns attribute to the body tag and access all content with outerHTML (var oldHTML = document.getElementById("highlightbegin").outerHTML;), its child elements do not have the xmlns attribute.
My questions:
1. Can the outerHTML element be edited (with JavaScript) and replaced?
2. Is the observation above consistent (seen each time outerHTML is invoked) or is it implementation dependent?
3. Is it JavaScript that adds the xmlns attribute automatically or the browser?

Sriram

1 Answer

Why is the xmlns attribute added automatically?

Because if it didn't, the markup wouldn't be representative of the namespaces of the elements in the DOM, in which case if you wrote the string back to the DOM the elements would no longer be interpreted by the browser as HTML elements, and your page would break.
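To see concretely how the added attribute defeats the literal pattern, here is a minimal sketch that uses only the two strings quoted in the question, with no DOM involved. The variable names are mine, and the null check is just one way of avoiding the exception that match.length would otherwise throw.

```javascript
// The string the browser actually produced, as quoted in the question.
var serialized = '<h1 xmlns="http://www.w3.org/1999/xhtml">Building your resume</h1>';

// The pattern built from the tagText argument in the question.
var literal = new RegExp('<h1>Building your resume', 'g');

// With the 'g' flag, match() returns an array of matches, or null if there are none.
var matches = serialized.match(literal);

// Guard against null before reading .length.
var count = matches ? matches.length : 0;
console.info("found " + count + " times.");
```

The pattern expects ">" immediately after "<h1", but the serialized start tag has a space and the xmlns attribute there, so nothing matches.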

Is there a way to prevent the attribute from being added?

Not with innerHTML, if you're using an XMLDocument, which it seems you are. You could create your own serializer by walking the DOM if you wanted.
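As a rough illustration of the "walk the DOM yourself" idea, here is a minimal sketch of such a serializer. The function names (serializeWithoutXmlns, escapeText, escapeAttr) are hypothetical, not a browser API. It only relies on the standard nodeType, nodeValue, localName, attributes and childNodes properties, it ignores comments and other node types, and its output is meant for searching, not for writing back into an XML document (without the namespace declarations the elements would land in the null namespace).

```javascript
// Node type constants as defined by the DOM standard.
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

// Escape characters that are special in text content.
function escapeText(s) {
    return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Escape characters that are special in a double-quoted attribute value.
function escapeAttr(s) {
    return escapeText(s).replace(/"/g, "&quot;");
}

// Hypothetical helper: serialize the children of `node` (like innerHTML)
// but never emit xmlns or xmlns:prefix declarations.
function serializeWithoutXmlns(node) {
    var out = "";
    for (var i = 0; i < node.childNodes.length; i++) {
        var child = node.childNodes[i];
        if (child.nodeType === TEXT_NODE) {
            out += escapeText(child.nodeValue);
        } else if (child.nodeType === ELEMENT_NODE) {
            out += "<" + child.localName;
            for (var j = 0; j < child.attributes.length; j++) {
                var attr = child.attributes[j];
                // Skip namespace declarations that happen to be real attributes.
                if (attr.name === "xmlns" || attr.name.indexOf("xmlns:") === 0) {
                    continue;
                }
                out += " " + attr.name + '="' + escapeAttr(attr.value) + '"';
            }
            out += ">" + serializeWithoutXmlns(child) + "</" + child.localName + ">";
        }
        // Comments, processing instructions, etc. are ignored in this sketch.
    }
    return out;
}
```

In the page from the question it would be called as serializeWithoutXmlns(document.getElementById("highlightbegin")) in place of reading innerHTML.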

What tags will that attribute be added to? Is it safe to assume that it will be added to every tag?

At least every start tag that's a top-level child of the element on which you're calling innerHTML, assuming that child is not in the null namespace. Plus the start tag of any descendant element in a different namespace to its parent. It wouldn't be wrong to add it to all the start tags if the browsers chose to do so.

Is this a browser-specific issue or can this behavior be expected in all browsers?

Serializing the DOM with innerHTML has traditionally varied between browsers. Although browsers should do it consistently, I wouldn't rely on it.

1. Can the outerHTML element be edited (with JavaScript) and replaced?

In theory yes, but it won't help. You'd have to replace the element with one in the null namespace to stop the attribute appearing on the outer element, and that would just cause outerHTML to add the attribute to the child elements (because they would then have a different namespace to their parent).

2. Is the observation above consistent (seen each time outerHTML is invoked) or is it implementation dependent?

For the same reason as with innerHTML, there are places where the serialization has to add the attribute so the string can be read back in successfully, and places where it may be added if the browser wants to. There's no guarantee that it'll be consistent.

3. Is it JavaScript that adds the xmlns attribute automatically or the browser?

It's the browser's built-in process of serializing the DOM to a string. The attribute usually isn't on the element in the DOM (the <html> element is the normal exception); it gets added wherever the browser thinks it is necessary as the DOM is converted to a string.

On a more general note, this is one reason why the experts try to discourage processing HTML markup with regular expressions. Even with HTML as opposed to XHTML, where there's none of this namespace business to worry about, attributes can be added or removed and their order changed in arbitrary ways during conversion between the string and DOM forms. There are no guarantees about consistency between browser makes, or even between successive versions of the same browser.
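One way to avoid the string form entirely is to search the DOM's text nodes directly. The following is only a sketch of that idea: findTextNodes is a hypothetical helper of my own, it takes the tag as a bare local name ("h1", not "<h1>"), and it does a plain substring test with no highlighting.

```javascript
// Node type constants as defined by the DOM standard.
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

// Hypothetical helper: return the text nodes under `root` whose parent
// element has the local name `tagName` and whose text contains `word`.
function findTextNodes(root, tagName, word) {
    var found = [];

    // `insideTag` is true when `node` itself is an element named `tagName`.
    function walk(node, insideTag) {
        for (var i = 0; i < node.childNodes.length; i++) {
            var child = node.childNodes[i];
            if (child.nodeType === TEXT_NODE) {
                if (insideTag && child.nodeValue.indexOf(word) !== -1) {
                    found.push(child);
                }
            } else if (child.nodeType === ELEMENT_NODE) {
                walk(child, child.localName === tagName);
            }
        }
    }

    walk(root, root.localName === tagName);
    return found;
}
```

For the page in the question, findTextNodes(document.getElementById("highlightbegin"), "h1", "resume") would be the equivalent call. Because it never looks at serialized markup, the xmlns attribute and any attribute reordering have no effect on the result.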

Alohci
  • Your last comment is especially true. You cannot rely on `innerHTML` being an exact match. http://stackoverflow.com/questions/3905219/use-javascript-to-get-raw-html-code – Daniel Gimenez Sep 12 '13 at 14:20
  • @Alohci: Thank you for your reply. I have added a few more questions to the original questions at the bottom. Can you please take a look? – Sriram Sep 13 '13 at 05:55