RegEx in JS not replacing

Question

I have the following code:

 textResponse = textResponse.replace(/<head>(.|\n)*?<\/head\>/img, '');
 alert("Ups, Error " + jqxhr.status + ", " + textResponse);

It is used to display an error for an AJAX request. The response text contains the HTML of the error page, and I'm stripping unnecessary content from that page, so I try to remove the <head> from the following string:

<!DOCTYPE html>
<html>
    <head>
        <title>No hay usuario logeado</title>
        <meta name="viewport" content="width=device-width" />
        <style>
         body {font-family:"Verdana";font-weight:normal;font-size: .7em;color:black;} 
         p {font-family:"Verdana";font-weight:normal;color:black;margin-top: -5px}
         b {font-family:"Verdana";font-weight:bold;color:black;margin-top: -5px}
         H1 { font-family:"Verdana";font-weight:normal;font-size:18pt;color:red }
         H2 { font-family:"Verdana";font-weight:normal;font-size:14pt;color:maroon }
         pre {font-family:"Consolas","Lucida Console",Monospace;font-size:11pt;margin:0;padding:0.5em;line-height:14pt}
         .marker {font-weight: bold; color: black;text-decoration: none;}
         .version {color: gray;}
         .error {margin-bottom: 10px;}
         .expandable { text-decoration:underline; font-weight:bold; color:navy; cursor:hand; }
         @media screen and (max-width: 639px) {
          pre { width: 440px; overflow: auto; white-space: pre-wrap; word-wrap: break-word; }
         }
         @media screen and (max-width: 479px) {
          pre { width: 280px; }
         }
        </style>
    </head>

    <body bgcolor="white">

            <span><H1>Error de servidor en la aplicación '/HMSW'.<hr width=100% size=1 color=silver></H1>

...

But the string stays exactly the same; nothing is removed.

Any idea why?

Well, one possibility is best expressed here: http://stackoverflow.com/a/1732454/1243641 — Scott Sauyet, Aug 20 '15 at 15:15
I've checked document.documentElement.innerHTML.replace(/(.|\n)*?<\/head\>/img, ''); and as far as I can see it works as expected. Probably the issue is not is regex, but in textResponse? — alexsuslin, Aug 20 '15 at 15:27

wOxxOm · Answer 1 · 2015-09-05T07:10:44.517

To match newline characters as well, use [\s\S] ("whitespace or non-whitespace"), which matches any character. The multiline flag is not needed: it only changes where ^ and $ match, and this pattern uses neither. The global flag is superfluous since there can be only one <head>.
```
textResponse = textResponse.replace(/<head>[\s\S]*?<\/head>/i, '');
```
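
One plausible reason the original pattern matched nothing, offered as an assumption because the question does not show the raw line endings: in JavaScript `.` matches neither `\n` nor `\r`, so `(.|\n)` cannot get past the `\r` of a Windows-style CRLF line break. A minimal sketch with a made-up `sample` string:

```javascript
// Hypothetical response text with CRLF line endings (an assumption, not the asker's data).
var sample = '<html>\r\n<head>\r\n<title>t</title>\r\n</head>\r\n<body>Hi</body></html>';

// Original pattern: neither "." nor "\n" matches "\r", so the match fails
// right after "<head>" and replace() returns the string unchanged.
var original = sample.replace(/<head>(.|\n)*?<\/head>/img, '');

// [\s\S] matches every character, including "\r" and "\n".
var fixed = sample.replace(/<head>[\s\S]*?<\/head>/i, '');

console.log(original === sample);            // true: nothing was removed
console.log(fixed.indexOf('<head>') === -1); // true: the head is gone
```

If the response uses plain `\n` line endings, the original pattern does match, which fits the comment above reporting that it worked in a quick test.
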
A better method would be to parse the response into a DOM tree and remove the head node.

The advantage is that the parser correctly handles a duplicate <head> or </head> inside a comment (e.g. <html><head>......<!-- </head> -->.....</head>).

An example using DOMParser which works on modern browsers:
```
var doc = new DOMParser().parseFromString(textResponse, "text/html");
doc.head.remove(); // Note: .head node is always present even if empty
```
Then the contents can be imported with document.importNode:
```
var container = document.querySelector(".container");
container.appendChild(document.importNode(doc.querySelector(".something"), true));
```
or can be extracted as html: doc.documentElement.outerHTML

P.S. The parsing stage may be skipped if XMLHttpRequest's responseType is set to document:
```
var xhr = new XMLHttpRequest();
xhr.open("GET", "http://someurl");
xhr.responseType = "document"; // ask the browser to parse the response into a Document
xhr.onload = function() {
    var doc = this.responseXML; // the already-parsed Document
    doc.head.remove();
    // ...
};
xhr.send();

score 0 · Answer 2 · answered Sep 05 '15 at 07:52

Disregarding the fact that regular expressions are not suitable for parsing HTML, this case is much easier to handle if you simply find the opening <body> tag and its closing tag and take everything in between. Two indexOf() calls are enough to grab that content:

var fullHTMLStr = '<html><head>blablabla</head><body bgColor="white">Body!</body></html>';
var start = fullHTMLStr.indexOf('<body'); // don't look for '>', there might be attributes
start = fullHTMLStr.indexOf('>', start + 4) + 1; // advance past '>' (no second 'var': start is already declared)
var end = fullHTMLStr.indexOf('</body', start);

var justBody = fullHTMLStr.substring(start, end);

alert(justBody);
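
The snippet above assumes both tags are present. A slightly more defensive sketch of the same indexOf() idea follows; the function name `extractBody` and its fallback behaviour are illustrative assumptions, not part of the original answer. Like the original, it is case-sensitive, so it would not find an uppercase `<BODY>` tag.

```javascript
// Returns the text between the opening <body ...> tag and </body>.
// Falls back to the whole input when no opening <body> tag is found.
function extractBody(html) {
    var open = html.indexOf('<body');
    if (open === -1) return html;            // no <body> tag at all

    var start = html.indexOf('>', open);     // end of the opening tag, past any attributes
    if (start === -1) return html;           // malformed opening tag
    start += 1;

    var end = html.indexOf('</body', start);
    if (end === -1) end = html.length;       // missing closing tag: take the rest

    return html.substring(start, end);
}

var result = extractBody('<html><head>blablabla</head><body bgColor="white">Body!</body></html>');
console.log(result); // Body!
```
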
