I have the following code:

 textResponse = textResponse.replace(/<head>(.|\n)*?<\/head\>/img, '');
 alert("Ups, Error " + jqxhr.status + ", " + textResponse);    

It is used to display an error on an AJAX request. The response text contains the HTML of the error page, and I'm stripping that page of unnecessary content, so I try to remove the <head> from the following string:

<!DOCTYPE html>
<html>
    <head>
        <title>No hay usuario logeado</title>
        <meta name="viewport" content="width=device-width" />
        <style>
         body {font-family:"Verdana";font-weight:normal;font-size: .7em;color:black;} 
         p {font-family:"Verdana";font-weight:normal;color:black;margin-top: -5px}
         b {font-family:"Verdana";font-weight:bold;color:black;margin-top: -5px}
         H1 { font-family:"Verdana";font-weight:normal;font-size:18pt;color:red }
         H2 { font-family:"Verdana";font-weight:normal;font-size:14pt;color:maroon }
         pre {font-family:"Consolas","Lucida Console",Monospace;font-size:11pt;margin:0;padding:0.5em;line-height:14pt}
         .marker {font-weight: bold; color: black;text-decoration: none;}
         .version {color: gray;}
         .error {margin-bottom: 10px;}
         .expandable { text-decoration:underline; font-weight:bold; color:navy; cursor:hand; }
         @media screen and (max-width: 639px) {
          pre { width: 440px; overflow: auto; white-space: pre-wrap; word-wrap: break-word; }
         }
         @media screen and (max-width: 479px) {
          pre { width: 280px; }
         }
        </style>
    </head>

    <body bgcolor="white">

            <span><H1>Error de servidor en la aplicación '/HMSW'.<hr width=100% size=1 color=silver></H1>

...

But the string stays exactly the same; nothing is removed.

Any idea why?

EricGS
  • Well, one possibility is best expressed here: http://stackoverflow.com/a/1732454/1243641 – Scott Sauyet Aug 20 '15 at 15:15
  • I've checked document.documentElement.innerHTML.replace(/(.|\n)*?<\/head\>/img, ''); and as far as I can see it works as expected. Probably the issue is not in the regex but in textResponse? – alexsuslin Aug 20 '15 at 15:27

2 Answers

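  • To match across line breaks use [\s\S] ("whitespace + non-whitespace"), which matches any character. In JavaScript . matches neither \n nor \r, so (.|\n) still fails on text with \r\n line endings. The multiline flag is not needed either: it only changes where ^ and $ match, and this pattern uses neither. The global flag is superfluous since there should be only one <head>.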

    textResponse = textResponse.replace(/<head>[\s\S]*?<\/head>/i, '');
    
  • A better method would be to parse the response into a DOM tree and remove the head node.

    The advantage is that the parser will correctly handle a duplicate <head> or </head> that sits inside a comment (e.g. <html><head>......<!-- </head> -->.....</head>).

    An example using DOMParser which works on modern browsers:

    var doc = new DOMParser().parseFromString(textResponse, "text/html");
    doc.head.remove(); // Note: .head node is always present even if empty
    

    Then the contents can be imported with document.importNode:

    var container = document.querySelector(".container");
    container.appendChild(document.importNode(doc.querySelector(".something"), true));
    

    or can be extracted as html: doc.documentElement.outerHTML

    P.S. The parsing stage may be skipped if XMLHttpRequest's responseType is set to document:

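    var xhr = new XMLHttpRequest();
    xhr.open("GET", "http://someurl");
    xhr.responseType = "document"; // set after open(); some older browsers throw otherwise
    xhr.onload = function() {
        var doc = this.responseXML; // already a parsed Document, no DOMParser needed
        doc.head.remove();
        // ..................
    };
    xhr.send();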
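
The difference between the two regex patterns shows up as soon as the text contains carriage returns. This is a minimal sketch, assuming the server sends \r\n line endings (the question does not confirm this). The sample string crlfResponse and both result variables are made up for illustration:

```javascript
// Hypothetical response text with Windows-style (CRLF) line endings.
var crlfResponse = '<html>\r\n<head>\r\n<title>Error</title>\r\n</head>\r\n<body>Oops</body></html>';

// Pattern from the question: "." matches neither \n nor \r, and the
// alternation only adds \n, so the match gets stuck at the first \r.
var withDotOrNewline = crlfResponse.replace(/<head>(.|\n)*?<\/head>/img, '');

// [\s\S] matches any character, including \r and \n.
var withAnyChar = crlfResponse.replace(/<head>[\s\S]*?<\/head>/i, '');

console.log(withDotOrNewline === crlfResponse); // was anything removed by the first pattern?
console.log(withAnyChar.indexOf('<head>'));     // is <head> still present after the second?
```

If the first comparison reports that the string is unchanged for the real textResponse as well, carriage returns are a likely cause of the behaviour described in the question.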
    
wOxxOm

Disregarding the fact that regular expressions are not suitable for parsing HTML, this case is much easier to handle if you simply find the opening <body> tag and the closing tag and take everything in between. Two indexOf() calls are enough to locate the content:

var fullHTMLStr = '<html><head>blablabla</head><body bgColor="white">Body!</body></html>';
var bodyTag = fullHTMLStr.indexOf('<body'); // don't look for '>', there might be attributes
var start = fullHTMLStr.indexOf('>', bodyTag + 5) + 1; // skip past the '>' that closes the opening tag
var end = fullHTMLStr.indexOf('</body', start);

var justBody = fullHTMLStr.substring(start, end);

alert(justBody);
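
The same idea can be wrapped in a function that also copes with a missing or upper-case <body> tag. This is only a sketch: the name extractBody is hypothetical, and returning the input unchanged when no <body> is found (or everything after the opening tag when </body> is missing) is an assumption about what is most useful for an error message. It still breaks if an attribute value on <body> contains a '>' character.

```javascript
// Hypothetical helper: returns the text between <body ...> and </body>.
function extractBody(html) {
    // Case-insensitive search, since tags may be written as <BODY>.
    var open = html.search(/<body[\s>]/i);
    if (open === -1) return html;            // no body tag: leave the text alone

    var contentStart = html.indexOf('>', open);
    if (contentStart === -1) return html;    // opening tag never closes
    contentStart += 1;                       // first character after '>'

    var rest = html.slice(contentStart);
    var close = rest.search(/<\/body/i);
    // Assumption: a truncated page without </body> keeps everything after <body>.
    return close === -1 ? rest : rest.slice(0, close);
}

console.log(extractBody('<html><head>x</head><BODY bgcolor="white">Body!</BODY></html>'));
```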
Amit