I need help removing everything from an HTML document string, using a JavaScript regex, except the <body></body> tag and everything inside it.

I tried this, but it doesn't work:

var str = "<html><head><title></title></head><body>my content</body></html>"
str.replace(/[^\<body\>(.+)\<\\body\>]+/g,'');

I need only the body content. Another option would be to use DOMParser:

var oParser = new DOMParser(str);
var oDOM = oParser.parseFromString(str, "text/xml");

But this throws an error when parsing my document string, which is loaded via Ajax.
Thanks in advance for your suggestions!

joseluisq

3 Answers

1
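var str = "<html><head><title></title></head><body>my content</body></html>";

// [\s\S] matches any character, including newlines; \1 refers back to "body".
// With the g flag, match() returns an array of matches, or null if none is found.
str = str.match(/<(body)>[\s\S]*?<\/\1>/gi);

// Alternatively, where the s (dotAll) flag is supported (ES2018+):
// str = str.match(/<(body)>.*?<\/\1>/gis);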


Tim.Tang
  • 1
    @joseluisq see my updates: `str=str.match(/<(body)>[\s\S]*?<\/\1>/gi);` http://regex101.com/r/eJ6sG4/3 – Tim.Tang Aug 22 '14 at 02:55
1

You could try this:

> var str = "<html><head><title></title></head><body>my content</body></html>"
undefined
> str.replace(/.*?(<body>.*?<\/body>).*/g, '$1');
'<body>my content</body>'
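
One caveat with the replace approach above: `.` does not match newlines, so on a multiline document the pattern only matches within the line that holds the body, and the rest of the string is left in place. The following is a minimal sketch of that behavior and of a `[\s\S]` variant; the sample string and the variable names are assumptions made for illustration, not part of the original answer.

```javascript
// Hypothetical multiline input, for illustration only.
var multiline = "<html>\n<head><title></title></head>\n<body>my content</body>\n</html>";

// Original pattern: "." cannot cross "\n", so only the body line itself is
// matched and replaced with its own content; the other lines are untouched.
var dotResult = multiline.replace(/.*?(<body>.*?<\/body>).*/g, '$1');

// Variant with [\s\S], which matches any character including newlines,
// so the text before and after the body is consumed and dropped.
var anyCharResult = multiline.replace(/[\s\S]*?(<body>[\s\S]*?<\/body>)[\s\S]*/, '$1');

console.log(dotResult === multiline); // the string is unchanged
console.log(anyCharResult);
```

Note that if the string contains no `<body>…</body>` pair, `replace` returns the input unchanged, so the result should be checked before it is used.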


Avinash Raj
0

You can't (or at least shouldn't) do this with replace; try match instead:

var str = "<html><head><title></title></head><body>my content</body></html>"
var m = str.match(/<body>.*<\/body>/);
console.log(m[0]); //=> "<body>my content</body>"

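If the string spans multiple lines, replace the . (which does not match \n) with [\S\s] (non-whitespace OR whitespace, i.e. any character) or something similar. Note that .* is greedy, so it matches up to the last </body> in the string.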
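
As a rough sketch of the multiline case, the snippet below uses `[\s\S]` with a lazy quantifier and a capture group, so that both the whole `<body>` element and its inner content are available. The helper name `extractBody`, the sample document, and the tolerance for attributes on the opening tag are assumptions added for illustration.

```javascript
// Hypothetical helper; the name and return shape are illustrative assumptions.
function extractBody(html) {
  // <body\b[^>]*> tolerates attributes on the opening tag.
  // [\s\S]*? matches any characters (newlines included), lazily,
  // so it stops at the first </body>.
  var m = html.match(/<body\b[^>]*>([\s\S]*?)<\/body>/i);
  if (m === null) {
    return null; // no body element found
  }
  return { outer: m[0], inner: m[1] };
}

var doc = "<html>\n<head><title></title></head>\n<body>\nmy content\n</body>\n</html>";
var result = extractBody(doc);
console.log(result.outer); // the <body> element including its tags
console.log(result.inner); // only the content between the tags
```

Checking for `null` before reading `m[0]` avoids a TypeError when the document has no body element.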

tckmn