0

Possible Duplicate:
how to extract body contents using regexp

I have response text containing a full page (html, head, body). I want only the content inside the body. How can I achieve this using a regex?

Raja

1 Answer

6

A DOM parser is the most reliable way to extract data like this, but a regex can do a decent job if the HTML is sane, i.e. the text <body or </body does not occur inside comments, scripts, stylesheets, CDATA sections or attribute values, and the attributes of the BODY start tag do not contain the > character. This regex captures the contents of the first innermost BODY element (there should only ever be one):

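// `text` holds the full HTML response as a string.
var bodytext = '';
var m = text.match(/<body[^>]*>([^<]*(?:(?!<\/?body)<[^<]*)*)<\/body\s*>/i);
// m[1] is the captured BODY content; bodytext stays '' when nothing matches.
if (m) bodytext = m[1];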

It implements Jeffrey Friedl's "Unrolling-the-Loop" efficiency technique, so it is quite fast.
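To make the unrolled shape easier to see, here is a rough sketch that assembles the same pattern from named pieces, following the "normal* (special normal*)*" layout. The constant names, the extractBody helper, and the sample string are made up for illustration; only the regex itself comes from the answer.

```javascript
// Illustrative names only; the assembled pattern is the regex from the answer.
const OPEN = '<body[^>]*>';        // BODY start tag (attributes must not contain ">")
const NORMAL = '[^<]*';            // "normal": a run of characters other than "<"
const SPECIAL = '(?!<\\/?body)<';  // "special": a "<" that does not start <body or </body
const CLOSE = '<\\/body\\s*>';     // BODY end tag

// normal* (special normal*)* wrapped in the single capture group
const BODY_RE = new RegExp(
  OPEN + '(' + NORMAL + '(?:' + SPECIAL + NORMAL + ')*)' + CLOSE, 'i');

// Hypothetical helper: returns the BODY contents, or '' when there is no match.
function extractBody(text) {
  const m = text.match(BODY_RE);
  return m ? m[1] : '';
}

// Made-up sample input with a mixed-case end tag.
const sample = '<html><head><title>t</title></head>' +
  '<body class="x"><p>Hello <b>world</b></p></BODY></html>';

console.log(extractBody(sample)); // <p>Hello <b>world</b></p>
```

Each pass through the inner group consumes one "<" plus the plain text after it, so the engine never has to choose between overlapping alternatives, which is where the speed comes from.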

ridgerunner