
I have a few lines of HTML:

<div class="container header">
  <div class="row">
    <div class="col-md-6">This is the target text</div>
  </div>
</div>

How can I select everything inside the <div class="container header"> using a regular expression?

The end result should be:

<div class="row">
  <div class="col-md-6">This is the target text</div>
</div>
jojoi
  • 1
    [Don't parse HTML with regex!](https://stackoverflow.com/a/1732454/418066) – Biffen Mar 10 '18 at 16:41
  • 1
    This is easily - and far more reliably - achieved using DOM nodes; *why* would you want to work with regular expressions to do this? What constraints are you working under? And - frankly - what solution did you come up with yourself that, presumably, failed? In what way? – David Thomas Mar 10 '18 at 16:57
  • I updated my answer and the live demo to take more cases into account. – Francesco B. Mar 10 '18 at 17:02
  • @DavidThomas... I'm trying to scrape a page using PHP. Is it easier to use DOM nodes in PHP than regex? Please tell me more about this. Thanks. – jojoi Mar 11 '18 at 03:23

2 Answers

How about this?

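<div class="row">[ \n\s]*<div class="col-md-6">[^<]*<\/div>[ \n\s]*<\/div>
  • [ \n\s]* matches only whitespace such as spaces and newlines (\s alone would be enough, since it already covers both)
  • [^<]* matches everything up to, but not including, the next <, which is the character you stop at
  • <\/div>[ \n\s]*<\/div> matches the two closing div tags you requested, allowing whitespace between them

Note that the pattern contains no dot, so the s option (which makes a dot match newlines) is not required; the newlines are matched by the whitespace class.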

Live demo here
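
As a rough illustration of how the pattern above could be applied, here is a minimal JavaScript sketch. It assumes the markup looks exactly like the sample in the question; `extractRow` and `rowPattern` are hypothetical names chosen for this example, and `\s*` is used in place of `[ \n\s]*` since the two are equivalent.

```javascript
// Sample markup copied from the question.
const html = `<div class="container header">
  <div class="row">
    <div class="col-md-6">This is the target text</div>
  </div>
</div>`;

// The pattern from this answer, with \s* standing in for [ \n\s]*.
// It contains no dot, so the s flag makes no difference here.
const rowPattern = /<div class="row">\s*<div class="col-md-6">[^<]*<\/div>\s*<\/div>/;

// extractRow is a hypothetical helper name, not part of any library.
// It returns the matched markup, or null if the input deviates from the
// exact shape the pattern expects.
function extractRow(source) {
  const match = source.match(rowPattern);
  return match ? match[0] : null;
}

const inner = extractRow(html);
console.log(inner);
```

Because the pattern hard-codes both class names and the nesting depth, any change in the page's markup makes it return null, which is the fragility the comments under the question warn about.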

Francesco B.

In a browser, DOMParser can be used to parse the HTML:

var html = `
<div class="container header">
  <div class="row">
    <div class="col-md-6">This is the target text</div>
  </div>
</div>`;

// 'text/xml' only works for well-formed markup; firstChild is the root <div>
var innerHTML = new DOMParser().parseFromString(html, 'text/xml').firstChild.innerHTML;

console.log(innerHTML);
Slai