-1

Using regex, how can I select a whole <p>....</p> element when a certain text (for example, "hello world") appears inside it? Any help would be appreciated.

ConnorsFan
Amer Hamid
  • Please provide examples of what you have tried. – Web Nexus Jan 27 '19 at 12:21
  • 5
    You should avoid parsing HTML with regex. https://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not – Pushpesh Kumar Rajwanshi Jan 27 '19 at 12:21
  • regex isn't built to parse HTML as HTML isn't a regular language. I think you would benefit from building a DOM element from your `<p>` tag and getting its `.textContent` – Nick Parsons Jan 27 '19 at 12:22
  • The only chance to do even nearly what you want to do (see other comments) is to be extremely lucky and to have boringly systematic and restricted input. So please show many examples of input and describe how complex it can get. Describe each and every possible shape and strange content your input can have. Yes, absolutely everything that could remotely happen. If that seems too much effort, then see above. Without that info, the question is too broad to be answered. – Yunnosch Jan 27 '19 at 12:24
  • So, `/<p>hello world<\/p>/`? – Bergi Jan 27 '19 at 12:25
  • @Amer Hamid, is the answer below working for you? – jo_va Jan 28 '19 at 13:14

1 Answer

-1

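This JS regex would work for simple paragraphs, using a group to capture the paragraph content and a positive lookahead to check for the closing </p> without consuming it. Because [\w\s]* only matches word characters and whitespace, the match stops at the first </p> and does not eat the following paragraphs: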

/<p>\s*([\w\s]*hello world[\w\s]*)\s*(?=<\/p>)/gm
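To see what the capture group returns, here is a minimal sketch run against a hard-coded string. The `sample` markup is made up for illustration. It also shows a limitation: since `[\w\s]*` accepts only letters, digits, underscores and whitespace, a paragraph with punctuation or nested tags is skipped.

```javascript
// Hypothetical sample markup, not taken from the question.
const sample = '<p>foo hello world bar</p><p>other text</p><p>hello world!</p>';
const pattern = /<p>\s*([\w\s]*hello world[\w\s]*)\s*(?=<\/p>)/gm;

// matchAll yields one match object per hit; index 1 is the captured paragraph text.
const contents = [...sample.matchAll(pattern)].map((m) => m[1]);

// Only the first paragraph qualifies: the second lacks the phrase,
// and the third contains "!", which [\w\s]* cannot match.
console.log(contents);
```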

If you want to capture the <p> tags too:

/(<p>\s*[\w\s]*hello world[\w\s]*\s*(?=<\/p>)<\/p>)/gm

And if your <p> tags might have classes or spaces:

/(<\s*p[^>]*?>\s*[\w\s]*hello world[\w\s]*\s*(?=<\s*\/p\s*>)<\s*\/p\s*>)/gm
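If your paragraphs can contain punctuation, one possible variant (my own adjustment, not one of the patterns above) is to swap `[\w\s]*` for `[^<]*`, which accepts anything up to the next tag. This sketch assumes the paragraphs have no nested tags; the `\b` after `p` is there so that `<pre>` is not mistaken for `<p>`. The `markup` string is made up for illustration.

```javascript
// Hypothetical input, for illustration only.
const markup = '<p class="a">Well, hello world!</p><pre>hello world</pre><p>nope.</p>';

// [^<]* matches any run of characters that does not start a new tag,
// so punctuation is allowed but nested elements are not.
const permissive = /<\s*p\b[^>]*>[^<]*hello world[^<]*<\s*\/p\s*>/g;

// With the g flag and no capture groups needed, match() returns all whole matches.
const found = markup.match(permissive) || [];
console.log(found);
```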

Here is an example capturing whole <p> tags:

// JavaScript
const html = document.getElementById('demo').innerHTML;
// A regex literal is already a RegExp object; the g flag lets exec() advance through the string.
const regex = /(<\s*p[^>]*?>\s*[\w\s]*hello world[\w\s]*\s*(?=<\s*\/p\s*>)<\s*\/p\s*>)/gm;
console.log('Matches:');
let match;
while ((match = regex.exec(html)) !== null) {
    console.log(match[1]); // group 1 holds the whole <p>...</p> element
}

<!-- HTML -->
<div id="demo">
  <p class="p1">bla bla hello world bla</p>
  <p >hello world</p>
  <p>Paragraph not matching</p>
</div>
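As the comments on the question suggest, the DOM can do this without a regex at all. Below is a rough sketch of that approach. `paragraphsContaining` is a hypothetical helper name of my own, and it assumes `root` is an element (or anything else exposing `querySelectorAll`).

```javascript
// Hypothetical helper (not part of any library): returns the outer HTML of
// every <p> under `root` whose text content includes `needle`.
function paragraphsContaining(root, needle) {
  return Array.from(root.querySelectorAll('p'))
    .filter((p) => p.textContent.includes(needle))
    .map((p) => p.outerHTML);
}

// In a browser, with the demo markup above on the page:
if (typeof document !== 'undefined') {
  console.log(paragraphsContaining(document.getElementById('demo'), 'hello world'));
}
```

Since this checks `textContent`, it should also cope with punctuation and nested tags inside the paragraph, which the patterns above do not.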

Here is a good online tool to test your regular expressions.

Hope that helps!

jo_va