
Using regular expressions, I need to extract the multiline content of a tag that has a specific id value. How can I do this?

This is what I currently have:

<div(.|\n)*?id="${value}"(.|\n)*?>(.|\n)*?<\/div>

The problem with this is this sample:

<div id="1">test</div><div id="2">test</div>

If I want to replace the div with id="2" using this regexp (with ${value} = 2), the whole string gets matched. This is because, from the opening of the first tag onward, the pattern matches everything until the id is found, which is wrong.
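To make the problem concrete, here is a minimal reproduction, assuming JavaScript (which the `${value}` placeholder suggests, though the engine is not stated). The names `value`, `sample`, and `acrossTags` are only illustrative.

```javascript
// Illustrative values; not taken from any real template.
const value = 2;
const sample = '<div id="1">test</div><div id="2">test</div>';

// The pattern from the question, built from a string so the id can be
// interpolated. Backslashes are doubled because this is a template literal.
const acrossTags = new RegExp(
  `<div(.|\\n)*?id="${value}"(.|\\n)*?>(.|\\n)*?<\\/div>`
);

const match = sample.match(acrossTags);

// The match begins at the first "<div" and runs through the second "</div>",
// because (.|\n)*? is allowed to step over the ">" that closes the first tag.
console.log(match[0] === sample); // true
```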

How can I do this?

khernik
  • See [RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). And `(.|\n)*?` is something that is most likely to cause a huge slowdown. – Wiktor Stribiżew Jun 09 '17 at 20:11
  • What's `${value}` supposed to be? – chris85 Jun 09 '17 at 20:12
  • Why are you using a regular expression for this, instead of using functions like `document.getElementById()`? – Barmar Jun 09 '17 at 20:13
  • Because it's not exactly correct HTML, but instead some internal templating engine, which I can't parse with an HTML parser. – khernik Jun 09 '17 at 20:14
  • `${value}` is any numeric value, which I want to find as ID attribute – khernik Jun 09 '17 at 20:15
  • A simple way though is to do `<div[^>]*?id="2"[^>]*?>([\S\s]*?)</div>` – Jun 09 '17 at 20:21
  • Use a DOM parser... don't use regex for this task. A DOM parser *can* handle your invalid HTML in some cases. – Brad Jun 09 '17 at 20:28

2 Answers

1

A fairly simple way is to use

Raw: <div(?=\s)[^>]*?\sid="2"[^>]*?>([\S\s]*?)</div>

Delimited: /<div(?=\s)[^>]*?\sid="2"[^>]*?>([\S\s]*?)<\/div>/

Use the variable in place of 2.

The content will be in group 1.
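As a rough sketch of how the variable could be substituted and group 1 read back, assuming JavaScript: the helper name `extractDivContent` and the sample text are hypothetical, and the escaping step is an extra precaution that the answer itself does not mention.

```javascript
// Hypothetical helper, not part of the answer above.
function extractDivContent(text, id) {
  // Escape regex metacharacters in case the id is ever not purely numeric.
  const safeId = String(id).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

  // Same pattern as above with the id interpolated. [^>]*? cannot cross the
  // ">" that ends the opening tag, so an id in another div is never reached.
  const pattern = new RegExp(
    `<div(?=\\s)[^>]*?\\sid="${safeId}"[^>]*?>([\\s\\S]*?)</div>`
  );

  const match = pattern.exec(text);
  return match ? match[1] : null; // group 1 holds the tag's content
}

// Illustrative input with a multiline body in the second div.
const sample =
  '<div id="1">first</div>\n<div class="x" id="2">line one\nline two</div>';

console.log(extractDivContent(sample, 2));
console.log(extractDivContent(sample, 3)); // null, no such id
```

Note that the lazy `([\s\S]*?)` stops at the first `</div>` it meets, so content containing nested divs would be cut short.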

0

Change (.|\n) to [^>] so it won't match the > that ends the tag. Then it can't match across different divs.

<div\b[^>]*\bid="${value}"[^>]*>.*?<\/div>

Also, instead of using (.|\n)* to match across multiple lines, add the s modifier to the regexp. This makes . match any character, including newlines.
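A minimal sketch of both points, assuming JavaScript with an engine that supports the `s` (dotAll) flag, which arrived in ES2018. The names `value`, `text`, `withDotAll`, and `withoutFlag` are illustrative; the second pattern is a fallback for engines without the flag.

```javascript
const value = 2; // illustrative id
const text = '<div id="1">a</div>\n<div id="2">b\nc</div>';

// With the "s" flag, "." also matches newlines, so ".*?" can span lines.
const withDotAll = new RegExp(
  `<div\\b[^>]*\\bid="${value}"[^>]*>.*?<\\/div>`,
  "s"
);

// Fallback without the flag: [\s\S] matches any character, newlines included.
const withoutFlag = new RegExp(
  `<div\\b[^>]*\\bid="${value}"[^>]*>[\\s\\S]*?<\\/div>`
);

// Both match only the second div, since [^>]* cannot leave the opening tag.
console.log(withDotAll.exec(text)[0] === withoutFlag.exec(text)[0]); // true
```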

However, using regular expressions to parse HTML is not very robust. You should use a DOM parser.

Barmar
  • I can't, "Because it's not exactly correct HTML, but instead some internal templating engine, which I can't parse with an HTML parser." ;) – khernik Jun 09 '17 at 20:19