0

I have this XML:

<[Results]>
    <[Data]>
        <[div]>THIS IS HTML! <[/div]>
    <[/Data]>
<[/Results]>

What is the regular expression to get <[div]>THIS IS HTML!<[/div]>?

rid
duckmike
  • 2
    You'll find this question of great use for your needs: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Gabi Purcaru Aug 12 '11 at 17:08
  • You should not use regular expressions to parse XML. Usually you'll have an XML parser and perhaps XPath available to get the element. The XPath to get the `div` would then be `/Results/Data/div` – if one assumes the brackets are not present. You should add more context to your question such as where the script runs and if a standard library is available to you. – Augustus Kling Aug 12 '11 at 17:09
  • I second @Gabi. Don't parse XML/HTML with regular expressions. They aren't regular languages. – FishBasketGordo Aug 12 '11 at 17:09
  • @duckmike I've updated my answer, please read it. – Madara's Ghost Aug 12 '11 at 20:00

4 Answers

2

http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html

Do not parse XML with regexes. Do not.

Austin Yun
0

Try this:

<\[div\]>.+?<\[\/div\]>

Will match anything inside the div tags.
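
As a rough illustration, here is how that pattern might be applied in Python. The capture group and the re.DOTALL flag are additions for this sketch and are not part of the pattern above; the sample text is the snippet from the question.

```python
import re

# The sample input from the question, brackets included.
text = """<[Results]>
    <[Data]>
        <[div]>THIS IS HTML! <[/div]>
    <[/Data]>
<[/Results]>"""

# Square brackets are escaped so they match literally.
# .+? is lazy, so it stops at the first closing tag it can reach.
# The capture group and re.DOTALL are assumptions added for this sketch.
pattern = re.compile(r"<\[div\]>(.+?)<\[/div\]>", re.DOTALL)

match = pattern.search(text)
whole = match.group(0)   # the tags and the body
inner = match.group(1)   # the body only
print(whole)
print(inner)
```

group(0) gives the whole element, group(1) only what sits between the tags.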

Though I am compelled to tell you that this regex is NOT perfect. If you want to parse XML, you should use an XML parser.

Do read this post on the subject thoroughly.

Community
Madara's Ghost
0

If you can convert this to actual XML instead of a string, you could use the getElementsByTagName method to find all div tags, then read the innerHTML(?) property (or innerText/textContent, depending on what you want).
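
A minimal sketch of the same idea using Python's standard xml.dom.minidom, which also offers getElementsByTagName. It assumes the square brackets are not really part of the tag names, so the input below is a cleaned-up version of the question's snippet. minidom has no innerHTML; toxml() and the text node's data are used here as the nearest equivalents.

```python
from xml.dom import minidom

# Assumption: the real document uses plain tag names without square brackets.
xml_text = "<Results><Data><div>THIS IS HTML! </div></Data></Results>"

doc = minidom.parseString(xml_text)

# Collect every <div> element in the document.
divs = doc.getElementsByTagName("div")
first = divs[0]

outer = first.toxml()              # the element serialized with its tags
inner_text = first.firstChild.data # the text inside the element
print(outer)
print(inner_text)
```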

hugomg
-1

If you have two or more divs, you should avoid matching < characters in the body. Try this:

<\[div\]>[^<]*<\[\/div\]>
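
A short Python sketch of why excluding < from the body helps when there are several divs. It assumes the square brackets are escaped and the character class is repeated with *; the sample string is made up for illustration.

```python
import re

# Hypothetical input with two adjacent divs.
text = "<[div]>first<[/div]><[div]>second<[/div]>"

# A greedy .+ runs to the last closing tag and swallows both divs.
greedy = re.findall(r"<\[div\]>.+<\[/div\]>", text)

# [^<]* cannot cross a "<", so each div is matched on its own.
negated = re.findall(r"<\[div\]>[^<]*<\[/div\]>", text)

print(greedy)
print(negated)
```

The trade-off is that the negated class finds nothing when a div body itself contains a < character, for example nested markup.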

Adilson de Almeida Jr