I have a string like the following:

~~<b>A<i>C</i></b>~~/~~<u>D</u><b>B</b>~~has done this.

I am trying to get the text inside each <b> tag, using this pattern:

<b>(.+)</b>

However, this returns <b>A<i>C</i></b>~~/~~<u>D</u><b>B</b> as a single match. I need <b>A<i>C</i></b> as the first match and <b>B</b> as the second.

Can anyone please help?

p.s.w.g
user3306669
  • Can you post the regular expression that you've already tried?... – War10ck Apr 24 '14 at 15:01
  • Thou shalt not try to parse HTML with regular expressions. Seriously, in most cases you will end up with a pile of unmaintainable and error-prone code; just imagine nested sections of interest (e.g. ...............). Make sure that you have a **very** compelling reason to choose this path. – collapsar Apr 24 '14 at 15:09
  • The following questions in the [Stack Overflow Regular Expressions FAQ](http://stackoverflow.com/a/22944075/2736496) may be of interest: [In-depth discussion on the differences between greedy versus non-greedy](http://stackoverflow.com/a/3075532) (listed under "Quantifiers"), and [Don't use regex to parse HTML](http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not) (under "General Information"). Please consider bookmarking the FAQ for future reference. – aliteralmind Apr 24 '14 at 15:12
  • This parsing will run against an HTML5 canvas, where I have to parse the markup and create canvas text from it. I am doing this because I cannot use HTML text inside a canvas. – user3306669 Apr 24 '14 at 15:21

1 Answer

You need to use a non-greedy quantifier:

<b>(.+?)</b>

This will ensure that the match stops at the first </b> it finds.
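
As a minimal sketch of the difference, assuming the code runs in JavaScript (the question mentions HTML5 canvas) and using the sample string from the question verbatim, the two patterns can be compared side by side. The variable names here are only illustrative:

```javascript
// Sample string copied from the question.
const input = '~~<b>A<i>C</i></b>~~/~~<u>D</u><b>B</b>~~has done this.';

// Greedy: .+ consumes as much as possible, so the match runs from the
// first <b> all the way to the LAST </b> in the string.
const greedy = input.match(/<b>(.+)<\/b>/)[1];

// Non-greedy: .+? consumes as little as possible, so each match stops at
// the first </b> that follows. The g flag lets matchAll find every match.
const lazy = Array.from(input.matchAll(/<b>(.+?)<\/b>/g), (m) => m[1]);

console.log(greedy); // A<i>C</i></b>~~/~~<u>D</u><b>B
console.log(lazy);   // [ 'A<i>C</i>', 'B' ]
```

The capture group holds only the text between the tags; use `m[0]` instead of `m[1]` if the surrounding `<b>...</b>` should be included in each result.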

However, I would generally recommend using a proper XML or HTML parser for this sort of thing. Regular expressions are simply not powerful enough to handle the recursive structure of XML.
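
To illustrate that limitation, here is a small sketch in JavaScript with a made-up input (not from the question) where one <b> element is nested inside another. The non-greedy pattern pairs the outer opening tag with the inner closing tag:

```javascript
// Hypothetical input: a <b> element nested inside another <b> element.
const nested = '<b>outer <b>inner</b> tail</b>';

// The pattern has no notion of nesting depth, so it stops at the first
// </b> it sees, which actually closes the inner element.
const firstMatch = nested.match(/<b>(.+?)<\/b>/)[1];

console.log(firstMatch); // outer <b>inner
```

If the input can never contain a <b> inside another <b>, the regex approach may be good enough; otherwise a real parser is the safer choice.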

p.s.w.g