-1

I have a problem with an HTML regex. I am working on a fragment of HTML and need to match only those tags that contain some inner text. I need a regex for that. My HTML is:

<P id=a_bib3 class=Bib_entry unselectable="on">Demo Text</P>
<P id=a_bib3 class=Bib_entry unselectable="on">&nbsp;</P>
<P id=a_bib4 class=Bib_entry unselectable="on">&nbsp;</P>
<P id=a_bib5 class=Bib_entry unselectable="on">&nbsp;</P>

I need to match only the first P tag, the one that contains some inner text.

Mxyk
Anil
  • Most probably you would be more satisfied if you pass this to e.g. jQuery and filter the appropriate elements out. Regexps are not meant for parsing HTML. – pimvdb Aug 22 '11 at 13:47
  • See: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Shawn Chin Aug 22 '11 at 13:48
  • You already have the entire DOM loaded and processed into a tree, why flatten it to do regex queries... – Blindy Aug 22 '11 at 13:49
  • Even if you have the HTML as a string, you can easily parse it and traverse the resulting DOM. – Felix Kling Aug 22 '11 at 14:03
  • As others have said, parsing HTML with a regex is a bad idea. You should pick a more suitable tool for the job. – jfriend00 Aug 22 '11 at 15:09
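
The parse-then-filter approach suggested in these comments can be sketched roughly as follows. This is only an illustration in Python using the standard-library `html.parser`, not what any commenter posted (they had jQuery and the browser DOM in mind). The class name `NonEmptyParagraphs` and the variable `sample` are made up for the example, and the sketch assumes the paragraphs are not nested.

```python
from html.parser import HTMLParser


class NonEmptyParagraphs(HTMLParser):
    """Collect (attributes, text) for every <p> that has visible inner text."""

    def __init__(self):
        # convert_charrefs=True turns &nbsp; into the character "\xa0"
        super().__init__(convert_charrefs=True)
        self.results = []    # one (attrs dict, text) pair per non-empty <p>
        self._attrs = None   # attributes of the <p> currently open, if any
        self._chunks = []    # text pieces seen inside the open <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":       # the parser lowercases tag names
            self._attrs = dict(attrs)
            self._chunks = []

    def handle_data(self, data):
        if self._attrs is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._attrs is not None:
            # str.strip() also removes "\xa0", so &nbsp;-only text is empty
            text = "".join(self._chunks).strip()
            if text:
                self.results.append((self._attrs, text))
            self._attrs = None


# The markup from the question (hypothetical variable name).
sample = """<P id=a_bib3 class=Bib_entry unselectable="on">Demo Text</P>
<P id=a_bib3 class=Bib_entry unselectable="on">&nbsp;</P>
<P id=a_bib4 class=Bib_entry unselectable="on">&nbsp;</P>
<P id=a_bib5 class=Bib_entry unselectable="on">&nbsp;</P>"""

parser = NonEmptyParagraphs()
parser.feed(sample)
parser.close()

for attrs, text in parser.results:
    print(attrs.get("id"), repr(text))
```

On the question's markup only the first paragraph should be reported, because the other three contain nothing but a no-break space.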

2 Answers

1
/<[pP](\s("[^"]*"|'[^']*'|[^"'>]+)*)?>[^<]*Demo Text[^<]*<\/[pP]\s*>/
Robert
  • Problem is, you cannot be sure this is actually a tag. For example it might be part of an attribute value: `` – Robert Mar 08 '19 at 03:40
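
The pattern in this answer hardcodes `Demo Text`. A possible generalisation to "any inner text" is sketched below in Python. It is a variant, not the answerer's regex: it captures tag-free inner text and then filters out whitespace and `&nbsp;` in code, and it matches attribute characters one at a time rather than with a `+` nested inside a `*`, which may backtrack heavily on paragraphs that do not match. The names `P_WITH_TEXT` and `paragraphs_with_text` are hypothetical. It still assumes a fixed subset of HTML (no tags inside the paragraph) and does not address the objection in the comment above.

```python
import html
import re

# Hypothetical variant of the answer's pattern: captures the inner text
# instead of requiring the literal "Demo Text".
P_WITH_TEXT = re.compile(
    r"""<p\b(?:"[^"]*"|'[^']*'|[^"'>])*>   # opening tag; quoted values may hold '>'
        ([^<]*)                            # inner text containing no tags
        </p\s*>                            # closing tag
    """,
    re.IGNORECASE | re.VERBOSE,
)


def paragraphs_with_text(markup):
    """Return the inner text of each <p> whose text is not just whitespace."""
    texts = []
    for match in P_WITH_TEXT.finditer(markup):
        # html.unescape turns &nbsp; into "\xa0", which strip() removes
        text = html.unescape(match.group(1)).strip()
        if text:
            texts.append(text)
    return texts


# The markup from the question (hypothetical variable name).
sample = """<P id=a_bib3 class=Bib_entry unselectable="on">Demo Text</P>
<P id=a_bib3 class=Bib_entry unselectable="on">&nbsp;</P>
<P id=a_bib4 class=Bib_entry unselectable="on">&nbsp;</P>
<P id=a_bib5 class=Bib_entry unselectable="on">&nbsp;</P>"""

print(paragraphs_with_text(sample))
```

Doing the emptiness check in code rather than inside the regex keeps the pattern short and handles other entities that decode to whitespace.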
0
/<p\s+[^>]+>[^\&]+.*?<\/p>/mis
RolandasR
  • I downvoted because parsing html with regex is generally a bad idea, and it looks like this will fail if the inner text starts with `&` or if any inner attribute contains `>` – murgatroid99 Aug 22 '11 at 13:55
  • @murgatroid99: I confess I don't really know why one would operate on the HTML as a flat string when you already have it all parsed out into an ornate structure. However, one important reason to use regexes on HTML is when you are dealing with a fixed subset, such as in text editor when you want to do a search and replace on just one part, like just certain elements of one particular list or a table that you know holds no exceptional cases, nesting, comments, etc. Minimally people all need to know how to match `//mis` or `/]*>/i` plus `/<\/TAG>/i`, etc. But they seldom do. – tchrist Aug 22 '11 at 14:26
  • @tchrist I can see that regexes would be useful in the case you describe, but my downvote still stands because GameBit's regex still excludes text starting with `&`. – murgatroid99 Aug 22 '11 at 14:33
  • @murgatroid99: He also has an escape on the ampersand, which is curious. – tchrist Aug 22 '11 at 14:52
  • @tchrist I had just assumed that he was excluding text starting with `\`, but now that I look at a reference I am not even sure if that is a valid character class. I didn't mention it because I could not remember how backslashes are used in HTML. – murgatroid99 Aug 22 '11 at 15:02
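
The two failure cases raised in these comments can be checked with a small script. The sketch below ports the answer's pattern to Python, assuming the `/mis` modifiers correspond to `re.M | re.I | re.S` and that `\&` is treated as a literal ampersand, which is how Python's `re` handles it. The helper `matches` and the test strings other than the question's own paragraphs are invented for the example.

```python
import re

# The pattern from the answer above, with /mis mapped to Python's re flags.
ANSWER_PATTERN = re.compile(r"<p\s+[^>]+>[^\&]+.*?<\/p>", re.M | re.I | re.S)


def matches(markup):
    """Report whether the answer's pattern finds a match anywhere in markup."""
    return ANSWER_PATTERN.search(markup) is not None


# Baseline: the two kinds of paragraph from the question, tested one at a time.
has_text = matches('<P id=a_bib3 class=Bib_entry unselectable="on">Demo Text</P>')
only_nbsp = matches('<P id=a_bib3 class=Bib_entry unselectable="on">&nbsp;</P>')

# Case 1: real inner text that happens to start with an entity.
starts_with_entity = matches('<p class="x">&amp; more</p>')

# Case 2: an empty paragraph whose attribute value contains '>'.
gt_in_attribute = matches('<p title=">">&nbsp;</p>')

print(has_text, only_nbsp, starts_with_entity, gt_in_attribute)
# prints: True False False True
```

The third result is a missed paragraph (text beginning with `&` is rejected) and the fourth is a false positive (the `>` inside the attribute is taken as the end of the tag), which is consistent with both objections.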