Need Regex for HTML in javascript

Question

I am having trouble writing a regex for HTML. I am working with a fragment of HTML and need only those tags that contain some inner text. I need a regex for that. My HTML is:

<P id=a_bib3 class=Bib_entry unselectable="on">Demo Text</P>
<P id=a_bib3 class=Bib_entry unselectable="on">&nbsp;</P>
<P id=a_bib4 class=Bib_entry unselectable="on">&nbsp;</P>
<P id=a_bib5 class=Bib_entry unselectable="on">&nbsp;</P>

Now I need only the first P tag, the one that contains some inner text.

Most probably you would be more satisfied if you pass this to e.g. jQuery and filter the appropriate elements out. Regexps are not meant for parsing HTML. — pimvdb, Aug 22 '11 at 13:47
See: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — Shawn Chin, Aug 22 '11 at 13:48
You already have the entire DOM loaded and processed into a tree, why flatten it to do regex queries... — Blindy, Aug 22 '11 at 13:49
Even if you have the HTML as a string, you can easily parse it and traverse the resulting DOM. — Felix Kling, Aug 22 '11 at 14:03
As others have said, parsing HTML with a regex is a bad idea. You should pick a more suitable tool for the job. — jfriend00, Aug 22 '11 at 15:09
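
For readers who want to try the DOM-based route suggested in the comments above, here is a minimal sketch. It assumes a browser environment for the parsing step, and the helper names `hasInnerText` and `nonEmptyParagraphs` are made up for illustration; they do not come from the thread or from any library.

```javascript
// Returns true when an element has text other than whitespace.
// String.prototype.trim() also strips U+00A0, the character that &nbsp; decodes to.
function hasInnerText(el) {
  return el.textContent.trim() !== '';
}

// Hypothetical helper: parse an HTML string and keep only the <p> elements
// that contain real text. DOMParser is a browser API, so this function is
// only defined here, not called; it will not run in plain Node.js.
function nonEmptyParagraphs(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  return Array.from(doc.querySelectorAll('p')).filter(hasInnerText);
}
```

With the question's markup, `nonEmptyParagraphs(html)[0]` should be the paragraph holding "Demo Text". If the elements are already in the page, something like `Array.from(document.querySelectorAll('p.Bib_entry')).filter(hasInnerText)` avoids the string round trip altogether.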

Robert · Answer 1 · 2019-03-08T03:31:19.313

1

/<[pP](\s("[^"]*"|'[^']*'|[^"'>]+)*)?>[^<]*Demo Text[^<]*<\/[pP]\s*>/

edited Mar 08 '19 at 03:31

answered Aug 22 '11 at 14:27

Robert


Problem is, you cannot be sure this is actually a tag. For example it might be part of an attribute value: `` – Robert Mar 08 '19 at 03:40
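
A quick way to see both the intended match and the caveat from the comment above is to run the regex from this answer against the question's markup and against a second input. The second input (`tricky`) is a hypothetical string made up for this sketch, not the commenter's original example.

```javascript
// The regex from this answer, unchanged.
const pWithDemoText = /<[pP](\s("[^"]*"|'[^']*'|[^"'>]+)*)?>[^<]*Demo Text[^<]*<\/[pP]\s*>/;

// The markup from the question.
const sample = [
  '<P id=a_bib3 class=Bib_entry unselectable="on">Demo Text</P>',
  '<P id=a_bib3 class=Bib_entry unselectable="on">&nbsp;</P>',
  '<P id=a_bib4 class=Bib_entry unselectable="on">&nbsp;</P>',
  '<P id=a_bib5 class=Bib_entry unselectable="on">&nbsp;</P>',
].join('\n');

const found = sample.match(pWithDemoText);
console.log(found[0]);
// <P id=a_bib3 class=Bib_entry unselectable="on">Demo Text</P>

// Hypothetical input: the "paragraph" here is only text inside an attribute value.
const tricky = '<div title="<p>Demo Text</p>"></div>';
const falsePositive = tricky.match(pWithDemoText);
console.log(falsePositive[0]);
// <p>Demo Text</p>
```

Note that the pattern looks for the literal text "Demo Text", so it finds this particular paragraph rather than any paragraph with non-empty content.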

score 0 · Answer 2 · answered Aug 22 '11 at 13:51


/<p\s+[^>]+>[^\&]+.*?<\/p>/mis


RolandasR



I downvoted because parsing html with regex is generally a bad idea, and it looks like this will fail if the inner text starts with `&` or if any inner attribute contains `>` – murgatroid99 Aug 22 '11 at 13:55
@murgatroid99: I confess I don't really know why one would operate on the HTML as a flat string when you already have it all parsed out into an ornate structure. However, one important reason to use regexes on HTML is when you are dealing with a fixed subset, such as in text editor when you want to do a search and replace on just one part, like just certain elements of one particular list or a table that you know holds no exceptional cases, nesting, comments, etc. Minimally people all need to know how to match `//mis` or `/]*>/i` plus `/<\/TAG>/i`, etc. But they seldom do. – tchrist Aug 22 '11 at 14:26
@tchrist I can see that regexes would be useful in the case you describe, but my downvote still stands because GameBit's regex still excludes text starting with `&`. – murgatroid99 Aug 22 '11 at 14:33
@murgatroid99: He also has an escape on the ampersand, which is curious. – tchrist Aug 22 '11 at 14:52
@tchrist I had just assumed that he was excluding text starting with `\`, but now that I look at a reference I am not even sure if that is a valid character class. I didn't mention it because I could not remember how backslashes are used in HTML. – murgatroid99 Aug 22 '11 at 15:02
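
The two failure modes murgatroid99 mentions can be checked by testing the regex from this answer one paragraph at a time. The last two inputs below are hypothetical strings written for this sketch; the first two come from the question.

```javascript
// The regex from this answer, unchanged (the `s` flag needs an ES2018 engine).
const re = /<p\s+[^>]+>[^\&]+.*?<\/p>/mis;

const cases = [
  // From the question: real text, should match.
  '<P id=a_bib3 class=Bib_entry unselectable="on">Demo Text</P>',
  // From the question: only &nbsp;, should not match.
  '<P id=a_bib3 class=Bib_entry unselectable="on">&nbsp;</P>',
  // Hypothetical: real text that happens to start with an entity.
  '<p class="x">&amp; Co.</p>',
  // Hypothetical: empty paragraph whose attribute value contains ">".
  '<p title="a>b">&nbsp;</p>',
];

const results = cases.map((html) => re.test(html));
console.log(results);
// [ true, false, false, true ]
```

The third result is a miss (the text is real but starts with `&`), and the fourth is a false positive (the `>` inside the attribute is taken as the end of the tag, so the rest of the attribute counts as inner text).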
