regex selection

Question

I have a string like this.

<p class='link'>try</p>bla bla</p>

I want to get only `try`. I have tried this:
/[^<\/p>]+<\/p>/

But it doesn't work.

How can I do this? Thanks.

Regular expressions and HTML? http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — Andrew Grimm, Jan 31 '11 at 21:57
Although you certainly *can* [parse HTML with regexes](http://stackoverflow.com/questions/4284176/doubt-in-parsing-data-in-perl-where-am-i-going-wrong/4286326#4286326), you probably don’t want to if it is generic HTML. It’s fine for “captive” HTML though; just be incredibly wary of it in its “wild” state. — tchrist, Feb 01 '11 at 01:02

score 4 · Accepted Answer · answered Jan 31 '11 at 13:08

If that is your string, and you want the text between those p tags, then this should work...

/<p\sclass='link'>(.*?)<\/p>/

Yours is not working because you put <\/p> inside a negated character class. It is not matched as a literal sequence; the class matches any single character that is not <, /, p or >.
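
To see the difference, both patterns can be run against the sample string. This is a minimal sketch using Python's `re` module, which is an assumption since the question does not name a language; the `/` delimiters are dropped and `\/` becomes a plain `/`.

```python
import re

html = "<p class='link'>try</p>bla bla</p>"

# Original pattern: a negated class that excludes '<', '/', 'p' and '>'
# one character at a time, not the sequence "</p>".
original = re.search(r"[^</p>]+</p>", html)

# Suggested pattern: a lazy capture group between the two tags.
suggested = re.search(r"<p\sclass='link'>(.*?)</p>", html)

# The negated class also rejects any 'p' in the text itself.
with_p = re.search(r"[^</p>]+</p>", "<p class='link'>help</p>")

print(original.group(0))   # try</p>
print(suggested.group(1))  # try
print(with_p)              # None
```

The original pattern does find something here, but the match includes the closing tag, and it finds nothing at all once the text contains a `p`.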

Of course, it is mandatory that I mention there are better tools for parsing HTML fragments (such as an HTML parser).
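
As a rough illustration of the parser route, here is a sketch using Python's standard-library `html.parser`. The language choice is an assumption, and `LinkTextParser` and `link_texts` are hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collects the text found inside <p class="link"> elements."""

    def __init__(self):
        super().__init__()
        self.inside = False  # True while we are within a matching <p>
        self.texts = []      # one entry per matching <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p" and dict(attrs).get("class") == "link":
            self.inside = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.texts[-1] += data


def link_texts(html):
    """Return the text of every <p class="link"> in the given fragment."""
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()
    return parser.texts


print(link_texts("<p class='link'>try</p>bla bla</p>"))  # ['try']
```

The stray closing `</p>` in the sample string is simply ignored here, since the parser only collects text between a matching start tag and the next `</p>`.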

answered Jan 31 '11 at 13:08

alex


@Matti - Agreed! In fact, I'll make a bot that upvotes all answers that match `/\>[^<>]*\bHTML\b[^<>]*\bParser\b[^<>]*\` [(!)](http://en.wikipedia.org/wiki/%28!%29) – Kobi Jan 31 '11 at 13:55
That’s a very fragile pattern. [See here](http://stackoverflow.com/questions/4284176/doubt-in-parsing-data-in-perl-where-am-i-going-wrong/4286326#4286326) for how careful you really have to be if you are hellbent on using regexes on generic HTML. – tchrist Feb 01 '11 at 01:04
@tchrist It sure is, but looking at the example the OP gave, it seemed they were wanting to know why their particular regex was not working. As for the regex/HTML debate, please refer to my last sentence :) – alex Feb 01 '11 at 01:08
I just didn’t know why you used `\s` without using `\s+`, etc. – tchrist Feb 01 '11 at 01:11
@tchrist Just trying to point out the OP's obvious mistake. Your points are very valid though :) – alex Feb 01 '11 at 02:16

score 0 · Answer 2 · answered Jan 31 '11 at 13:13

'/<p[^>]+>([^<]+)<\/p>/'

will get you "try"
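
A quick check of this pattern, again sketched with Python's `re` module (an assumption; the thread does not say which language is in use):

```python
import re

pattern = r"<p[^>]+>([^<]+)</p>"
html = "<p class='link'>try</p>bla bla</p>"

# [^>]+ skips over the attributes of the opening tag;
# ([^<]+) captures the text up to the next '<'.
match = re.search(pattern, html)
print(match.group(1))  # try

# [^>]+ needs at least one character after "<p", so a bare <p> is not matched.
bare = re.search(pattern, "<p>try</p>")
print(bare)  # None
```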

answered Jan 31 '11 at 13:13

Allen


score 0 · Answer 3 · answered Jan 31 '11 at 13:19

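It looks like you used this block: `[^<\/p>]+` intending to match anything except for `</p>`. Unfortunately, that's not what it does. A `[]` block matches any one of the characters inside, and a `[^]` block matches any one character that is not inside. In your case, `[^<\/p>]+` matches a run of characters other than `<`, `/`, `p` and `>`, so on your string the whole pattern matches `try</p>`, closing tag included, instead of just `try`. It also fails as soon as the text itself contains a `p`.

Alex's solution, using a non-greedy quantifier, is how I tend to approach this sort of problem.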

score 0 · Answer 4 · answered Jan 31 '11 at 14:14

I tried to make one less specific to any particular tag.

(<[^/]+?\s+[^>]*>[^>]*>)

This returns the whole element (which renders as `try`):

<p class='link'>try</p>

answered Jan 31 '11 at 14:14

Doug

