3

I have a string like this.

<p class='link'>try</p>bla bla</p>

I want to get only <p class='link'>try</p>. I have tried this:
/<p class='link'>[^<\/p>]+<\/p>/

But it doesn't work.

How can I do this? Thanks,

Copas
Luca Romagnoli
  • 1
    Regular expressions and HTML? http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Andrew Grimm Jan 31 '11 at 21:57
  • Although you certainly *can* [parse HTML with regexes](http://stackoverflow.com/questions/4284176/doubt-in-parsing-data-in-perl-where-am-i-going-wrong/4286326#4286326), you probably don’t want to if it is generic HTML. It’s fine for “captive” HTML though; just be incredibly wary of it in its “wild” state. – tchrist Feb 01 '11 at 01:02

4 Answers

4

If that is your string, and you want the text between those p tags, then this should work...

/<p\sclass='link'>(.*?)<\/p>/

The reason yours is not working is that you put <\/p> inside a negated character class. A character class does not match that sequence literally; it excludes each of the characters <, /, p and > individually.
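As a rough illustration of the lazy quantifier, here is a minimal sketch in Python (an assumption on my part, since the question does not name a language; the variable names are mine). Python's `re` module has no `/` delimiters, so the `\/` escape is kept only to mirror the pattern above. It compares `(.*?)` with its greedy counterpart `(.*)` on the string from the question.

```python
import re

html = "<p class='link'>try</p>bla bla</p>"

# Lazy quantifier: stops at the first closing </p> it can reach.
lazy = re.search(r"<p\sclass='link'>(.*?)<\/p>", html)

# Greedy quantifier: runs on to the last closing </p> in the string.
greedy = re.search(r"<p\sclass='link'>(.*)<\/p>", html)

print(lazy.group(0))    # <p class='link'>try</p>
print(lazy.group(1))    # try
print(greedy.group(1))  # try</p>bla bla
```

The only difference between the two patterns is the `?` after `.*`, which is what keeps the match from swallowing the stray second `</p>`.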

Of course, I am obliged to mention that there are better tools for parsing HTML fragments (such as an HTML parser).
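For comparison, this is roughly what the parser route could look like. It is only a sketch, assuming Python and its standard-library `html.parser` module; the class name `LinkParagraphs` and its attributes are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class LinkParagraphs(HTMLParser):
    """Collect the text of every <p class="link"> element."""

    def __init__(self):
        super().__init__()
        self.inside = False  # True while we are within a matching <p>
        self.texts = []      # one entry per matching paragraph

    def handle_starttag(self, tag, attrs):
        if tag == "p" and dict(attrs).get("class") == "link":
            self.inside = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.texts[-1] += data


parser = LinkParagraphs()
parser.feed("<p class='link'>try</p>bla bla</p>")
parser.close()
print(parser.texts)  # ['try']
```

The stray second `</p>` in the sample is simply ignored here, because the parser is no longer inside a matching paragraph when it arrives.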

alex
  • @Matti - Agreed! In fact, I'll make a bot that upvotes all answers that match `/\>[^<>]*\bHTML\b[^<>]*\bParser\b[^<>]*\` [(!)](http://en.wikipedia.org/wiki/%28!%29) – Kobi Jan 31 '11 at 13:55
  • 1
    That’s a very fragile pattern. [See here](http://stackoverflow.com/questions/4284176/doubt-in-parsing-data-in-perl-where-am-i-going-wrong/4286326#4286326) for how careful you really have to be if you are hellbent on using regexes on generic HTML. – tchrist Feb 01 '11 at 01:04
  • 1
    @tchrist It sure is, but looking at the example the OP gave, it seemed they were wanting to know why their particular regex was not working. As for the regex/HTML debate, please refer to my last sentence :) – alex Feb 01 '11 at 01:08
  • I just didn’t know why you used `\s` without using `\s+`, etc. – tchrist Feb 01 '11 at 01:11
  • @tchrist Just trying to point out the OP's obvious mistake. Your points are very valid though :) – alex Feb 01 '11 at 02:16
0
'/<p[^>]+>([^<]+)<\/p>/'

will get you "try"
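A quick way to check this, sketched in Python (my choice of language, not something the question specifies): the first capture group holds the text, and because `[^<]+` cannot cross a `<`, the pattern finds nothing once the paragraph contains a nested tag.

```python
import re

pattern = re.compile(r"<p[^>]+>([^<]+)<\/p>")

# The sample string from the question.
match = pattern.search("<p class='link'>try</p>bla bla</p>")
print(match.group(1))  # try

# A nested tag puts a '<' right after the opening tag, so nothing matches.
nested = pattern.search("<p class='link'><b>try</b></p>")
print(nested)  # None
```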

Allen
0

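It looks like you used this block: [^<\/p>]+ intending to match anything except for </p>. Unfortunately, that's not what it does. A [^] block matches any single character that is not listed inside, so yours excludes <, /, p and > individually. The [^<\/p>]+ part therefore stops at the first of those characters. With the text try it stops at the < and the following <\/p> can still match, but as soon as the text contains a p or a / (for example help), it stops early, is not immediately followed by the expected </p>, and there is no match.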
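To see what the character class does in practice, here is a small experiment in Python (an assumed language; the `help` and `help/page` inputs are made up for illustration). It runs the pattern from the question against two strings and then shows which runs of characters the class alone accepts.

```python
import re

pattern = re.compile(r"<p class='link'>[^<\/p>]+<\/p>")

# "try" contains none of the excluded characters, so this one matches.
matches_try = bool(pattern.search("<p class='link'>try</p>bla bla</p>"))
print(matches_try)   # True

# "help" contains a 'p', so the class stops after "hel" and the match fails.
matches_help = bool(pattern.search("<p class='link'>help</p>bla bla</p>"))
print(matches_help)  # False

# The class on its own: runs of characters other than '<', '/', 'p', '>'.
runs = re.findall(r"[^<\/p>]+", "help/page")
print(runs)          # ['hel', 'age']
```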

Alex's solution, using a non-greedy quantifier, is how I tend to approach this sort of problem.

Ray
0

I tried to make one less specific to any particular tag.

(<[^/]+?\s+[^>]*>[^>]*>)

This returns:

<p class='link'>try</p>
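A short check of this pattern, sketched in Python (assumed, since no language is given). One limitation worth knowing: the `\s+` requires whitespace inside the opening tag, so a bare `<p>` with no attributes is not matched.

```python
import re

pattern = re.compile(r"(<[^/]+?\s+[^>]*>[^>]*>)")

# The sample string from the question.
with_attr = pattern.findall("<p class='link'>try</p>bla bla</p>")
print(with_attr)  # ["<p class='link'>try</p>"]

# No whitespace inside the opening tag, so the pattern finds nothing.
no_attr = pattern.findall("<p>try</p>")
print(no_attr)    # []
```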

Doug