0

I'm trying to write a regex that finds all HTML tags, but for each match the opening and closing tag must be the same. Here's what I mean (yes, I only want a maximum of 3 letters):

preg_match_all("/\<[a-z]{1,3}\>(.*?)\<\/[a-z]{1,3}\>/", $string, $matches);

Where the two [a-z]{1,3} are, I want those to be the same, so it doesn't match <b> with </i>, etc. Thanks. Let me know if you need further explanation.

John Kugelman
David

3 Answers

1

Don't parse HTML with regex. Use PHP Tidy instead.

Community
Vivin Paliath
  • I'm not really parsing HTML; it's just the closest example and the easiest way to explain what I'm trying to do. – David Aug 25 '10 at 02:58
  • So you're parsing XML? :P Sorry, whenever I see `regex` and HTML I laugh. – Nick T Aug 25 '10 at 03:02
  • It doesn't matter if you're parsing HTML/XML or if you're checking for specific closing-tags. HTML and Regex go together like gasoline and milk. i.e., not recommended. :) – Vivin Paliath Aug 25 '10 at 03:03
  • @David: If it's so much *like* HTML, could you just use an *ML parser anyways? – Nick T Aug 25 '10 at 03:04
1

You really shouldn't be parsing *ML with regex because of problems with nested elements, but if this is any help:

preg_match_all('/<([a-z]{1,3})>(.*?)<\/\1>/', $string, $matches);
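
As a rough illustration of how the backreference behaves, here is a minimal sketch; the sample input is hypothetical and not from the question. The pattern is single-quoted because in a double-quoted PHP string `\1` would be read as an octal escape before PCRE ever sees it.

```php
<?php
// Hypothetical sample input: two well-formed pairs and one mismatched pair.
$string = '<b>bold</b> <i>italic</i> <b>mismatched</i>';

// ([a-z]{1,3}) captures the opening tag name; \1 requires the closing
// tag to repeat exactly that captured name.
preg_match_all('/<([a-z]{1,3})>(.*?)<\/\1>/', $string, $matches);

// $matches[1] holds the tag names, $matches[2] the enclosed text.
print_r($matches[1]);
print_r($matches[2]);
```

The mismatched `<b>...</i>` pair is skipped because no `</b>` follows its opening tag.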
bcosca
  • Be aware that this won't handle tags that are enclosed in the same kind of tag. For example, given ``, it will match ``. – Alan Moore Aug 25 '10 at 06:58
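
The nesting limitation can be seen with a small sketch. The input string below is a made-up example, not the one from the comment above.

```php
<?php
// Hypothetical input: a <b> element nested inside another <b> element.
$string = '<b>outer <b>inner</b> tail</b>';

preg_match_all('/<([a-z]{1,3})>(.*?)<\/\1>/', $string, $matches);

// The lazy (.*?) stops at the first </b> it meets, so the match pairs
// the outer opening tag with the inner closing tag.
echo $matches[0][0], "\n";
echo $matches[2][0], "\n";
```

Here the single match is `<b>outer <b>inner</b>`, which leaves ` tail</b>` unmatched.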
0

As Vivin Paliath said. In addition, you can try PHP 5's DOMDocument with XPath:

http://php.net/manual/en/class.domdocument.php
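
A minimal sketch of that approach, assuming the input is well-formed enough to load as XML; the `<root>` wrapper and the sample markup are assumptions for illustration only. The XPath predicate mirrors the "max 3 letters" rule from the question.

```php
<?php
// Hypothetical markup, wrapped in an assumed <root> element so that it
// forms a single well-formed XML document.
$xml = '<root><b>bold</b> and <i>italic</i> and <b>more</b></root>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
$found = [];

// Select child elements whose tag name is at most 3 characters long.
foreach ($xpath->query('/root/*[string-length(name()) <= 3]') as $node) {
    $found[] = [$node->nodeName, $node->textContent];
}

print_r($found);
```

Because the parser builds a tree, opening and closing tags are paired by construction, and nested elements of the same name are handled without any special casing.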

Jake N