I'm having a bit of trouble matching a string using REGEX (PHP).

We have this code:

<p style="text-align: center; ">
    <iframe height="360" src="http://example.com/videoembed/9338/" frameborder="0" width="640"></iframe></p>

We have this REGEX:

/<p.*>.*<iframe.*><\/iframe><\/p>/is

However, this also matches ALL paragraph tags in the string, not just the ones containing the IFRAME tags. How can we match only the P tags that contain an IFRAME?

We also want to match this code using the same REGEX:

<p style="text-align: center;"><iframe allowfullscreen="" frameborder="0" height="360" src="http://example.com/videoembed/9718/" width="640"></iframe></p>

Notice that there are no line breaks and less whitespace (in the P tag).

How can we achieve this? I'm a little new to REGEX.

Thank you for your help in advance.

Ted Wilmont
  • You should most definitely *not* use regex for this task, but rather an XML parser, such as [XML Parser](http://php.net/manual/en/book.xml.php) or [SimpleXML](http://php.net/manual/en/book.simplexml.php), or an HTML parser such as the [DOM implementation](http://php.net/manual/en/domdocument.loadhtml.php). – rid Jan 08 '15 at 12:49
  • This may not answer your question, but it may solve your problem: stop parsing (X)HTML with regex. You may want to take a look at http://php.net/manual/de/book.simplexml.php and http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Daniele D Jan 08 '15 at 12:50
  • Further encouragement to use an HTML parser rather than a regex: **[htmlparsing.com](http://htmlparsing.com/)** – asontu Jan 08 '15 at 13:03
  • Thank you for providing your comments. I will certainly look into HTML parsing within PHP instead of regex. – Ted Wilmont Jan 11 '15 at 17:11
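
The parser approach suggested in the comments could look roughly like the sketch below. It uses PHP's DOMDocument and DOMXPath and assumes the iframe is a direct child of the paragraph. The input string, including the plain first paragraph, is made up for illustration and simply reuses the two samples from the question.

```php
<?php
// Hypothetical input: one plain paragraph plus the two samples from the question.
$html = <<<'HTML'
<p>Just some text, no video here.</p>
<p style="text-align: center; ">
    <iframe height="360" src="http://example.com/videoembed/9338/" frameborder="0" width="640"></iframe></p>
<p style="text-align: center;"><iframe allowfullscreen="" frameborder="0" height="360" src="http://example.com/videoembed/9718/" width="640"></iframe></p>
HTML;

// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Select every <p> that has an <iframe> as a direct child.
$xpath = new DOMXPath($doc);
$paragraphs = $xpath->query('//p[iframe]');

$found = [];
foreach ($paragraphs as $p) {
    // saveHTML() with a node argument serializes only that node.
    $found[] = $doc->saveHTML($p);
}

print_r($found);
```

Because the selection is done on the parsed tree, differences in whitespace or attribute order between the two samples do not matter here.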

3 Answers

2

Match only whitespace characters in between <p> and <iframe>:

/<p[^>]*>\s*<iframe[^>]*><\/iframe>\s*<\/p>/is

I also replaced the any-character dot (.) with [^>], which excludes >, so a tag's attribute part can no longer run past the end of that tag.
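
A quick way to check this pattern is to run it against both samples from the question plus a paragraph without an iframe. The snippet below is only a sketch: the plain first paragraph is a made-up addition, and the variable names are arbitrary.

```php
<?php
// Pattern from this answer: only whitespace may sit between <p ...> and <iframe ...>.
$pattern = '/<p[^>]*>\s*<iframe[^>]*><\/iframe>\s*<\/p>/is';

// Hypothetical input: one plain paragraph plus the two samples from the question.
$html = <<<'HTML'
<p>Just some text, no video here.</p>
<p style="text-align: center; ">
    <iframe height="360" src="http://example.com/videoembed/9338/" frameborder="0" width="640"></iframe></p>
<p style="text-align: center;"><iframe allowfullscreen="" frameborder="0" height="360" src="http://example.com/videoembed/9718/" width="640"></iframe></p>
HTML;

// $matches[0] holds the full text of each match.
$count = preg_match_all($pattern, $html, $matches);

foreach ($matches[0] as $match) {
    echo $match, "\n---\n";
}
```

With this input, only the two paragraphs that wrap an iframe are matched; the plain paragraph is skipped because text other than whitespace follows its opening tag.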

Marek
0
<p.*?>.*?<iframe.*?><\/iframe><\/p>

Try this. See the demo:

https://regex101.com/r/sH8aR8/30

$re = "/<p.*?>.*?<iframe.*?><\\/iframe><\\/p>/is";
$str = "<p style=\"text-align: center; \">\n <iframe height=\"360\" src=\"http://example.com/videoembed/9338/\" frameborder=\"0\" width=\"640\"></iframe></p>\n\n<p style=\"text-align: center;\"><iframe allowfullscreen=\"\" frameborder=\"0\" height=\"360\" src=\"http://example.com/videoembed/9718/\" width=\"640\"></iframe></p>";

preg_match_all($re, $str, $matches);

Just make your greedy * quantifiers non-greedy by writing *? instead.
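
One caveat worth checking before relying on this: a lazy quantifier shortens where a match ends, but the engine still begins at the leftmost <p it can use. The sketch below uses a made-up input in which a plain paragraph precedes a paragraph with an iframe; the variable names are arbitrary.

```php
<?php
// Pattern from this answer, with lazy quantifiers.
$lazy = '/<p.*?>.*?<iframe.*?><\/iframe><\/p>/is';

// Hypothetical input: a plain paragraph followed by a paragraph with an iframe.
$html = '<p>Intro text.</p>' . "\n"
      . '<p style="text-align: center;"><iframe src="http://example.com/videoembed/9718/"></iframe></p>';

preg_match_all($lazy, $html, $matches);

// Check whether the match begins at the plain paragraph rather than
// at the paragraph that actually wraps the iframe.
$startsAtPlainParagraph = strpos($matches[0][0], '<p>Intro text.') === 0;

var_dump(count($matches[0]), $startsAtPlainParagraph);
```

For this input there is a single match, and it starts at the plain paragraph, so it contains both paragraphs. On the two samples from the question alone, the pattern matches each paragraph separately.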

vks
0

Use [^>]* instead of .* like:

/<p[^>]*>[^<]*<iframe[^>]*><\/iframe><\/p>/is

Vladyslav Savchenko