I'm trying to find all videos in a piece of HTML:

preg_match_all('[<iframe*]', $this->textile_text, $video_matches);

I'm using PHP.

Right now I am only matching iframe tags, but I also need to look for embed and object tags. Those are the only ways I can think of for videos to be embedded in HTML.

How can I say "or" so this would work?

preg_match_all('[<iframe*] OR [<embed*] OR [<object*]', $this->textile_text, $video_matches);

Also, if anyone can think of a better regex pattern to detect videos, that would be great, because mine is quite elementary.

EDIT

This call:

preg_match_all("(iframe|embed|object)", $this->textile_text, $video_matches);

produced this output:

Array
(
    [0] => Array
        (
            [0] => object
            [1] => embed
            [2] => embed
            [3] => embed
            [4] => embed
            [5] => object
            [6] => iframe
            [7] => embed
            [8] => iframe
            [9] => iframe
            [10] => embed
            [11] => iframe
            [12] => iframe
            [13] => embed
            [14] => iframe
            [15] => iframe
            [16] => embed
            [17] => iframe
            [18] => iframe
            [19] => embed
            [20] => iframe
            [21] => iframe
            [22] => embed
            [23] => iframe
            [24] => iframe
            [25] => embed
            [26] => iframe
        )

)

Should have been:

Array
(
    [0] => Array
        (
            [0] => <iframe
            [1] => <iframe
            [2] => <iframe
            [3] => <iframe
            [4] => <iframe
            [5] => <iframe
            [6] => <iframe
            [7] => <embed
        )

)
  • `|`, the pipe, is the standard *or* in most languages. –  Mar 01 '12 at 22:41
  • As Dagon says, the pipe, used inside a group - `/(^head|foot$)/` – Orbling Mar 01 '12 at 22:42
  • May I suggest [not parsing HTML with regex](http://stackoverflow.com/a/1732454/757830)? I'd recommend a DOM/XML parser instead, such as [DOMDocument](http://php.net/manual/en/class.domdocument.php). – gen_Eric Mar 01 '12 at 22:42
  • You should perhaps read some documentation on regular expressions first (PHP's PCRE documentation is pretty good: http://uk.php.net/manual/en/reference.pcre.pattern.syntax.php). – connec Mar 01 '12 at 22:45
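
One point from that PCRE syntax documentation bears directly on the patterns in the question: PHP treats the first character of the pattern string as a delimiter, and bracket pairs such as `[...]` and `(...)` are valid delimiters. A minimal sketch of the consequence follows; the `$html` string and the variable names are made up for illustration and are not the asker's data.

```php
<?php
// Hypothetical sample input; the asker's real HTML is not shown in the question.
$html = '<iframe src="a"></iframe><embed src="b">';

// '[' and ']' are consumed as delimiters, so the pattern is really: <iframe*
// (that is, "<ifram" followed by zero or more "e"), not a character class.
preg_match_all('[<iframe*]', $html, $bracketMatches);

// '(' and ')' are also consumed as delimiters, so no capture group is created
// and the result only has index 0.
preg_match_all('(iframe|embed)', $html, $parenMatches);

// With explicit '/' delimiters the parentheses form a real capture group,
// so the result has index 0 (full matches) and index 1 (the group).
preg_match_all('/(iframe|embed)/', $html, $slashMatches);

print_r($bracketMatches);
print_r($parenMatches);
print_r($slashMatches);
```

This would explain why the output in the question contains only a `[0]` entry even though the pattern appears to have a group.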

1 Answer

I think this will get you what you need.

(iframe|embed|object)

According to the documentation, this should match any one of those three words. However, I do not have access to PHP's regex implementation to try it out.
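
As a rough sketch of how the alternation could be combined with the leading `<` from the question so that only opening tags are reported: the `$html` sample below is an assumption standing in for `$this->textile_text`, and a regex like this does not understand nesting or HTML comments.

```php
<?php
// Hypothetical sample input standing in for $this->textile_text.
$html = '<object data="x"><embed src="x"></object>'
      . '<iframe src="http://example.com/embed/1"></iframe>';

// '/' is the delimiter. The leading '<' ties each match to the start of an
// opening tag, so closing tags ("</iframe>") and the word "embed" inside a
// URL are skipped. \b rejects longer names such as "<objects"; the 'i'
// modifier makes the match case-insensitive.
preg_match_all('/<(iframe|embed|object)\b/i', $html, $video_matches);

print_r($video_matches[0]); // full matches, including the '<'
print_r($video_matches[1]); // captured tag names only
```

Note that the `<embed` nested inside the `<object` is still reported separately; telling nested tags apart from top-level ones needs more than this pattern.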

James
  • Hmm, this is returning nothing for me, even though the HTML being matched against has several iframes and one embed tag. –  Mar 01 '12 at 22:47
  • Test the `(iframe|embed|object)` part on its own to see whether it does the matching. The external characters may be what is messing it up; adding them is a habit I think Visual Studio has pressed into me (when in doubt, escape!), hehe. If that works, I will edit the answer to propose just that part for the or matching. – James Mar 01 '12 at 22:52
  • Without the `<` at the start, it's matching both opening and closing tags. Without seeing the HTML you are running this against, I would also venture that, although you do not want it to find tags embedded in each other, it appears to be doing so: each iframe seems to have an embed inside of it. I would suggest adding the other delimiters back in to limit your results to start tags, the way you were doing before. If you want to ignore embedded tags (embeds within objects), I would suggest the DOM option in the comments to your question. – James Mar 01 '12 at 23:06
  • You're right, the object tag (Hulu embed code) had embed tags inside it. For now I'm just using ( –  Mar 02 '12 at 16:30
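
For the DOM option mentioned in the comments, a minimal sketch might look like the following. It assumes PHP's DOM extension is available; the `$html` sample, the XPath expression, and the `$videos` variable are illustrative choices, not something taken from the thread.

```php
<?php
// Hypothetical sample input; real markup would come from $this->textile_text.
$html = '<div><object data="x"><embed src="x"></object>'
      . '<iframe src="y"></iframe><embed src="z"></div>';

// Collect libxml warnings internally instead of printing them, since
// real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select iframe, embed and object elements, but skip any that sit inside
// an object element (for example an embed used as fallback content).
$nodes = $xpath->query(
    '//*[self::iframe or self::embed or self::object][not(ancestor::object)]'
);

$videos = [];
foreach ($nodes as $node) {
    $videos[] = $node->nodeName;
}
print_r($videos);
```

Because the parser builds a tree, the embed nested in the object is filtered out by the `ancestor::object` test, which a flat regex match cannot express reliably.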