How to match a part of an – Brian Campbell Sep 03 '10 at 15:28

    @Nick Just consider this example: NotAPath"> How can a regexp effectively recognize it's not an iframe? – HoLyVieR Sep 03 '10 at 15:53
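To make the objection concrete, here is a hypothetical input. The commenter's original example was lost from this page, so this one is an assumption: an `<a>` tag whose `title` attribute merely contains the text `<iframe src=`. The sketch runs the pattern from the first answer below and PHP's `DOMDocument` side by side.

```php
<?php
// Hypothetical input (NOT the commenter's lost example): no iframe element
// exists here, only an attribute value that looks like the start of one.
$html = '<a title="<iframe src=" href="NotAPath">link</a>';

// Pattern from the first answer: it "finds" an iframe src anyway.
preg_match('|<iframe [^>]*(src="[^"]+")[^>]*|', $html, $matches);
echo $matches[1], "\n"; // src=" href="

// A DOM parser sees the real structure: zero iframe elements.
libxml_use_internal_errors(true); // keep libxml warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
echo $doc->getElementsByTagName('iframe')->length, "\n"; // 0
```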
    <?php
    $html='<iframe maybe somethin gere src="http://some.random.url.com/" and blablabla';
    
    preg_match('|<iframe [^>]*(src="[^"]+")[^>]*|', $html, $matches);
    
    var_dump($matches);
    

    Output:

    array(2) {
      [0]=>
      string(75) "<iframe maybe somethin gere src="http://some.random.url.com/" and blablabla"
      [1]=>
      string(33) "src="http://some.random.url.com/""
    }
    

    This is a quick regular-expression approach, but it may break on malformed HTML or cause other problems; for a robust solution, use a DOM parser.
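A minimal sketch of the DOM approach, assuming PHP's bundled DOM extension (`DOMDocument`) is available. The helper name `iframeSources()` is hypothetical; it returns the `src` of every `iframe` that has one.

```php
<?php
// Hypothetical helper: collect the src of every <iframe> using the DOM extension.
function iframeSources(string $html): array
{
    $doc = new DOMDocument();
    // loadHTML() warns on sloppy markup; buffer those warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('iframe') as $iframe) {
        if ($iframe->hasAttribute('src')) {
            $sources[] = $iframe->getAttribute('src');
        }
    }
    return $sources;
}

// One entry: http://some.random.url.com/
print_r(iframeSources('<p>x</p><iframe width="560" src="http://some.random.url.com/"></iframe>'));
```

One point in the parser's favour: `getAttribute()` returns the decoded value, so `&amp;` in the markup comes back as `&`, which a regex capture would leave as-is.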

    aularon

    A regular expression is going to be the cleanest way to do it:

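    // The pattern needs delimiters, and preg_match() needs the subject and a $matches array.
    // The /s modifier lets . also match newlines inside the iframe markup.
    preg_match('/<iframe.+?src="(.+?)".+?<\/iframe>/s', $html, $matches);
    
    print_r($matches);
    
    // Array ( [0] => whole regex match, [1] => your src URL )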
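`preg_match()` stops at the first match. If a page may contain several iframes, a sketch along these lines collects every `src` with `preg_match_all()`. It keeps the answer's assumption that `src` is double-quoted; the `\s` before `src` and the `/i` modifier are additions of this sketch, so that attributes such as `data-src` are skipped and upper-case tag names are accepted.

```php
<?php
$html = '<iframe src="a.html"></iframe>
<IFRAME width="1" src="b.html"></IFRAME>';

// preg_match_all() returns the number of matches; $matches[1] holds every group-1 capture.
// \ssrc= requires whitespace before "src", so data-src="..." is not picked up.
// Still assumes a double-quoted src value, like the answer above.
$count = preg_match_all('/<iframe\b[^>]*?\ssrc="([^"]*)"/i', $html, $matches);

print_r($matches[1]); // a.html and b.html
```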
    
    zmonteca
    • Probably downvoted because it's not recommended to parse HTML using regular expressions. – Brian Jan 07 '17 at 22:21

    If your source is well-formed XML, you can also use XPath to find the string.

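    <?php
      // Only works if file.html is well-formed XML (e.g. XHTML).
      $file = simplexml_load_file("file.html");
      $result = $file->xpath("//iframe[@src]/@src");
      foreach ($result as $src) {
          echo (string) $src, "\n"; // each result is a SimpleXMLElement
      }
    ?>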
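`simplexml_load_file()` only accepts well-formed XML, and most real HTML (an unclosed `<br>`, for example) is not. As an alternative sketch, not part of the answer above, `DOMDocument::loadHTML()` tolerates such markup and `DOMXPath` evaluates the same XPath expression against it. The function name `iframeSrcsByXPath()` is hypothetical.

```php
<?php
// Hypothetical helper: same XPath as the answer, but over forgiving HTML parsing.
function iframeSrcsByXPath(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // buffer warnings about non-XML markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $srcs = [];
    foreach ($xpath->query('//iframe[@src]/@src') as $attr) { // each node is a DOMAttr
        $srcs[] = $attr->value;
    }
    return $srcs;
}

print_r(iframeSrcsByXPath('<p>Not XML<br><iframe src="http://some.random.url.com/"></iframe>'));
```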
    
    ase

    You should use a DOM parser, but this regex will get you started if there is a reason you must use regexes:

    .*(?<iframeOpening><iframe)\s[^>]*(?<iframeSrc>src=['"][^>'"]+['"]?).*
    

    It uses named capture groups, by the way. Here's how they work:

    preg_match('/.*(?<iframeOpening><iframe)\s[^>]*src=[\'"](?<iframeSrc>[^>\'"]+)[\'"]?.*/', $searchText, $groups);
    print_r($groups['iframeSrc']);
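A small demonstration of how named groups behave in `preg_match()`, and of why the quantifier belongs inside the group: with `(?<iframeSrc>[^>'"])+` the group matches one character per repetition and PCRE keeps only the last one. The sample tag is made up for illustration.

```php
<?php
$tag = '<iframe width="560" src="http://some.random.url.com/">';

// Quantifier OUTSIDE the group: only the last repetition (one character) is kept.
preg_match('/src=[\'"](?<iframeSrc>[^>\'"])+/', $tag, $bad);
echo $bad['iframeSrc'], "\n"; // /

// Quantifier INSIDE the group: the whole attribute value is captured.
preg_match('/src=[\'"](?<iframeSrc>[^>\'"]+)/', $tag, $good);
echo $good['iframeSrc'], "\n"; // http://some.random.url.com/

// A named group is reported under both its name and its number.
var_dump($good['iframeSrc'] === $good[1]); // bool(true)
```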
    
    CaffGeek
    • Sorry if I was unclear. That matches the entire iframe element, but I only want to match the SRC of the iframe. :) – qwerty Sep 03 '10 at 14:00
    • @Nike, you weren't unclear. This does match the entire iframe element, but it includes named groups so you can retrieve the src; see my modified answer. – CaffGeek Sep 03 '10 at 14:40
    • We're getting closer, but this is what it returns now: src="http://existenz.se/amedia/?typ=youtube&url=http://www.youtube.com/v/yTJpd57jLiY " marginheight="0"... But I only want to return the actual value of the src attribute (not src="..."). – qwerty Sep 03 '10 at 14:54
    • @Nike, try it now, I modified it slightly – CaffGeek Sep 03 '10 at 15:17
    • Got an error now: Warning: preg_match() [function.preg-match]: Compilation failed: nothing to repeat at offset 70 in.... – qwerty Sep 03 '10 at 15:24
    • I got it working! :) I removed the second * at the end, and now I only get the SRC of it. Is there any way to remove the src= and the quotes around the URL? Thanks! – qwerty Sep 03 '10 at 15:31
    • @Nike, that extra `*` was a typo. I changed it to only return the contents of the src attribute. You did suggest you wanted the src included in your question though, which is why I had it returned. – CaffGeek Sep 03 '10 at 15:37
    • @Nike You can add a new capture group around the inside of the quotes. Change this: `[\'"][^>\'"]+[\'"]` to this: `[\'"]([^>\'"]+)[\'"]`. But I strongly recommend against doing this, and recommend just using the DOM parser mentioned above, as you will have many bugs if you use this regex to try and extract the `src` from your `iframe` tags (for example: ` – Brian Campbell Sep 03 '10 at 15:39
    • I'm probably going to switch to a DOM parser later, but right now I know (mostly) what the URLs are going to be, and I also know what the source code of the webpage looks like, so it will (hopefully) work as it should for the moment, until something changes. Thanks for the help! :) – qwerty Sep 03 '10 at 15:56
    • @Brian Campbell, I fully agree, DOM is almost always the best approach... but depending on the situation, it's not always. – CaffGeek Sep 03 '10 at 15:57
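As a hypothetical illustration of the kind of bug Brian Campbell warns about (his own example is cut off above), the markup below has a `data-src` attribute after the real `src`. The pattern from the first answer returns the wrong value, while a DOM lookup by attribute name does not.

```php
<?php
// Hypothetical markup: the real src comes first, a data-src attribute after it.
$html = '<iframe src="real.html" data-src="lazy.html"></iframe>';

// Pattern from the first answer: the greedy [^>]* backtracks to the LAST "src=",
// which here is the tail of "data-src=".
preg_match('|<iframe [^>]*(src="[^"]+")[^>]*|', $html, $m);
echo $m[1], "\n"; // src="lazy.html"

// The DOM parser reads attributes by name, so their order does not matter.
libxml_use_internal_errors(true); // keep libxml warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
echo $doc->getElementsByTagName('iframe')->item(0)->getAttribute('src'), "\n"; // real.html
```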

    See "RegEx match open tags except XHTML self-contained tags".

    That said, your particular situation isn't really parsing, just string matching. Methods for that have already been listed in the answers above.

    jrharshath