cURL preg_match issues

Question

Basically, I'm trying to scrape the URL of the poster image, but for some reason it isn't working. The regex works fine on regex101 but not on the actual page.

My code:

<?php

    $url="http://www.imdb.com/title/tt0121955/";

    $ch2 = curl_init();
    curl_setopt ($ch2, CURLOPT_URL, $url);
    curl_setopt ($ch2, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt ($ch2, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31"); 
    curl_setopt ($ch2, CURLOPT_TIMEOUT, 60);
    curl_setopt ($ch2, CURLOPT_SSL_VERIFYHOST, false); 
    curl_setopt ($ch2, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt ($ch2, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt ($ch2, CURLOPT_REFERER, $url);
    $result = curl_exec ($ch2);
    curl_close($ch2);

    if(preg_match_all('/<td rowspan="2" id="img_primary"><div class="image"><a href="(.*)"><img alt="(.*)" title="South Park \(1997\) Poster" src="(.*)" itemprop="image" height="(.*)" width="(.*)"><\/a><\/div>/', $result, $matches) !== false) {

    foreach($matches as $match) {
        echo $match[0];
        echo $match[1];
        echo $match[2];
        echo $match[3];
    }

    }
?>

I also ran var_dump on $matches, and it outputs:

array(6) { [0]=> array(0) { } [1]=> array(0) { } [2]=> array(0) { } [3]=> array(0) { } [4]=> array(0) { } [5]=> array(0) { } }

So it seems it's not finding anything, yet it works fine on regex101.
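Two things in the code above can hide what is going on, independent of the pattern itself. `preg_match_all()` returns the number of full matches (0 when nothing matched) and `false` only on error, so the `!== false` test passes even with zero matches. And with the default `PREG_PATTERN_ORDER`, `$matches` is indexed by capture group, so `foreach ($matches as $match)` walks groups rather than matches. Also, `.` does not match newlines without the `s` modifier, so a one-line literal pattern can miss markup that spans several lines. The sketch below uses a hypothetical sample string (not the real IMDb markup) and an illustrative, looser pattern:

```php
<?php
// Hypothetical sample markup, loosely modeled on the tag the question's regex targets.
// It deliberately spans several lines; the real page may differ.
$html = '<td rowspan="2" id="img_primary">
<div class="image">
<a href="/media/example"><img alt="South Park (1997) Poster"
 title="South Park (1997) Poster" src="http://example.com/poster.jpg"
 itemprop="image" height="317" width="214"></a>
</div>';

// preg_match_all() returns the number of full matches (0 if none) or false on error,
// so check the count explicitly instead of only "!== false".
// [^>]* (unlike .) also matches newlines, so the pattern survives line breaks in the tag.
$count = preg_match_all('/<img\b[^>]*\bsrc="([^"]+)"/', $html, $matches, PREG_SET_ORDER);

if ($count === false) {
    echo "Regex error code: " . preg_last_error() . "\n";
} elseif ($count === 0) {
    echo "No matches\n";
} else {
    // With PREG_SET_ORDER, each $match is one match: [0] is the full text, [1] the first group.
    foreach ($matches as $match) {
        echo $match[1], "\n";
    }
}
```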

Use a proper HTML parser instead of regex. – HamZa Jul 01 '15 at 02:02
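A minimal sketch of the parser approach, using PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`). It assumes the poster `<img>` sits inside the element with `id="img_primary"`, as in the markup the question's regex targets; the live page may have changed since. The function name `extractPosterSrc` and the sample HTML are hypothetical:

```php
<?php
// Sketch: extract the poster URL with the DOM extension instead of a regex.
// Assumes the poster <img> is somewhere under an element with id="img_primary".
function extractPosterSrc(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Every <img> with a src attribute under #img_primary; we take the first.
    $nodes = $xpath->query('//*[@id="img_primary"]//img[@src]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->getAttribute('src');
}

// Hypothetical sample; in the question, $result from curl_exec() would go here.
$html = '<table><tr><td rowspan="2" id="img_primary"><div class="image">'
      . '<a href="/media/example"><img alt="Poster" title="Some Title Poster"'
      . ' src="http://example.com/poster.jpg" itemprop="image" height="317" width="214">'
      . '</a></div></td></tr></table>';

echo extractPosterSrc($html) ?? 'not found', "\n";
```

Because the lookup goes by element and attribute, it does not care about whitespace, attribute order, or the `title` text.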

score 0 · Answer 1 · answered Jul 01 '15 at 22:08


The HTML on the page doesn't match your regex. If you don't need a piece of information, don't try to capture it with the regex. Try:

preg_match_all('/title="South Park \(1997\) Poster"\s*src="([^"]+)"/m', 
    $result, 
    $matches);

var_dump($matches);

And you're done. IMHO, the best way to scrape pages is with Perl.


Tom Pimienta


That wouldn't work, as the title attribute is different every time you load the page. And in case you didn't know, I already have an answer: an HTML parser. Thanks anyway. – Kyubeh2435436 Jul 01 '15 at 23:56
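If you do want to stay with a regex, one option is to key on an attribute that does not change between titles. The markup in the question includes `itemprop="image"` on the poster `<img>`; assuming that still holds, a lookahead lets the pattern require it regardless of where it appears in the tag, and capture `src` without mentioning `title` at all. The sample tags are hypothetical:

```php
<?php
// Sketch: match the poster <img> by its itemprop="image" attribute (seen in the
// question's markup) instead of the title text, which varies between titles.
// The lookahead checks for itemprop anywhere inside the same tag, so attribute order doesn't matter.
$pattern = '/<img\b(?=[^>]*\bitemprop="image")[^>]*\bsrc="([^"]+)"/i';

// Hypothetical samples: two different titles with attributes in different orders,
// plus an unrelated image that should not match.
$samples = [
    '<img alt="A" title="South Park (1997) Poster" src="http://example.com/a.jpg" itemprop="image">',
    '<img itemprop="image" src="http://example.com/b.jpg" title="Other Show (2005) Poster">',
    '<img src="http://example.com/logo.png" alt="logo">',
];

$found = [];
foreach ($samples as $html) {
    // preg_match() returns 1 on a match, 0 on none, false on error.
    if (preg_match($pattern, $html, $m) === 1) {
        $found[] = $m[1];
    }
}
print_r($found);
```

This is still brittle (an attribute value containing `>` would break it), so the HTML parser route the asker settled on remains the sturdier choice.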
