This is a real bang-your-head-on-the-wall situation. This pattern works perfectly in JavaScript, but it fails in PHP and I have no idea what to do.

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://yugioh.wikia.com/wiki/List_of_Yu-Gi-Oh!_BAM_cards'); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$chHtml = curl_exec($ch);
curl_close($ch);
$patt = '/<table class="wikitable sortable card-list">[\s\S]*?<\/table/im'; // this is the problem line
preg_match($patt, $chHtml, $matches);

If I make it greedy

[\s\S]*

it works fine, but it matches all the way to the last closing table tag.

Noitidart
  • 2
    Parsing HTML with regexes? [That's not advisable...](http://stackoverflow.com/a/1732454/344643) – Waleed Khan Oct 05 '13 at 21:00
  • 1
    Don't use regular expressions for HTML, use an XML/HTML parser! See http://stackoverflow.com/a/1732454/838733 – nietonfir Oct 05 '13 at 21:00
  • 2
    Or if you want something helpful, see this post instead: http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php – Wesley Murch Oct 05 '13 at 21:02
  • Thanks all! But it's just bugging me now: why on earth doesn't it work, lol? I even tried using the u flag. It's just so unexplainable. – Noitidart Oct 05 '13 at 21:44
  • I worked on it for like an hour, then read the PCRE articles on the official PHP site, then tried some more, then took a nap, woke up, tried more, and failed. Then I posted here. – Noitidart Oct 05 '13 at 21:49
  • Can you try this regex: `$patt = '~.*?
    – anubhava Oct 06 '13 at 10:27

1 Answer

There is nothing wrong with the pattern, the problem is that you need a larger backtrack limit than the default.

Explaining:

With regex problems like this, always check for errors using preg_last_error().

If you call it after matching against the actual response from the site you posted, you will see that you are getting a PREG_BACKTRACK_LIMIT_ERROR. This is a resource problem, which is why smaller texts do not raise the error.
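A minimal sketch of that check. The synthetic subject string is a hypothetical stand-in for the downloaded page, and the lowered limit of 1000 and the disabled JIT are assumptions made only so that the error reproduces on a small input; the real page runs into the default limit on its own.

```php
<?php
// Assumptions for reproducibility: turn the PCRE JIT off and lower the limit
// so that a small synthetic input exhausts it.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '1000');

// Hypothetical stand-in for the HTML returned by curl_exec().
$chHtml = '<table class="wikitable sortable card-list">'
        . str_repeat('x', 100000)
        . '</table>';

$patt = '/<table class="wikitable sortable card-list">[\s\S]*?<\/table/im';
$result = preg_match($patt, $chHtml, $matches);

// preg_match() returns false on failure, which is not the same as 0 (no match).
if ($result === false) {
    $code = preg_last_error();
    if ($code === PREG_BACKTRACK_LIMIT_ERROR) {
        echo "Backtrack limit exhausted\n";
    } else {
        echo "PCRE error code: $code\n";
    }
}
```

Comparing with === false matters here: a plain if (!$result) cannot tell "no match" apart from "the engine gave up".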

Solution:

To get past this limit, you can raise it by putting the following at the start of your script:

ini_set ('pcre.backtrack_limit', 10000000);
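To illustrate the effect, here is a rough sketch on a synthetic input. The shortened pattern, the filler length, and the starting limit of 1000 are illustrative assumptions, and the JIT is disabled so that the limit applies to the interpreter. The lazy match fails under the low limit and succeeds once the limit is raised:

```php
<?php
// Assumption: JIT off, so pcre.backtrack_limit governs the interpreter directly.
ini_set('pcre.jit', '0');

// Hypothetical input: one large table followed by a second, empty one.
$subject = '<table>' . str_repeat('x', 100000) . '</table><table></table>';
$patt = '/<table>[\s\S]*?<\/table/i';

// Artificially low limit: the lazy quantifier runs out of budget.
ini_set('pcre.backtrack_limit', '1000');
$before = preg_match($patt, $subject, $m);

// Raised limit, same value as in the answer: the lazy match can finish.
ini_set('pcre.backtrack_limit', '10000000');
$after = preg_match($patt, $subject, $m);

var_dump($before);        // false while the limit is too low
var_dump($after);         // 1 once the limit is raised
var_dump(strlen($m[0]));  // length of the lazy match, ending at the first </table
```

The lazy match stops at the first closing tag, so raising the limit keeps the behaviour the question was after instead of switching to the greedy form.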
Ioannis Lalopoulos