The results I am trying to extract data from are in the following format:

<a href="anime/hackroots">.hack//Roots</a> <img src="images/video.png" border=0 width=16 height=16 alt="Flash video available" title="Flash video available"> (8) <br><a href="anime/hacksign">.hack//SIGN</a> 

I am trying to extract the location and name of each anime using the following code:

preg_match_all('#<a href="anime/(.*)">(.*?)</a>#', $content, $match);

However, the results span many lines, with 50+ results on each line, and I cannot get a single match with this pattern. What am I doing wrong?

Any help would be much appreciated!

  • What you are doing wrong is using regex for HTML parsing. – AbraCadaver Apr 21 '14 at 22:01
  • Asked a thousand times, it's just that you wrote the regex to do so. Learn about ["Greediness" in PCRE](http://www.regular-expressions.info/possessive.html). Also take care which characters (esp. whitespace) the dot "`.`" matches. – hakre Apr 21 '14 at 22:02
  • `#(.*?)#s`; the [`s` modifier](http://php.net/manual/en/reference.pcre.pattern.modifiers.php). – Sam Apr 21 '14 at 22:07
  • Also, change your first `(.*)` to `(.*?)` to make it [lazy](http://www.regular-expressions.info/repeat.html#lazy). Or use a more specific character class saying "anything but `"`" (`([^"]*)`). – Sam Apr 21 '14 at 22:12
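
The last two comments can be tried out directly. The following is only a sketch, reusing the sample markup from the question: it runs the original greedy pattern next to a variant that uses the negated character class `[^"]*` and the `s` modifier. The variable names `$greedy` and `$fixed` are made up for illustration.

```php
<?php
// Sample markup copied from the question.
$content = '<a href="anime/hackroots">.hack//Roots</a> <img src="images/video.png" border=0 width=16 height=16 alt="Flash video available" title="Flash video available"> (8) <br><a href="anime/hacksign">.hack//SIGN</a>';

// Original pattern: the greedy (.*) runs on to the LAST `">` on the line,
// so both links collapse into a single match.
preg_match_all('#<a href="anime/(.*)">(.*?)</a>#', $content, $greedy);

// Variant from the comments: [^"]* cannot cross the closing quote of the
// href, and the `s` modifier lets the dot in the second group match newlines.
preg_match_all('#<a href="anime/([^"]*)">(.*?)</a>#s', $content, $fixed);

echo count($greedy[1]), "\n"; // 1
echo count($fixed[1]), "\n";  // 2
print_r($fixed[1]);           // hackroots, hacksign
print_r($fixed[2]);           // .hack//Roots, .hack//SIGN
```

With the greedy pattern, the single captured "location" contains everything from `hackroots` up to `hacksign`, including the `<img>` tag in between, which is why it is not usable as-is.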

1 Answer

Use a lazy quantifier (`.*?`) in both groups. Passing `PREG_SET_ORDER` should also help, since each match is then returned as its own array.

<?php

$string = '<a href="anime/hackroots">.hack//Roots</a> <img src="images/video.png" border=0 width=16 height=16 alt="Flash video available" title="Flash video available"> (8) <br><a href="anime/hacksign">.hack//SIGN</a>';

$pattern = '#<a href="anime/(?P<location>.*?)">(?P<description>.*?)</a>#';

$matches = null;
preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);
print_r($matches);

https://eval.in/139332
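
As a possible follow-up, here is a minimal sketch of how the `PREG_SET_ORDER` result could be consumed. It assumes you want a map from location to name; the `$animes` array is a hypothetical name, not part of the answer above.

```php
<?php
// Same sample string and pattern as in the answer.
$string = '<a href="anime/hackroots">.hack//Roots</a> <img src="images/video.png" border=0 width=16 height=16 alt="Flash video available" title="Flash video available"> (8) <br><a href="anime/hacksign">.hack//SIGN</a>';
$pattern = '#<a href="anime/(?P<location>.*?)">(?P<description>.*?)</a>#';

preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER, each element of $matches is one complete match,
// so the named groups can be read per link.
$animes = []; // hypothetical: location => name
foreach ($matches as $match) {
    $animes[$match['location']] = $match['description'];
}

print_r($animes);
// Array
// (
//     [hackroots] => .hack//Roots
//     [hacksign] => .hack//SIGN
// )
```

Without the flag, `preg_match_all` defaults to `PREG_PATTERN_ORDER`, where `$matches['location']` and `$matches['description']` are two parallel arrays instead.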

mleko