
I have an array of items, each of which is an HTML table row like this one:

<tr>
    <td class="vertTh">
        <center>
            <a href="/browse/200" title="More from this category">Video</a>
            <br />
            (
            <a href="/browse/201" title="More from this category">Movies</a>
            )
        </center>
    </td>
    <td>
        <div class="detName">
            <a href="/torrent/8036528/Life.of.Pi.2012.DVDSCR" class="detLink" title="Details for Life.of.Pi.2012.DVDSCR">Life.of.Pi.2012.DVDSCR</a>
        </div>
        <a href="magnet:?xt=urn:btih:b129c8fd1c91b00589ef8fe646f52ce10148a3c9&dn=Life.of.Pi.2012.DVDSCR&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Ftracker.publicbt.com%3A80&tr=udp%3A%2F%2Ftracker.istole.it%3A6969&tr=udp%3A%2F%2Ftracker.ccc.de%3A80" title="Download this torrent using magnet">
            <img src="//static.thepiratebay.se/img/icon-magnet.gif" alt="Magnet link" />
        </a>
        <img src="//static.thepiratebay.se/img/icon_comment.gif" alt="This torrent has 68 comments." title="This torrent has 68 comments." />
        <img src="//static.thepiratebay.se/img/icon_image.gif" alt="This torrent has a cover image" title="This torrent has a cover image" />
        <a href="/user/scene4all">
            <img src="//static.thepiratebay.se/img/vip.gif" alt="VIP" title="VIP" style="width:11px;" border='0' />
        </a> <font class="detDesc">Uploaded 01-18&nbsp;17:41, Size 1.25&nbsp;GiB, ULed by
            <a class="detDesc" href="/user/scene4all/" title="Browse scene4all">scene4all</a></font> 
    </td>
    <td align="right">33981</td>
    <td align="right">18487</td>
</tr>

How can I extract the fields from each row with preg_match() or preg_match_all()?

I tried using this pattern:

<tr>
    <td class="vertTh">
        (?P<cat>.*?)
    </td>
    <td>
        <div class="detName">
            (?P<name>.*?)
        </div>
        (?P<link>.*?)
    </td>
    <td align="right">(?P<up>.*?)</td>
    <td align="right">(?P<down>.*?)</td>
</tr>

And this code:

preg_match_all("#$pattern#s", $item, $v);
var_dump($v);

And it returns:

array(11) {
  [0]=>
  array(0) {
  }
  ["cat"]=>
  array(0) {
  }
  [1]=>
  array(0) {
  }
  ["name"]=>
  array(0) {
  }
      ...
}

Can someone help me fix this code so that it returns the actual content? I think I have provided enough information.
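One plausible cause, assuming `$item` is not byte-for-byte identical to the layout of the pattern: without the `x` modifier, every newline and every indentation space in the pattern is matched literally. A tab instead of spaces, a `\r\n` line ending, or a row that arrives on a single line is then enough to make the whole match fail and leave every group empty. The sketch below uses a shortened copy of the row, keeps the pattern as written, and joins its trimmed lines with `\s*` so that any amount of whitespace between tags is accepted. The names `$pieces` and `$regex` are illustrative only.

```php
<?php
// Assumption: $item is one row shaped like the one in the question (shortened here)
$item = <<<'HTML'
<tr>
    <td class="vertTh">
        <center>
            <a href="/browse/200" title="More from this category">Video</a>
            <br />
            (
            <a href="/browse/201" title="More from this category">Movies</a>
            )
        </center>
    </td>
    <td>
        <div class="detName">
            <a href="/torrent/8036528/Life.of.Pi.2012.DVDSCR" class="detLink" title="Details for Life.of.Pi.2012.DVDSCR">Life.of.Pi.2012.DVDSCR</a>
        </div>
        <a href="magnet:?xt=urn:btih:b129c8fd1c91b00589ef8fe646f52ce10148a3c9&dn=Life.of.Pi.2012.DVDSCR" title="Download this torrent using magnet">
            <img src="//static.thepiratebay.se/img/icon-magnet.gif" alt="Magnet link" />
        </a>
    </td>
    <td align="right">33981</td>
    <td align="right">18487</td>
</tr>
HTML;

// The pattern exactly as written in the question
$pattern = <<<'PATTERN'
<tr>
    <td class="vertTh">
        (?P<cat>.*?)
    </td>
    <td>
        <div class="detName">
            (?P<name>.*?)
        </div>
        (?P<link>.*?)
    </td>
    <td align="right">(?P<up>.*?)</td>
    <td align="right">(?P<down>.*?)</td>
</tr>
PATTERN;

// Trim every line (this also strips a trailing \r from Windows line endings),
// drop blank lines, and join the pieces with \s* so that any run of
// whitespace between tags matches, not only the exact original bytes
$pieces = array_filter(array_map('trim', explode("\n", $pattern)), 'strlen');
$regex = '#' . implode('\s*', $pieces) . '#s';

preg_match_all($regex, $item, $v);
var_dump($v['name'][0], $v['up'][0], $v['down'][0]);
```

Even when this matches, `$v['cat'][0]`, `$v['name'][0]` and `$v['link'][0]` contain the inner HTML of those cells rather than plain text, so another extraction step would still be needed.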

Andy Lester
Karolis Mazukna
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Phil Jan 20 '13 at 23:47
  • @Phil: That link doesn't help anyone. It's funny to us, but useless to the newbie. – Andy Lester Jan 21 '13 at 03:36
  • **Don't use regular expressions to parse HTML**. You cannot reliably parse HTML with regular expressions. As soon as the HTML changes from your expectations, your code will be broken. See http://htmlparsing.com/php.html for examples of how to properly parse HTML with PHP modules. – Andy Lester Jan 21 '13 at 03:38
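Following the suggestion to use a real HTML parser, here is a minimal sketch with PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`); it is not taken from the linked page. It assumes each item is a single `<tr>` string shaped like the one in the question. `parseRow` and the keys of the returned array are hypothetical names chosen to mirror the named groups in the question's pattern.

```php
<?php
// Hypothetical helper: extracts the question's fields from one <tr> string
// using PHP's DOM extension instead of a regular expression.
function parseRow(string $item): array
{
    $doc = new DOMDocument();

    // The markup uses legacy tags (<center>, <font>) and bare "&" in URLs,
    // which make libxml emit warnings; collect them quietly instead.
    $previous = libxml_use_internal_errors(true);
    // Wrap the row in <table> so the <tr> has a proper parent element
    $doc->loadHTML('<table>' . $item . '</table>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Run an XPath query and return the trimmed text of every matched node
    $texts = function (string $query) use ($xpath): array {
        $out = [];
        foreach ($xpath->query($query) as $node) {
            $out[] = trim($node->textContent);
        }
        return $out;
    };

    $numbers = $texts('//td[@align="right"]');

    return [
        'cat'  => $texts('//td[@class="vertTh"]//a'),   // e.g. Video, Movies
        'name' => $texts('//div[@class="detName"]/a')[0] ?? null,
        'link' => $texts('//a[starts-with(@href, "magnet:")]/@href')[0] ?? null,
        'up'   => $numbers[0] ?? null,
        'down' => $numbers[1] ?? null,
    ];
}
```

Because the XPath queries key on class and attribute values rather than on exact whitespace, changes in indentation or line endings do not affect them, although a change to the site's class names still would.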

1 Answer


I would do it in four steps instead of one:

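<?php
    // Category and subcategory: text of each "More from this category" link
    preg_match_all('|category">([^<]*)</a>|isU', $html, $categories);
    // Torrent name: text of the first link inside <div class="detName">
    preg_match('|<div class="detName">[^<]*<[^>]*>([^<]*)</a>|isU', $html, $name);
    // Magnet link: the href value that starts with "magnet:"
    preg_match('|<a href="(magnet:[^"]*)"|isU', $html, $link);
    // Up/down counts: the two numeric right-aligned cells
    preg_match_all('|<td align="right">([0-9]+)</td>|isU', $html, $up_down);
?>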
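Since the question starts from an array of rows, the four patterns can be wrapped in a function and applied to each item. A minimal sketch; `extractFields` is a hypothetical helper name, and the result keys simply mirror the named groups in the question:

```php
<?php
// Hypothetical helper: applies the answer's four patterns to one <tr> string
function extractFields(string $html): array
{
    preg_match_all('|category">([^<]*)</a>|isU', $html, $categories);
    preg_match('|<div class="detName">[^<]*<[^>]*>([^<]*)</a>|isU', $html, $name);
    preg_match('|<a href="(magnet:[^"]*)"|isU', $html, $link);
    preg_match_all('|<td align="right">([0-9]+)</td>|isU', $html, $up_down);

    // Index 1 holds capture group 1; missing single values become null
    return [
        'cat'  => $categories[1],   // all category link texts, e.g. Video and Movies
        'name' => $name[1] ?? null,
        'link' => $link[1] ?? null,
        'up'   => $up_down[1][0] ?? null,
        'down' => $up_down[1][1] ?? null,
    ];
}
```

Usage, assuming `$items` is that array of row strings: `$rows = array_map('extractFields', $items);`. Capture group 1 holds the wanted text in every pattern, which is why each value is read from index `[1]` of its match array.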
Louis XIV