I want to fetch specific data from a page, such as the product title, from its HTML tags.

Below is the div markup from the website:

    <div class="pdct-inf">
    <h2 class="h6" style="min-height:38px;height:38px;">
<a id="ctl00_cphMain_rPdctG_ctl01_hTitle" href="/whirlpool-whirlpool-direct-drive-285753a-ap3963893.html">Whirlpool Direct Drive Washer Mot...</a></h2><div class="startext">
<div itemprop="reviewRating" itemscope="" itemtype="http://schema.org/Rating" style="cursor:pointer; float:left; text-align:right;" class="page-style-stars-web-sm rating-5"></div>
<meta itemprop="worstRating" content="1"><meta itemprop="bestRating" content="5"><meta itemprop="ratingValue" content="5">&nbsp;(<a href="/whirlpool-whirlpool-direct-drive-285753a-ap3963893.html#diy">434</a>)
    </div>
    </div>

I want to fetch the text "Whirlpool Direct Drive Washer Mot..." between the <a> tags.

Below is my PHP code:

<?php

$html = file_get_contents("http://www.programminghelp.com/");

preg_match_all(
    '/<h2><a href="(.*?)" rel="bookmark" title=".*?">(.*?)<\/a><\/h2>/s',
    $html,
    $posts, // will contain the article data
    PREG_SET_ORDER // formats data into an array of posts
);

foreach ($posts as $post) {
    $link = $post[1];
    $title = $post[2];

    echo $title . "\n";
}

echo "<p>" . count($posts) . " product found</p>\n";

?>

I need help writing a regexp for the div content above.

Rahul
  • Ah, [regex and html](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/) .. Have you tried something like Simple HTML DOM Parser? – kero Mar 02 '16 at 07:54
  • the html markup that you have presented does not have links with `title` and `rel` attributes – RomanPerekhrest Mar 02 '16 at 08:00
  • So is that required? Can we fetch data from
    . Can we use class name in regexp
    – Rahul Mar 02 '16 at 08:02

2 Answers

An HTML/XML parser will probably be more suitable here. As said in the comments, regex is not suitable for parsing [X]HTML.
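As a rough illustration of the parser approach, here is a minimal sketch using PHP's bundled DOMDocument and DOMXPath classes on the markup from the question. The choice of DOMDocument (rather than any other parser), the XPath expression, and the `$products` array are assumptions made for this example. The inline `$html` string stands in for whatever `file_get_contents()` returns.

```php
<?php
// Markup taken from the question, trimmed to the part that holds the title.
$html = <<<'HTML'
<div class="pdct-inf">
<h2 class="h6" style="min-height:38px;height:38px;">
<a id="ctl00_cphMain_rPdctG_ctl01_hTitle" href="/whirlpool-whirlpool-direct-drive-285753a-ap3963893.html">Whirlpool Direct Drive Washer Mot...</a></h2>
</div>
HTML;

// Real-world pages are rarely valid, so collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Select every <a> inside an <h2> inside a div whose class list contains "pdct-inf".
$xpath = new DOMXPath($doc);
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " pdct-inf ")]//h2/a';

$products = [];
foreach ($xpath->query($query) as $a) {
    $products[] = [
        'title' => trim($a->textContent),
        'link'  => $a->getAttribute('href'),
    ];
}

foreach ($products as $product) {
    echo $product['title'] . "\n";
}
```

Because the query keys on the class name and tag nesting rather than on attribute order, it should keep working if the site adds or reorders attributes on the `<h2>` or `<a>` tags.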

dvlahovski

If you want to use a regex for this, you can try something like this one:

/<h2.*>\s*<a.* href="(.*)">(.*)<\/a>/m

You can see it working with your example in this php sandbox.
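For readers who cannot open the sandbox, the following sketch shows how a regex of this kind could be plugged into the `preg_match_all()` loop from the question. Note that the pattern here is a tightened variant, not the exact one above: it swaps the greedy `.*` for negated character classes so a match cannot run past the end of a tag. That substitution, and the `$found` array, are assumptions of this example.

```php
<?php
// Markup taken from the question, trimmed to the part that holds the title.
$html = <<<'HTML'
<div class="pdct-inf">
<h2 class="h6" style="min-height:38px;height:38px;">
<a id="ctl00_cphMain_rPdctG_ctl01_hTitle" href="/whirlpool-whirlpool-direct-drive-285753a-ap3963893.html">Whirlpool Direct Drive Washer Mot...</a></h2><div class="startext">
</div>
</div>
HTML;

// Group 1 captures the href value, group 2 the link text.
// [^>]* stays inside one tag; (.*?) stops at the first closing </a>.
$pattern = '~<h2\b[^>]*>\s*<a\b[^>]*\bhref="([^"]*)"[^>]*>(.*?)</a>~s';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$found = [];
foreach ($matches as $match) {
    $found[] = [
        'link'  => $match[1],
        'title' => trim(strip_tags($match[2])),
    ];
}

foreach ($found as $item) {
    echo $item['title'] . "\n";
}

echo count($found) . " product found\n";
```

Unlike the pattern in the question, this one does not require `rel` or `title` attributes, which the posted markup does not have.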

StoYan
  • Thanks, it works as expected. But I am fetching the data from a URL using file_get_contents(), so how can I fetch that particular data from that div? For example, on http://www.appliancepartspros.com/frigidaire-parts.html all products are in a grid view and each product title is stored in that div, so I want to fetch the product titles. – Rahul Mar 02 '16 at 09:02
  • @RahulDambare - If you want to fetch data only from divs with class="pdct-inf" then add it to the regex - just change it to something like `/class="pdct-inf".*\s*\s*(.*)<\/a>/m` – StoYan Mar 03 '16 at 13:42
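
One way the class restriction discussed in these comments might look in practice is sketched below. The pattern is an assumption written for this example, not the regex from the comment above. The sample page is also made up: it contains the product from the question, a second hypothetical product, and an unrelated link that should be skipped. On a real page, `$html` would come from `file_get_contents()` as in the question.

```php
<?php
// Sample page: one unrelated link plus two product blocks (the second is hypothetical).
$html = <<<'HTML'
<h2><a href="/about.html">About us</a></h2>
<div class="pdct-inf">
<h2 class="h6"><a id="ctl00_cphMain_rPdctG_ctl01_hTitle" href="/whirlpool-whirlpool-direct-drive-285753a-ap3963893.html">Whirlpool Direct Drive Washer Mot...</a></h2>
<div class="startext">(<a href="/whirlpool-whirlpool-direct-drive-285753a-ap3963893.html#diy">434</a>)</div>
</div>
<div class="pdct-inf">
<h2 class="h6"><a href="/example-part.html">Example Part (hypothetical)</a></h2>
</div>
HTML;

// Start each match at a div whose class attribute is exactly "pdct-inf",
// then lazily skip ahead to the first link after it.
$pattern = '~<div\b[^>]*\bclass="pdct-inf"[^>]*>.*?<a\b[^>]*\bhref="([^"]*)"[^>]*>(.*?)</a>~s';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$titles = [];
foreach ($matches as $match) {
    $titles[] = trim(strip_tags($match[2]));
}

foreach ($titles as $title) {
    echo $title . "\n";
}

echo count($titles) . " products found\n";
```

A caveat with this sketch: if a `pdct-inf` div contained no link at all, the lazy `.*?` would keep going and pick up the next link anywhere later in the page, which is one reason a DOM parser tends to be the safer choice.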