I am new to PHP and I want to learn how to extract information from another site. I have looked at preg_match and explode.

My problem is that the information I want to extract is not wrapped in any distinctive tags.

I used this code for a tag that does have a class:

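$site = file_get_contents($link);
$price = '#<div class="price">(.*?)<\/div>#si';
preg_match_all($price, $site, $pricelist);
// Print at most the first five matches without reading past the end of the array.
foreach (array_slice($pricelist[1], 0, 5) as $item) {
    echo $item;
}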

But in the page source, the prices look like this:

<b>500€</b></a><div class=gh_hl1>
<b>510€</b></small></a><br clear=all><div class=gh_hl1>
<b>520€</b></a><div class=gh_hl1>
<b>530€</b></a><div class=gh_hl1>
<b>540€</b></a><div class=gh_hl1>
<b>550€</b></a><div class=gh_hl1>

Each price starts with a <b> tag and finishes with either <div class=gh_hl1> or </small></a><br clear=all><div class=gh_hl1>. There are also other tags on the page that start with <b>.

Is there any way to extract these prices?

I also looked at Simple HTML DOM Parser, but I couldn't find anything that helped. Thanks for your answers.

Samet
  • Possible duplicate of [How to parse and process HTML/XML with PHP?](http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-xml-with-php) – Shiplu Mokaddim Jan 09 '13 at 23:08
  • Use SimpleXML or any other XML parser rather than preg_match – lupatus Jan 09 '13 at 23:08
  • Even if regex is cool, `DOMDocument::loadHTML` plus `DOMXPath` are way cooler for HTML. – hakre Jan 09 '13 at 23:16
  • I found this: foreach($html->find('b') as $element), but the page has a lot of <b> tags and this code picks up all of them as well. – Samet Jan 09 '13 at 23:16
  • Yes, use `xpath` instead. not that `->find` thingy, it's trash. – hakre Jan 09 '13 at 23:28

1 Answer

Well, you could look for patterns in your HTML file. One thing that's pretty noticeable is the € sign. You could search for that. This regexp should do it:

$price='/(\d*)€(\d*)/';

This should grab prices whether the € sign is before or after the amount (if you only ever put it after, drop that last (\d*)).

There are other similarities, like the bold tags, so you could add this for more specificity:

$price='/<b>(\d*)€(\d*)<\/b>/';

That's still a fairly generic string, though. The thing that really ties them all together is the div at the end: <div class=gh_hl1>. So you can search for that, dealing with the optional tags in between, with this regexp:

$price='/<b>(\d*)€(\d*)<\/b>(<\/small>)?<\/a>(<br clear=all>)?<div class=gh_hl1>/';

That's my shot. But that's still really silly (and I'm not positive it will work in PHP, having been doing mostly Ruby lately), so let's simplify it down to:

$price='/<b>(\d*)€(\d*)<\/b>.*<\/a>.*<div class=gh_hl1>/';

Now we'll match whatever tags sit in between. As stated in the comments, there are a million better ways to do this, and there is probably a parent element above the <b> tag that indicates this is a price. Look for those.
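For comparison, here is a rough sketch of the `DOMDocument::loadHTML` plus `DOMXPath` approach suggested in the comments. The HTML fragment is a hypothetical, tidied-up version of the markup in the question (the opening `<a>` tags, the closed divs and the extra `<p>` line are assumptions, not the real page), and the XPath expression is just one way to say "the <b> inside the link right before each gh_hl1 div":

```php
<?php
// Hypothetical fragment modelled on the question; the real page will differ.
$html = <<<'HTML'
<div>
<a href="#"><b>500€</b></a><div class="gh_hl1">details</div>
<a href="#"><small><b>510€</b></small></a><br clear="all"><div class="gh_hl1">details</div>
<p><b>Not a price</b></p>
</div>
HTML;

// Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
// The XML declaration prefix is a commonly used hint so the input is read as UTF-8.
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// For every gh_hl1 div, take the nearest preceding sibling <a> and the <b> elements inside it.
$nodes = $xpath->query('//div[@class="gh_hl1"]/preceding-sibling::a[1]//b');

$prices = [];
foreach ($nodes as $node) {
    // Keep only the digits so the currency sign does not matter.
    $prices[] = preg_replace('/\D/', '', $node->textContent);
}

echo implode("\n", $prices), "\n";
```

The `<b>Not a price</b>` line is skipped because it is not inside a link that sits directly before a gh_hl1 div, which is the kind of structural filter that is awkward to express with a regexp.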

Since the major thing we want is the price between the b tags and to ensure it ends with the div with that class, we can make our regexp:

$price='/<b>(\d*)€(\d*)<\/b>.*<div class=gh_hl1>/';
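As a minimal sketch of how that last pattern could be used, the snippet below runs it through `preg_match_all` against a few lines copied from the question. It assumes the page is UTF-8 encoded (hence the added `u` modifier) and that each price sits on its own line, as in the sample; on a real page `$site` would come from `file_get_contents($link)`.

```php
<?php
// Sample lines copied from the question, standing in for the downloaded page.
$site = <<<'HTML'
<b>500€</b></a><div class=gh_hl1>
<b>510€</b></small></a><br clear=all><div class=gh_hl1>
<b>520€</b></a><div class=gh_hl1>
HTML;

// The u modifier (an addition here) makes PCRE treat pattern and subject as UTF-8.
$price = '/<b>(\d*)€(\d*)<\/b>.*<div class=gh_hl1>/u';

// PREG_SET_ORDER groups the captures per match, which is easier to loop over.
preg_match_all($price, $site, $matches, PREG_SET_ORDER);

$prices = [];
foreach ($matches as $m) {
    // Group 1 holds digits before the sign, group 2 digits after it; one of them is empty.
    $prices[] = $m[1] !== '' ? $m[1] : $m[2];
}

echo implode("\n", $prices), "\n";
```

Because the pattern has no `s` modifier, `.*` stops at the end of each line, so every line yields its own match. If the page is not UTF-8, the `u` modifier would have to go and the € sign would need to be written in the page's encoding.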
hjc1710
  • What if there is no € sign? For example: Name
    Another Name
    How can I use it then?
    – Samet Jan 09 '13 at 23:14
  • Change `(\d*)€(\d*)` to `(.*)` – hjc1710 Jan 09 '13 at 23:18
  • Updated my regexes so they actually work in PHP. That last one should match what you want; it worked for me when I replaced the euro sign with a dollar (too lazy to look up the code for the euro sign). – hjc1710 Jan 09 '13 at 23:26
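For the case without a € sign raised in the comments, a sketch along these lines might work. The sample markup is an assumption (the comment only shows the text "Name" and "Another Name"), and the capture uses the lazy `(.*?)` rather than `(.*)` so it stops at the first `</b>` on the line:

```php
<?php
// Hypothetical markup: the same wrapping tags as the prices, but with plain text inside <b>.
$site = <<<'HTML'
<b>Name</b></a><div class=gh_hl1>
<b>Another Name</b></small></a><br clear=all><div class=gh_hl1>
HTML;

// (.*?) captures as little as possible, so it ends at the first closing </b>.
$pattern = '/<b>(.*?)<\/b>.*<div class=gh_hl1>/';

preg_match_all($pattern, $site, $matches);

// $matches[1] holds the text captured by the first group for every match.
$names = $matches[1];

print_r($names);
```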