Source code and preg_match

Question

I am new to PHP and I want to learn how to pull information from another site. I have looked at preg_match and explode.

My problem is that the information I want is not wrapped in distinctive tags.

I used this code for a tag like <div class="price">:

$site = file_get_contents($link);
$price = '#<div class="price">(.*?)<\/div>#si';
preg_match_all($price, $site, $pricelist);
// Print at most the first five captured prices (fewer if the page has fewer).
foreach (array_slice($pricelist[1], 0, 5) as $p) {
    echo $p;
}

But in the page source the prices look like this:

<b>500€</b></a><div class=gh_hl1>
<b>510€</b></small></a><br clear=all><div class=gh_hl1>
<b>520€</b></a><div class=gh_hl1>
<b>530€</b></a><div class=gh_hl1>
<b>540€</b></a><div class=gh_hl1>
<b>550€</b></a><div class=gh_hl1>

Each price starts with <b> and is followed by either </a><div class=gh_hl1> or </small></a><br clear=all><div class=gh_hl1>. There are also other, unrelated tags on the page that start with <b>.

Is there any way to extract these prices?

I also looked at Simple HTML DOM Parser, but I couldn't find anything useful. Thanks for your answers.

possible duplicate of [How to parse and process HTML/XML with PHP?](http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-xml-with-php) — Shiplu Mokaddim, Jan 09 '13 at 23:08
Use SimpleXML or any other XML parser rather than preg_match — lupatus, Jan 09 '13 at 23:08
Even if regex is cool, `DOMDocument::loadHTML` plus `DOMXPath` are way cooler for HTML. — hakre, Jan 09 '13 at 23:16
I found this: foreach($html->find('b') as $element) -- but the page has a lot of <b> tags, and this code picks up those as well — Samet, Jan 09 '13 at 23:16
Yes, use `xpath` instead, not that `->find` thingy; it's trash. — hakre, Jan 09 '13 at 23:28

hjc1710 · Accepted Answer · 2013-01-09T23:24:56.767

Well, you could look for patterns in your HTML file. One thing that's pretty noticeable is the € sign. You could search for that. This regexp should do it:

$price='/(\d*)€(\d*)/';

This should grab prices whether the € sign comes before or after the amount (if the sign only ever comes after, drop the trailing (\d*)).
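A minimal sketch of how the two capture groups behave, assuming a made-up sample string and a hypothetical $amounts array (neither is from the original page). Whichever group is non-empty holds the amount:

```php
<?php
// Made-up sample text with the € sign once after and once before the amount.
$text = 'Now 500€ instead of €650';

$price = '/(\d*)€(\d*)/';
preg_match_all($price, $text, $found, PREG_SET_ORDER);

$amounts = [];
foreach ($found as $match) {
    // Group 1 = digits before the sign, group 2 = digits after it.
    $amount = $match[1] !== '' ? $match[1] : $match[2];
    if ($amount !== '') {          // skip a bare € with no digits on either side
        $amounts[] = $amount;
    }
}
print_r($amounts);
```

Note that the pattern also matches a lone € with no digits at all, which is why the loop skips empty results.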

There are other similarities, like the bold tags, so you could add this for more specificity:

$price='/<b>(\d*)€(\d*)<\/b>/';

That's still a fairly generic pattern, though. What really ties the prices together is the div at the end: <div class=gh_hl1>. So you can search for that, handling the optional tags in between, with this regexp:

$price='/<b>(\d*)€(\d*)<\/b>(<\/small>)?<\/a>(<br clear=all>)?<div class=gh_hl1>/';

That's my shot. But it's still pretty silly (and I'm not positive it will work in PHP, having done mostly Ruby lately), so let's simplify it to:

$price='/<b>(\d*)€(\d*)<\/b>.*<\/a>.*<div class=gh_hl1>/';

Now the pattern matches whatever tags sit in between. As stated in the comments, there are far better ways to do this, and there is probably a parent element above the <b> tag that marks it as a price. Look for that.
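For comparison, here is a rough sketch of the parser route the comments suggest, using DOMDocument and DOMXPath. The sample markup is hypothetical: the opening <a> tags and the item names are assumed, since the question only shows the tail of each link. The idea is to use the enclosing link as the "parent" that marks a <b> as a price:

```php
<?php
// Hypothetical markup modelled on the question's snippet; the opening <a> tags are assumed.
$html = <<<'HTML'
<a href="#">Item one <b>500€</b></a><div class=gh_hl1></div>
<a href="#"><small>Item two <b>510€</b></small></a><br clear=all><div class=gh_hl1></div>
<b>Unrelated bold text</b>
HTML;

libxml_use_internal_errors(true);   // real-world HTML is rarely valid; keep warnings quiet
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$domPrices = [];
// Only <b> elements nested inside a link are candidates.
foreach ($xpath->query('//a//b') as $node) {
    // Keep the leading digits; this does not depend on how the € sign was decoded.
    if (preg_match('/^\s*(\d+)/', $node->textContent, $m)) {
        $domPrices[] = $m[1];
    }
}
print_r($domPrices);
```

On a real page the XPath expression would need adjusting to whatever element actually wraps the prices.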

Since the main thing we want is the price between the b tags, and we want to be sure it is followed by the div with that class, we can make our regexp:

$price='/<b>(\d*)€(\d*)<\/b>.*<div class=gh_hl1>/';
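Here is a small sketch of that last pattern run through preg_match_all against lines copied from the question, plus one unrelated <b> tag added for contrast. The $prices array and the inline sample are illustrative; a real script would load the page with file_get_contents() instead:

```php
<?php
// Sample lines from the question, plus one unrelated <b> tag that should be ignored.
$site = <<<'HTML'
<b>500€</b></a><div class=gh_hl1>
<b>510€</b></small></a><br clear=all><div class=gh_hl1>
<b>520€</b></a><div class=gh_hl1>
<b>Some other bold text</b>
HTML;

$price = '/<b>(\d*)€(\d*)<\/b>.*<div class=gh_hl1>/';
preg_match_all($price, $site, $matches);

// $matches[1] holds digits before the € sign, $matches[2] digits after it.
$prices = [];
foreach ($matches[1] as $i => $before) {
    $prices[] = $before !== '' ? $before : $matches[2][$i];
}
print_r($prices);
```

Because . does not match a newline by default, the .* stays within a single line here. If the whole page were on one line, a lazy .*? would be the safer choice.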

What if there isn't a € sign? For example: Name
Another Name
How can I use it then? — Samet, Jan 09 '13 at 23:14
Updated my regexes so they would really work in PHP. That last one should match what you want; it worked for me when I replaced the euro sign with a dollar (too lazy to look up the code for the Euro sign). — hjc1710, Jan 09 '13 at 23:26
