I have been trying to scrape prices from http://oasis.caiso.com/mrtu-oasis/prc_hub_lmp/PRC_HUB_LMP.html, but because the prices contain a "$" I can't get it to work. I tried:

<?php 
$data = file_get_contents('http://oasis.caiso.com/mrtu-oasis/prc_hub_lmp/PRC_HUB_LMP.html');

$regex = "@/$(.+?)</font>@";

preg_match($regex,$data,$match);

echo $match[1];
?>

It doesn't work! Any help would be appreciated!

Brian Tompsett - 汤莱恩
  • There is a [special place in hell](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) for developers parsing HTML with regular expressions. Use an HTML parser. – ThiefMaster Jan 26 '14 at 09:46
  • ThiefMaster, after doing more research and reading your link, it sure seems like it. I am a complete noob to PHP/HTML programming. I have been learning it via simple examples, and I guess I jumped prematurely into using my limited knowledge to scrape data from the web. The code from the poster below (Shankar) worked. Anyway, I hope to learn and catch up soon. – user3237202 Jan 26 '14 at 19:46

2 Answers

Use the DOMDocument class to parse the page instead of a regular expression.

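<?php
// Download the page; file_get_contents() returns false on failure.
$data = file_get_contents('http://oasis.caiso.com/mrtu-oasis/prc_hub_lmp/PRC_HUB_LMP.html');
if ($data === false) {
    die('Could not download the page.');
}

$dom = new DOMDocument;
// The @ suppresses warnings caused by malformed markup in the page.
@$dom->loadHTML($data);

$prices = [];
foreach ($dom->getElementsByTagName('font') as $tag) {
    // Keep only the <font> elements whose text contains a dollar sign.
    if (strpos($tag->nodeValue, '$') !== false) {
        // Strip the "$" and surrounding whitespace, then put the "$" back in front.
        $prices[] = '$' . trim(str_replace('$', '', $tag->nodeValue));
    }
}
echo "<pre>";
print_r($prices);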

Output:

Array
(
    [0] => $47.79842
    [1] => $47.9952
    [2] => $0.00
    [3] => $-0.19678
    [4] => $46.32017
    [5] => $47.9952
    [6] => $0.00
    [7] => $-1.67503
    [8] => $46.30577
    [9] => $47.9952
    [10] => $0.00
    [11] => $-1.68943
)
Shankar Narayana Damodaran
  • Dude, thank you SO MUCH! This worked like a charm. I am a complete newbie to this and have been very slowly learning to code in PHP. I will study your code, learn it, and should reach a point where I can modify it to my taste. Once again, thank you very much! – user3237202 Jan 26 '14 at 19:42

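$ is a special character in regular expressions: it is an anchor that matches at the end of the subject string (or at the end of a line in multiline mode). To match a literal dollar sign, escape it as \$.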

Reference: http://www.php.net/manual/en/regexp.reference.anchors.php
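As a minimal sketch of the escaping, here is the question's pattern with the dollar sign escaped. The `$data` string is hypothetical sample markup standing in for the downloaded page, and `preg_match_all` is swapped in for `preg_match` so that every price is collected rather than only the first. Note that the question's pattern used `/$` (a forward slash), which does not escape anything. The pattern is also single-quoted here, because inside a double-quoted PHP string `\$` is itself an escape sequence and the backslash would never reach the regex engine.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$data = '<td><font size="2">$47.79842</font></td>'
      . '<td><font size="2">$-0.19678</font></td>';

// Single-quoted so PHP passes the backslash through to PCRE untouched;
// \$ then matches a literal dollar sign instead of acting as an anchor.
$regex = '@\$(.+?)</font>@';

// preg_match_all() collects every match; $matches[1] holds the text
// captured by the first group, i.e. the number after the dollar sign.
preg_match_all($regex, $data, $matches);

print_r($matches[1]);
```

This only shows why the original pattern failed. For real pages the DOMDocument approach in the other answer is likely to be more robust, since a regex like this depends on the exact markup around each price.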

lanzz