0

I'm trying to write a page-scraping script to pull a currency rate from a site. I need some help writing the regular expression.

Here is what I have so far.

<?php

function converter() {
    // Fetch the page source as a string
    $html = file_get_contents("http://www.bloomberg.com/personal-finance/calculators/currency-converter/");

    // Find currencies (using h1 to test)
    if (preg_match('/<h1>(.*)<\/h1>/i', $html, $title)) {
        return $title[1];
    }

    return null; // nothing matched
}

$foo = converter();
echo $foo;

?>

Here is where the currencies are kept on the Bloomberg site.

site: http://www.bloomberg.com/personal-finance/calculators/currency-converter/

//<![CDATA[
      var test_obj = new Object();
      var price = new Object();
                price['ADP:CUR'] = 125.376;

What would the expression look like to get that rate? Any help would be great!!

khr055
Will
  • Do not use regexes to parse HTML code ;) – m0skit0 Feb 20 '12 at 17:48
  • In his use case, that is acceptable. – Rok Kralj Feb 20 '12 at 17:49
  • possible duplicate of [How to parse HTML with PHP?](http://stackoverflow.com/questions/3650125/how-to-parse-html-with-php) – Gordon Feb 20 '12 at 17:56
  • possible duplicate of [How to implement Exchange Rates in PHP](http://stackoverflow.com/questions/1973569/how-to-implement-exchange-rate-in-php/1973823#1973823) – Gordon Feb 20 '12 at 17:56
  • Friendlier http://developer.yahoo.com/yql/console/?q=select%20%2a%20from%20yahoo.finance.xchange%20where%20pair%3D%22eurusd%2C%20gbpusd%22&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys – Alex K. Feb 20 '12 at 18:01
  • @AlexK. Thanks. I have already used that method; I have just been asked to use this source as well. – Will Feb 20 '12 at 18:07

3 Answers

3

This works for me. Does it need to be more flexible? And does it need to allow varying whitespace around the equals sign, or is it always exactly one space?

"/price\['ADP:CUR'\] = (\d+\.\d+)/"

Usage:

if(preg_match("/price\['ADP:CUR'\] = (\d+\.\d+)/", $YOUR_HTML, $m)) {
//Result is in $m[1]
} else {
//Not found
}
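If the spacing around the equals sign can vary, or the currency code needs to come from a variable, a minimal sketch along these lines may help. The function name `extract_rate` and the sample string are hypothetical (the GBP value is made up for illustration), and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not part of the original answer): pull one rate
// out of the page source for a given currency code.
function extract_rate(string $html, string $code): ?float
{
    // preg_quote() escapes any regex metacharacters in the currency code;
    // \s* tolerates any amount of whitespace around the equals sign.
    $pattern = "/price\['" . preg_quote($code, '/') . ":CUR'\]\s*=\s*(\d+(?:\.\d+)?)/";

    if (preg_match($pattern, $html, $m)) {
        return (float) $m[1]; // capture group 1 holds just the number
    }

    return null; // rate not found
}

// Sample input modelled on the snippet in the question; the GBP line is invented.
$sample = "price['ADP:CUR'] = 125.376;\nprice['GBP:CUR']=0.6312;";

var_dump(extract_rate($sample, 'ADP'));
var_dump(extract_rate($sample, 'GBP'));
var_dump(extract_rate($sample, 'XXX'));
```

Because the number sits in a capture group, there is no need to strip the leading characters by hand; the function returns only the captured value.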
jack
  • Thanks! That is great. How would I be able to return just the value? Would I just strip the characters before it? Also, could I put a variable in the regex, so that if, for example, I wanted to find GBP instead of ADP, I could put "/price\['$this->from:CUR'\] = \d+\.\d+/"? – Will Feb 20 '12 at 17:55
  • See my updated answer :) And yes you can use variables in the regex. – jack Feb 20 '12 at 17:57
  • You're welcome. Although you should take a look at kavisiegel's answer if you plan to use more data. – jack Feb 20 '12 at 18:09
3

There you go:

/ADP:CUR[^=]*=\s*(.*?);/i
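As a rough usage sketch, the pattern above could be applied like this. The sample string is an assumption modelled on the question's snippet, with extra spaces added to show the whitespace tolerance. Since the lazy group captures everything up to the first semicolon, the sketch checks that the capture is numeric before casting it.

```php
<?php
// Sample line modelled on the question's snippet, with irregular spacing.
$html = "price['ADP:CUR']   =  125.376;";

$rate = null;
if (preg_match('/ADP:CUR[^=]*=\s*(.*?);/i', $html, $m)) {
    $value = trim($m[1]);
    // (.*?) accepts any characters, so confirm the capture is a number.
    if (is_numeric($value)) {
        $rate = (float) $value;
    }
}

var_dump($rate);
```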
Desolator
2

This returns an associative array identical to the JavaScript object on the Bloomberg site.

<?php
$data = file_get_contents('http://www.bloomberg.com/personal-finance/calculators/currency-converter/');

$expression = '/price\\[\'(.*?)\'\\]\\s+=\\s+([+-]?\\d*\\.\\d+)(?![-+0-9\\.]);/';

preg_match_all($expression, $data, $matches);

$array = array_combine($matches[1], $matches[2]);

print_r($array);

echo $array['ADP:CUR'];// string(7) "125.376"
?>
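The expression above requires a decimal point and at least one whitespace character on each side of the equals sign. If whole-number rates or tighter spacing could appear in the page (an assumption, not something confirmed for the Bloomberg source), a slightly more permissive variant might look like the sketch below. It runs against an inline sample string rather than the live page; the GBP and JPY lines are invented for illustration.

```php
<?php
// Sample text standing in for the downloaded page.
// Only the ADP line comes from the question; the others are invented.
$data = "price['ADP:CUR'] = 125.376;\nprice['GBP:CUR']=0.6312;\nprice['JPY:CUR'] = 80;";

// Optional sign, digits, optional fractional part; \s* allows missing whitespace.
$expression = '/price\[\'(.*?)\'\]\s*=\s*([+-]?\d+(?:\.\d+)?);/';

$rates = [];
if (preg_match_all($expression, $data, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // $match[1] is the key (e.g. "ADP:CUR"), $match[2] the numeric text.
        $rates[$match[1]] = (float) $match[2];
    }
}

print_r($rates);
```

Casting to float here means the values can be used in arithmetic directly, at the cost of losing the exact string form that `array_combine` would preserve.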
Kavi Siegel