I am using a regex to parse the currency symbol and amount out of a price in the format shown below. I get the price by scraping amazon.com.

$current_price = $doc['#current-price']->text();

$this->rawPrice = $current_price; // value: $9.65
$price_elements = array();
preg_match("/(\$|£)([0-9,\.]+)/", $current_price, $price_elements);
$this->price_elements = $price_elements; // array is empty
$this->price = $price_elements[2]; // warning: no index 2 found

I can't understand why the regex does not match the value $9.65. I have tried it on RegExr as well and it works fine there. The value is output as JSON, so I know the price is retrieved correctly. I just don't know why it doesn't parse.

SoWhat
  • Why scrape, when they have a free API? – TecBrat Nov 17 '13 at 03:09
  • That seems strange, since I put it into this tester: http://www.phpliveregex.com/p/245 and it works fine – Guilherme Nov 17 '13 at 03:10
  • See the answer with 4000+ votes here: http://stackoverflow.com/questions/1732348 – TecBrat Nov 17 '13 at 03:11
  • It worked fine for me in Firefox's browser console, too. Are you sure there is no whitespace or any unprintable characters in there? – Johannes H. Nov 17 '13 at 03:12
  • What does `echo $current_price;` show? – Jorge Campos Nov 17 '13 at 03:13
  • Put a /u modifier after the regexp; what happens? – Guilherme Nov 17 '13 at 03:18
  • Is `strlen($current_price)` 5? – Johannes H. Nov 17 '13 at 03:18
  • @TecBrat this question isn't about parsing HTML. It's about why $9.65 is not being matched by the regex. – SoWhat Nov 17 '13 at 05:39
  • I figured out the problem. The backslashes need to be escaped for PHP: \$ should be \\$ and \. should be \\. This problem probably wouldn't occur if I used single quotes – SoWhat Nov 17 '13 at 05:40
  • @SomeshMukherjee I know what you're saying, but the statement '...by scraping amazon.com' suggests you're getting to this point by parsing the HTML. I probably made an assumption. I suppose you could get to this point by parsing the DOM, and then use a regex to finish it. If, however, your `$current_price` is always in the format given, then `str_replace(array('$','£'),'',$current_price)` should do it. – TecBrat Nov 17 '13 at 11:16
  • @TecBrat actually I was trying to extract both the currency and the price from the string, i.e. I needed to know what currency it was, since I was using the same script for the UK and US sites. It wasn't working because you need to double-escape backslashes. \$ should have been \\$ – SoWhat Nov 17 '13 at 16:18

1 Answer

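The problem is the double-quoted string. PHP processes the string before the regex engine ever sees it, and inside double quotes \$ is an escape sequence for a literal $. The pattern that reaches preg_match is therefore /($|£)([0-9,\.]+)/, where the bare $ is the end-of-string anchor, so nothing matches.

So \$ should be written as \\$, or the whole pattern should be put in single quotes. \. is not a PHP escape sequence and reaches the regex engine unchanged, so writing it as \\. is harmless but not required.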
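
As a minimal sketch of the fix described above (assuming PHP 7.1 or later and a UTF-8 source file), the snippet below compares the original double-quoted pattern with a single-quoted one. The function name `parsePrice` and the `currency`/`amount` keys are hypothetical, chosen only for illustration; they do not come from the original code.

```php
<?php
// Hypothetical helper (not from the original post): splits a price string
// such as "$9.65" into its currency symbol and numeric part.
function parsePrice(string $raw): ?array
{
    // Single quotes: PHP leaves \$ untouched, so the regex engine sees an
    // escaped, literal dollar sign rather than the end-of-string anchor.
    $pattern = '/(\$|£)([0-9,.]+)/';

    if (preg_match($pattern, $raw, $m) !== 1) {
        return null; // no currency symbol followed by digits
    }

    return ['currency' => $m[1], 'amount' => $m[2]];
}

// The original double-quoted pattern: PHP collapses "\$" to a bare "$"
// before preg_match ever receives the string.
$broken = "/(\$|£)([0-9,\.]+)/";
var_dump($broken === '/($|£)([0-9,\.]+)/'); // bool(true)
var_dump(preg_match($broken, '$9.65'));      // int(0)

print_r(parsePrice('$9.65'));
```

Returning null instead of reading `$m[2]` unconditionally also avoids the missing-index warning from the question when a price string does not match.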

SoWhat