
I am having problems extracting the correct value from a piece of text.

I want to extract 14.50 from the following text (which is the last decimal number in the string).

string

<span class="ob-pricedetails">Price:</span> &#036;57.71<span style="color: #666666; font-size: 12px;">(&#163;37.61)</span><br/><span style="font-size: 11px; color: #000000;">Shipping (UK):</span>&#036;14.50

I have been trying to use the following regex

regex

(?<=Shipping \(UK\):<\/span>&#163;|&#036;)(.*)

which returns the following result for some strange reason

57.71<span style="color: #666666; font-size: 12px;">(&#163;37.61)</span><br/><span style="font-size: 11px; color: #000000;">Shipping (UK):</span>&#036;14.50

What am I doing wrong? Any help would be appreciated.

mk_89
  • why don't you just use javascript to extract values from the DOM? – Don Aug 08 '12 at 18:16
  • Greediness. Use `.*?` or the `/U` flag. Better yet make the match more specific. – mario Aug 08 '12 at 18:17
  • @Blaine because i am scraping a website... – mk_89 Aug 08 '12 at 18:17
  • Oh look mk_89 is attempting to parse HTML using regex - http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags - not a good idea... Better to use a parser (http://stackoverflow.com/questions/292926/robust-mature-html-parser-for-php) for this. – Buggabill Aug 08 '12 at 18:19
  • @Buggabill this looks interesting, I'll have a look at it – mk_89 Aug 08 '12 at 18:21
  • If you are scraping a site, it is all the more reason to do so... You are asking for trouble if they change things. – Buggabill Aug 08 '12 at 18:22
  • @Buggabill yeh I know what you mean it can get messy, I just had no idea that there was a simpler way – mk_89 Aug 08 '12 at 18:23
  • You should convert the string to HTML and then use the DOM parser: http://stackoverflow.com/questions/3627489/php-parse-html-code – akirilov Aug 08 '12 at 18:34

2 Answers


This would work:

preg_match('/Shipping \(UK\):<\/span>&#036;([0-9]+\.[0-9]+)/', $html, $matches);
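
As for why the pattern in the question misbehaves: the `|` inside the lookbehind splits it into two complete alternatives, `Shipping \(UK\):<\/span>&#163;` and a bare `&#036;`. The lookbehind is therefore already satisfied right after the first `&#036;` (the one in front of 57.71), and the greedy `(.*)` then swallows the rest of the line. A minimal sketch that groups the two currency entities and captures only digits is below; the variable names `$html`, `$pattern` and `$shipping` are just illustrative, and the sample string is the one from the question.

```php
<?php
// Sample input copied from the question.
$html = '<span class="ob-pricedetails">Price:</span> &#036;57.71<span style="color: #666666; font-size: 12px;">(&#163;37.61)</span><br/><span style="font-size: 11px; color: #000000;">Shipping (UK):</span>&#036;14.50';

// (?:...) limits the alternation to the two currency entities, so the
// "Shipping (UK):</span>" prefix is required in both cases.
// [0-9]+\.[0-9]+ captures only the decimal number instead of ".*".
$pattern = '/Shipping \(UK\):<\/span>(?:&#163;|&#036;)([0-9]+\.[0-9]+)/';

$shipping = null;
if (preg_match($pattern, $html, $matches) === 1) {
    $shipping = $matches[1];
}

echo $shipping, PHP_EOL; // prints 14.50
```

This also accepts a `&#163;` prefix in front of the shipping amount, which the original lookbehind appears to have been aiming for.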

Of course, you should listen to everyone suggesting that you use a DOM parser instead of regular expressions.
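
For comparison, here is a rough sketch of what the DOM route could look like with PHP's bundled `DOMDocument` and `DOMXPath` classes. It assumes the shipping amount is always the text node directly following the `<span>` that contains "Shipping (UK)", which is true for the sample in the question but may not hold for the whole site. The variable names and the XPath expression are illustrative choices, not something from the original page.

```php
<?php
// Sample input copied from the question.
$html = '<span class="ob-pricedetails">Price:</span> &#036;57.71<span style="color: #666666; font-size: 12px;">(&#163;37.61)</span><br/><span style="font-size: 11px; color: #000000;">Shipping (UK):</span>&#036;14.50';

// Suppress libxml warnings about the fragment not being a full document.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Assumption: the amount is the first text node after the "Shipping (UK)" span.
$nodes = $xpath->query('//span[contains(., "Shipping (UK)")]/following-sibling::text()[1]');

$shipping = null;
if ($nodes !== false && $nodes->length > 0) {
    // The entity has been decoded by now, so the text looks like "$14.50".
    // Pull out just the decimal number.
    if (preg_match('/[0-9]+\.[0-9]+/', $nodes->item(0)->nodeValue, $m) === 1) {
        $shipping = $m[0];
    }
}

echo $shipping, PHP_EOL;
```

A small regex is still used at the end, but only on a short piece of text rather than on the markup itself, so changes to attributes or inline styles should not break it.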

Tchoupi

This should do the trick for you.

[0-9\.]+$
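
A quick sketch of how this pattern might be used from PHP, assuming the number really is the very last thing in the string (the `$` anchor means it will not match if anything, even a closing tag, follows the amount). The `rtrim()` call and the variable names are additions for illustration.

```php
<?php
// Sample input copied from the question.
$html = '<span class="ob-pricedetails">Price:</span> &#036;57.71<span style="color: #666666; font-size: 12px;">(&#163;37.61)</span><br/><span style="font-size: 11px; color: #000000;">Shipping (UK):</span>&#036;14.50';

// rtrim() drops trailing whitespace so the "$" anchor still lines up
// with the end of the number.
$last = null;
if (preg_match('/[0-9.]+$/', rtrim($html), $m) === 1) {
    $last = $m[0];
}

echo $last, PHP_EOL; // prints 14.50
```

Inside a character class the dot does not need escaping, so `[0-9.]` behaves the same as `[0-9\.]`.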
Mike Perrenoud