How to extract the price from HTML using regex in Python

Question

I have HTML output that contains this:

<span class="value">
            Price:<br>
            <span style="color:white">23,07€ </span>
        </span>

I tried to extract the prices using:

prices = re.findall(r'<span class="value">.*?(\d{1,3}\.?\d{1,2}).*?</span>',search_result)

Sometimes the decimals are replaced with -- when they are 00. I also need the two numbers extracted by the expression (23 and 07) joined into 2307.

Thank you for your time.

Obligatory reference: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — dimo414, Jul 16 '14 at 19:56
to join use "23"+"07" or if they are int use "%d%02d"%(23,7) — f.rodrigues, Jul 16 '14 at 21:40
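
The joining suggestion in the comment above can be tried directly. This is only a small sketch; the variable names are made up for illustration.

```python
whole, frac = "23", "07"

# String parts: plain concatenation keeps the leading zero of "07".
joined = whole + frac

# Integer parts: %02d pads the cents back to two digits (7 -> "07").
joined_from_ints = "%d%02d" % (23, 7)

print(joined, joined_from_ints)  # 2307 2307
```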

Braj · Accepted Answer · 2014-07-16T20:37:53.120

Use this pattern and take the matched group at index 1:

(?<=>)(\d[^€]*)

Or capture each number separately and take the matched groups at index 1 and 2:

(?<=>)(\d+)\D(\d+)\D

If you are only interested in <span> tags, try the regex below:

<span [^>]*>(\d+)\D(\d+)\D[^<]*

Sample code:

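import re

# Plain raw string: the Python 2-only ur'' prefix is a syntax error in Python 3.
p = re.compile(r'<span [^>]*>(\d+)\D(\d+)\D[^<]*')
test_str = u"..."

# With two capture groups, findall returns a list of (group 1, group 2) tuples.
re.findall(p, test_str)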
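
As a quick check, here is a sketch that runs the same pattern against the snippet from the question and joins the two groups. The names `html`, `pattern`, `matches` and `joined` are chosen here for illustration. Note that this pattern expects digits on both sides of the separator, so a price written as 23,-- would not match.

```python
import re

# Snippet copied from the question.
html = '''<span class="value">
            Price:<br>
            <span style="color:white">23,07€ </span>
        </span>'''

pattern = re.compile(r'<span [^>]*>(\d+)\D(\d+)\D[^<]*')

# The outer span is skipped because no digit directly follows its ">".
matches = pattern.findall(html)
print(matches)  # [('23', '07')]

# Join each (whole, fraction) pair into one string.
joined = [whole + frac for whole, frac in matches]
print(joined)  # ['2307']
```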

edited Jul 16 '14 at 20:37

answered Jul 16 '14 at 19:52

Braj

The second option is better, since when the page is loaded in USD the prices will look like 23.07. I'm also interested in scraping only the spans with that class, and the result should be 2307 so I can convert it to a float and divide by 100 to get the price. – UnuSec Jul 16 '14 at 20:22
Yes, use the second one. Test all the cases, and if they pass, accept the answer by ticking the green mark. – Braj Jul 16 '14 at 20:23
The output is: ('332', '15') ('0', '29') ('1', '22') ... The first group should not be there; can we require the span to have that class? – UnuSec Jul 16 '14 at 20:38
Yes, you can add the class to the span if it's already known. I answered as per the sample HTML, which doesn't have a class attribute. – Braj Jul 16 '14 at 20:39
Search for prices only in spans with this class. – UnuSec Jul 16 '14 at 20:43
+1 for the multiple options and the `(?<=>)` which looks really funny. :) – zx81 Jul 16 '14 at 22:40
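
Putting the requests from the comments together (match only inside `<span class="value">`, accept `,` or `.` as the separator, treat `--` as 00, and divide by 100), one possible sketch follows. It is not part of the accepted answer. The names `PRICE_RE` and `extract_prices` are made up here, and the pattern assumes the price sits in a nested span as in the question's sample, with exactly two decimals (or `--`) and no thousands separators.

```python
import re

# Start at the class="value" span, then move lazily to the nested span
# without crossing a closing </span>, so a value span that has no price
# cannot borrow one from a later span.
PRICE_RE = re.compile(
    r'<span class="value">(?:(?!</span>).)*?<span[^>]*>\s*(\d+)[.,](\d{2}|--)',
    re.DOTALL,
)

def extract_prices(html):
    """Return the prices found in class="value" spans as floats."""
    prices = []
    for whole, frac in PRICE_RE.findall(html):
        if frac == "--":          # "--" stands in for "00"
            frac = "00"
        prices.append(int(whole + frac) / 100)  # "2307" -> 23.07
    return prices

sample = '''<span class="value">
            Price:<br>
            <span style="color:white">23,07€ </span>
        </span>
<span class="value">
            Price:<br>
            <span style="color:white">5,--€ </span>
        </span>'''

print(extract_prices(sample))  # [23.07, 5.0]
```

If the markup varies more than this, an HTML parser is likely to be more robust than a regex, as the linked reference in the question comments suggests.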
