I have HTML output that contains this:

<span class="value">
            Price:<br>
            <span style="color:white">23,07€ </span>
        </span>

I tried to extract the prices using:

prices = re.findall(r'<span class="value">.*?(\d{1,3}\.?\d{1,2}).*?</span>',search_result)

Sometimes the decimals are replaced with -- when they are 00. I also need the two numbers extracted by the expression (23 and 07) joined into 2307.

Thank you for your time.

UnuSec

1 Answer

Get the matched group from index 1.

(?<=>)(\d[^€]*)


Or get matched groups 1 and 2, one for each number:

(?<=>)(\d+)\D(\d+)\D

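As a rough sketch of how the two groups could be used, they can be joined and divided by 100 to get the price. The function name `extract_prices` and the variable `sample_html` are made up for illustration, and the sample string is just the snippet from the question, not a real page:

```python
import re

# Sample taken from the question; a real page will contain more markup.
sample_html = '''<span class="value">
            Price:<br>
            <span style="color:white">23,07€ </span>
        </span>'''

# Two digit runs right after a '>', separated by one non-digit (',' or '.').
PRICE = re.compile(r'(?<=>)(\d+)\D(\d+)\D')


def extract_prices(html):
    """Return prices as floats (hypothetical helper, not from the question)."""
    prices = []
    for whole, cents in PRICE.findall(html):
        joined = whole + cents           # '23' + '07' -> '2307'
        prices.append(int(joined) / 100)
    return prices


print(extract_prices(sample_html))  # [23.07]
```

Note that this assumes the second group always has exactly two digits; it does not handle the `--` case.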


If you are interested only in the <span> tag, try the regex below:

<span [^>]*>(\d+)\D(\d+)\D[^<]*

Sample code:

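import re

# Groups 1 and 2 capture the digits before and after the separator.
p = re.compile(r'<span [^>]*>(\d+)\D(\d+)\D[^<]*')
test_str = "..."

# Returns a list of (group 1, group 2) tuples, one per match.
p.findall(test_str)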
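
The question also asks for only the `class="value"` spans and mentions that `00` decimals show up as `--`. One possible variant is sketched below. It assumes the prices have no thousands separator and that every `class="value"` span actually contains a price; the names `PRICE_IN_VALUE_SPAN` and `prices_from_value_spans` are invented for this example:

```python
import re

# Start at a class="value" span, skip lazily to the first number, then take
# either two decimal digits or the literal '--' placeholder.
PRICE_IN_VALUE_SPAN = re.compile(
    r'<span class="value">.*?(\d+)[.,](\d{2}|--)',
    re.DOTALL,  # let '.' cross the line breaks inside the span
)


def prices_from_value_spans(html):
    """Return one float per class="value" span (hypothetical helper)."""
    prices = []
    for whole, cents in PRICE_IN_VALUE_SPAN.findall(html):
        if cents == '--':
            cents = '00'                 # '--' stands for zero cents
        prices.append(int(whole + cents) / 100)
    return prices


html = ('<span style="color:white">332,15€ </span>'
        '<span class="value">Price:<br>'
        '<span style="color:white">15,--€ </span></span>')
print(prices_from_value_spans(html))  # [15.0]
```

The first span is skipped because it is not preceded by a `class="value"` opening tag. For anything beyond a quick scrape, an HTML parser is likely to be more robust than a regex.
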
Braj
  • The second option is better, since when a page is loaded in USD the prices will be 23.07. Also, I'm interested in scraping only the spans with that class, and I want the result to be 2307 so I can convert it to a float and divide by 100 to get the price. – UnuSec Jul 16 '14 at 20:22
  • Yes, use the second one. Test all the cases, and if they pass, accept the answer by ticking the green mark. – Braj Jul 16 '14 at 20:23
  • The output is: ('332', '15') ('0', '29') ('1', '22') ... The 1st group should not be there. Can we make the span have that class? – UnuSec Jul 16 '14 at 20:38
  • Yes, you can add the class to the span if it's already known. I answered as per the sample HTML, which doesn't have the class attribute. – Braj Jul 16 '14 at 20:39
  • I want to search for prices only in spans with this class. – UnuSec Jul 16 '14 at 20:43
  • +1 for the multiple options and the `(?<=>)`, which looks really funny. :) – zx81 Jul 16 '14 at 22:40