0

How do I write a regular expression to extract a floating-point number in Python? I want to get 55.97 from <td nowrap="nowrap">55.97</td>, so I tried:

newsecond_row_data = (re.search('(?<=>)\d+|\d+.\d+',second_row_data[a]))
newsecond_row_data.group(0)

print newsecond_row_data.group(0)

but it returned 55 instead of 55.97. Please help.

Thank you

wRAR
Randi

4 Answers

7

If you want to extract data from HTML or XML, take a look at the various parsers available. For this particular case, you can extract the number very easily:

>>> from xml.etree import ElementTree
>>> element = ElementTree.fromstring('<td nowrap="nowrap">55.97</td>')
>>> element.text
'55.97'
>>> 
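
ElementTree is an XML parser, so it only accepts well-formed markup. For HTML that is not valid XML, a similar result can be sketched with the standard library's html.parser module. The names CellTextParser and cell_values below are made up for illustration, and the sketch assumes every <td> cell holds a plain number:

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text content of every <td> cell (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        # Start a new, empty text buffer for each table cell.
        if tag == "td":
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current cell.
        if self.in_cell:
            self.cells[-1] += data


def cell_values(html):
    """Return the numeric value of each <td> cell as a float."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return [float(text) for text in parser.cells]


values = cell_values('<td nowrap="nowrap">55.97</td><td nowrap>12</td>')
print(values)
```

The second cell uses a bare nowrap attribute, which an XML parser would reject but html.parser accepts.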
Sudheer
0
import re
newsecond_row_data = re.search(r'\d+\.?\d*', second_row_data[a])
print(newsecond_row_data.group(0))
Niclas Nilsson
0
import re

ptn = r'[-+]?([0-9]*\.?[0-9]+)'
pat_obj = re.compile(ptn)

m = pat_obj.search(some_str)
if m:
    print(m.group(0))

If a string contains more than one floating-point number, use findall instead of search:

>>> s = '3dfrtg45.2trghyui8erdftgy77.431dser'
>>> pat_obj = re.compile(ptn)
>>> v = pat_obj.findall(s)
>>> v
['3', '45.2', '8', '77.431']
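
One thing to watch for with findall: when a pattern contains a capturing group, findall returns the group's contents and not the whole match, so a leading sign matched outside the group is dropped. The sketch below illustrates this with a made-up input string (signed_text is not from the question) and a non-capturing variant of the pattern:

```python
import re

# Hypothetical sample input containing signed numbers.
signed_text = 'a-3.5b+2'

# Capturing group: findall returns only what the parentheses matched.
grouped = re.findall(r'[-+]?([0-9]*\.?[0-9]+)', signed_text)

# Non-capturing group (?:...): findall returns the whole match, sign included.
whole = re.findall(r'[-+]?(?:[0-9]*\.?[0-9]+)', signed_text)

# Convert the full matches to floats.
numbers = [float(token) for token in whole]

print(grouped)
print(whole)
print(numbers)
```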
doug
0
newsecond_row_data = re.search(r'(?<=>)\d+\.\d+|\d+', second_row_data[a])
newsecond_row_data.group(0)

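Your pattern fails because alternation is tried left to right: the first alternative, (?<=>)\d+, matches '55' immediately after the '>', so the engine returns that match and never tries the second alternative. Putting the longer alternative first, as above, fixes it.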
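
The effect of alternative order can be checked with a small sketch. The variable names are illustrative, and the dot is escaped here so that it only matches a literal decimal point:

```python
import re

cell = '<td nowrap="nowrap">55.97</td>'

# Integer alternative first: the engine accepts '55' right after '>'
# and never tries the floating-point alternative.
short_first = re.search(r'(?<=>)\d+|\d+\.\d+', cell).group(0)

# Floating-point alternative first: the longer match is tried first and wins.
long_first = re.search(r'(?<=>)\d+\.\d+|\d+', cell).group(0)

print(short_first)
print(long_first)
```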

Then again, I would advise against using regex here; use an XML processing library to extract text from HTML tags (see Sudheer's answer).

rubayeet
  • Would you please be so kind to put in bold your advice for the *"I want to parse xml with regexp"* to come? – Rik Poggi Feb 09 '12 at 10:25