I have a txt file which contains the following lines:

  <KEY key="Metric" keyvalue="VALUE (Base)">523.876481542546</KEY>
  <KEY key="Metric" keyvalue="VALUE (Base)">1.41186111749407E-05</KEY>

I want to extract the numbers from the lines above using regular expressions. The numbers may be in scientific notation, e.g. 1.41186111749407E-05. So far I have tried this in my Python script:

    count = 0
    for i, line in enumerate(searchlines):
        if '"VALUE (Base)">' in line:
            for line in searchlines[i:i+1]:
                m = re.search(r'\d+\.\d+', line)
                count = count + 1
                if count == 1:
                    m1 = m.group()
                if count == 2:
                    m2 = m.group()

This gives an output of:

m1 = 523.876481542546
m2 = 1.41186111749407

but I want:

m2 = 1.41186111749407E-05

What regular expression do I need to handle numbers that contain an 'E' and a minus sign '-'?

alecxe
B. Dillon

3 Answers

2

Why not use an XML parser for the XML data? For example, xml.etree.ElementTree from the Python standard library:

$ cat input.xml
<KEYS>
  <KEY key="Metric" keyvalue="VALUE (Base)">523.876481542546</KEY>
  <KEY key="Metric" keyvalue="VALUE (Base)">1.41186111749407E-05</KEY>
</KEYS>

>>> import xml.etree.ElementTree as ET
>>> tree = ET.parse("input.xml")
>>> [key.text for key in tree.findall("KEY")]
['523.876481542546', '1.41186111749407E-05']
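
If the values are needed as numbers rather than strings, the parsed text can be passed to float(), which accepts scientific notation. The following is only a sketch: the <KEYS> root element is the same assumption made above, and filtering on the keyvalue attribute is an illustrative choice based on the check in the question.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the question, wrapped in an assumed <KEYS> root.
xml_text = """<KEYS>
  <KEY key="Metric" keyvalue="VALUE (Base)">523.876481542546</KEY>
  <KEY key="Metric" keyvalue="VALUE (Base)">1.41186111749407E-05</KEY>
</KEYS>"""

root = ET.fromstring(xml_text)

# Keep only KEY elements with the wanted keyvalue attribute, then let
# float() parse the element text (it understands the E-05 exponent).
values = [
    float(key.text)
    for key in root.findall("KEY")
    if key.get("keyvalue") == "VALUE (Base)"
]
print(values)
```

This avoids writing a number pattern by hand, because the parser isolates the element text and float() does the numeric parsing.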
alecxe
0

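This captures the text between the tags when it consists of optional leading digits, a decimal point, and anything after it (so an exponent such as E-05 is included). The pattern is anchored at the opening <, so strip leading whitespace from each line before matching.

import re
REGEX = re.compile(r"^<.*?>(\d*\..*)<.*?>$")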

If some of the numbers have no decimal point, use:

import re
REGEX = re.compile(r"^<.*?>(\d*|\d*\..*)<.*?>$")
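
As a rough usage sketch, the second pattern can be applied line by line as shown here. The sample lines are copied from the question; the names lines and numbers are illustrative and not part of the original answer.

```python
import re

# Raw-string form of the second pattern from this answer.
REGEX = re.compile(r"^<.*?>(\d*|\d*\..*)<.*?>$")

lines = [
    '  <KEY key="Metric" keyvalue="VALUE (Base)">523.876481542546</KEY>',
    '  <KEY key="Metric" keyvalue="VALUE (Base)">1.41186111749407E-05</KEY>',
]

numbers = []
for line in lines:
    # The pattern is anchored at "<", so remove the indentation first.
    m = REGEX.match(line.strip())
    if m:
        numbers.append(m.group(1))
print(numbers)
```

Note that the capture group accepts any characters after the decimal point, so it relies on the tag content being a number already.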
gr1zzly be4r
0

I think a pattern like this should work:

re.search(r"\d+\.*\d*[E]*[-]*\d*", line)

It should match numbers with or without a decimal part, including an optional exponent such as E-05.

You can always test your regular expressions with a regex tester like this one: http://pythex.org/
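
The pattern above is quite permissive: because every part after the first digits is optional and repeatable, it would also match a fragment such as 12E. A somewhat stricter variant, offered here only as a sketch and not as part of the original answer, groups the fractional part and the exponent so that each is matched whole or not at all. The name NUMBER is hypothetical.

```python
import re

# Hypothetical stricter pattern: digits, an optional ".digits" part,
# and an optional exponent (E or e, optional sign, at least one digit).
NUMBER = re.compile(r"\d+(?:\.\d+)?(?:[Ee][-+]?\d+)?")

line = '<KEY key="Metric" keyvalue="VALUE (Base)">1.41186111749407E-05</KEY>'
m = NUMBER.search(line)
value = float(m.group())  # float() accepts the matched exponent form
print(m.group(), value)
```

Like the original, this variant does not handle a leading sign or numbers that start with a decimal point; those would need extra optional parts.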

lstbl