I have a file that contains lines like this:

N7300 X-7.254 Y-40.839 A-89.74

N7301 X-7.002 Y-40.847 A-89.806

N7302 X-6.75 Y-40.855 A-89.873

N7303 X-6.511 Y-40.862 A-89.926

N7304 X-6.272 Y-40.868 A-89.979

Some of the values (those after X, Y and A) are negative numbers. I don't know how to read these numbers from the file.

I want to generate output like this:

[('N', '7300'), ('X', '-7.254'), ('Y', '-40.839'), ...]

import re
import sys
with open(r'/home/ruchir/Desktop/NewFolder/TEST.txt') as f:
    lines=f.readlines()
    lines=''.join(lines)
    lines=lines.split()
    a=[]

    for i in lines:
        match=re.match(r"([a-z]+)([0-9]*\.?[0-9]+)",i,re.I)
        if match:
            a.append(match.groups())
            print(a)

I wrote this program, which works fine except for negative numbers. Please help; I'm new to Python.

EdChum
Ruchir
  • Add `-` to your regex? I can't believe you wrote the regex but don't know how to extend it. – sashoalm Apr 10 '14 at 13:52
  • Related to @sashoalm's comment, read this: http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean – Andy Apr 10 '14 at 13:59
  • Note that if your file *only* contains these lines -- that is, there are no lines you want to ignore -- there's no need for regex at all. (Even if there are you could avoid regex, but it's particularly simple if all the lines look like this.) – DSM Apr 10 '14 at 14:08
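The regex-free approach mentioned in the last comment could look roughly like the sketch below. It assumes every whitespace-separated word is exactly one letter followed by a number, as in the sample lines; `parse_line` is a hypothetical helper name, not something from the original post.

```python
def parse_line(line):
    """Split one line into (letter, number-string) pairs without regex."""
    # Assumption: each word is a single letter followed by its number,
    # so the first character is the key and the rest is the value.
    return [(word[0], word[1:]) for word in line.split()]


sample = "N7300 X-7.254 Y-40.839 A-89.74"
print(parse_line(sample))
# [('N', '7300'), ('X', '-7.254'), ('Y', '-40.839'), ('A', '-89.74')]
```

Because the sign is simply part of the text after the first character, negative values need no special handling here. The sketch does no validation, so a line with unexpected content would pass through silently.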

2 Answers

Try something like this:

import re

result=re.findall(r"([a-z]+)(-?[0-9]*\.?[0-9]+)","N7303 X-6.511 Y-40.862 A-89.926",re.I)
print(result)

which results in:

[('N', '7303'), ('X', '-6.511'), ('Y', '-40.862'), ('A', '-89.926')]

Note the `-?` part: it allows an optional minus sign before the number.
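To see why the original pattern skipped the negative values, here is a small side-by-side sketch. The names `OLD` and `NEW` are only for illustration; the two patterns are the one from the question and the one above.

```python
import re

# Pattern from the question: no sign allowed before the digits.
OLD = re.compile(r"([a-z]+)([0-9]*\.?[0-9]+)", re.I)
# Same pattern with an optional minus sign added.
NEW = re.compile(r"([a-z]+)(-?[0-9]*\.?[0-9]+)", re.I)

word = "X-7.254"

# After the letter 'X' the old pattern needs a digit or a dot,
# but finds '-', so the match fails and the word is dropped.
print(OLD.match(word))           # None

# With -? the minus sign is consumed as part of the number.
print(NEW.match(word).groups())  # ('X', '-7.254')

# Unsigned values behave the same under both patterns.
print(OLD.match("N7300").groups() == NEW.match("N7300").groups())  # True
```

If the data could also contain an explicit plus sign, `[-+]?` in place of `-?` would cover both cases.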

Elektito

I would do something along these lines:

contents='''\
N7300 X-7.254 Y-40.839 A-89.74
N7301 X-7.002 Y-40.847 A-89.806
N7302 X-6.75 Y-40.855 A-89.873
N7303 X-6.511 Y-40.862 A-89.926
N7304 X-6.272 Y-40.868 A-89.979'''

import re

pat=re.compile(r'(?:(\w)([-+]?[0-9]*\.?[0-9]+))')

for line in contents.splitlines():
    ld=[ m.groups() for m in pat.finditer(line)]
    print(ld)

Prints:

[('N', '7300'), ('X', '-7.254'), ('Y', '-40.839'), ('A', '-89.74')]
[('N', '7301'), ('X', '-7.002'), ('Y', '-40.847'), ('A', '-89.806')]
[('N', '7302'), ('X', '-6.75'), ('Y', '-40.855'), ('A', '-89.873')]
[('N', '7303'), ('X', '-6.511'), ('Y', '-40.862'), ('A', '-89.926')]
[('N', '7304'), ('X', '-6.272'), ('Y', '-40.868'), ('A', '-89.979')]
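If the goal is to work with the values as numbers rather than strings, the pairs can be converted per line. This is only a sketch under the assumption that each letter appears at most once per line; `PAT` and `line_to_dict` are hypothetical names, and the pattern uses `[A-Za-z]` instead of `\w` so that a digit or underscore can never be taken as the key.

```python
import re

# One letter followed by an optionally signed integer or decimal.
PAT = re.compile(r"([A-Za-z])([-+]?[0-9]*\.?[0-9]+)")


def line_to_dict(line):
    """Map each letter on the line to its value as a float."""
    # Assumption: a letter occurs once per line; a repeated letter
    # would overwrite the earlier value.
    return {letter: float(number) for letter, number in PAT.findall(line)}


values = line_to_dict("N7303 X-6.511 Y-40.862 A-89.926")
print(values["X"])  # -6.511
print(values["N"])  # 7303.0
```

Converting everything to `float` turns the line number into `7303.0`; if that matters, the `N` field could be handled separately with `int`.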
dawg