I need to parse an output text file that has a lot of information, specifically about weighing and calibrating masses. There is a data table in this text file that has the name of the mass being tested, its nominal weight, density, and other properties of the mass.

Here's a picture of what this part of the text file looks like. I want five capture groups, one for each column. Right now, I have:

tablePattern = r'\[mg\]\s*(.{4,15})\s+(\d*)\s*(\d*)\s*(\d*)\s*(\d*)'
tableMatches = re.findall(tablePattern, text)

However, this gives me matches I don't want, and it doesn't return all the capture groups I want. Any help would be appreciated!

Peter Wang
  • regex is kind of overkill, this is a simple split foreach line – Ryan May 31 '16 at 18:01
  • If there are holes in the tabular data, then split will put the wrong values in the wrong column. Slicing is a safer bet. See my answer to this question: http://stackoverflow.com/questions/3911483/python-slice-how-to-i-know-the-python-slice-but-how-can-i-use-built-in-slice-ob – PaulMcG May 31 '16 at 18:07
  • How would I do this? There are probably ten of these tables in the text file, some having more rows, so I would need to get the information for every single one. Should I use regex to find the start of the table, and then read the text file line by line, doing str.split(" ")? – Peter Wang May 31 '16 at 18:14
  • Thanks @PaulMcGuire! – Peter Wang May 31 '16 at 18:15
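
The slicing approach suggested in the comments could look roughly like the sketch below. The column boundaries are hypothetical (the real table layout is not shown here), so they would need to be measured against the actual file; the sample rows are built with padding only so that they line up with those assumed widths.

```python
# Hypothetical column boundaries (assumptions, not taken from the real file).
NAME = slice(0, 14)
NOMINAL = slice(14, 26)
DENSITY = slice(26, 34)
EXPANSION = slice(34, 46)
CORRECTION = slice(46, 58)
FIELDS = (NAME, NOMINAL, DENSITY, EXPANSION, CORRECTION)


def split_fixed_width(line):
    """Cut one table row at fixed character positions and trim the padding."""
    return [line[s].strip() for s in FIELDS]


# Sample rows padded to the assumed widths; the second has an empty density column.
full_row = f"{'b 100g 1dot':<14}{'100.0000':<12}{'5.63334':<8}{'0.0000002':<12}{'-339.3333'}"
holey_row = f"{'b 100g 1dot':<14}{'100.0000':<12}{'':<8}{'0.0000002':<12}{'-339.3333'}"

full_fields = split_fixed_width(full_row)
holey_fields = split_fixed_width(holey_row)

print(full_fields)
print(holey_fields)        # the empty column stays in its own position
print(holey_row.split())   # whitespace splitting shifts later values left
```

With slicing, the missing density stays an empty string in its own position, whereas `str.split()` returns fewer items and moves the later values into the wrong columns.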

1 Answer

You will need to loop through your file and process each line of input, but this should work. If it does not, let me know and post a sample of the real data file. You can add more groups to handle additional columns of data, and make a group optional by placing a ? after it (a + means "one or more", not "optional").

import re

# Raw string so \s and \d reach the regex engine unchanged; the decimal point
# is escaped, and -? covers the optional minus sign. The lazy (.*?) keeps
# trailing spaces out of the name column.
p = re.compile(r'^(.*?)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)$')
m = p.match('b 100g 1dot   100.0000    5.63334 0.0000002   -339.3333')
if m:
    print('Weight Being Tested: ', m.group(1))
    print('Nominal Value: ', m.group(2))
    print('Density: ', m.group(3))
    print('Expansion: ', m.group(4))
    print('Correction: ', m.group(5))
else:
    print('No match')


# Weight Being Tested:  b 100g 1dot
# Nominal Value:  100.0000
# Density:  5.63334
# Expansion:  0.0000002
# Correction:  -339.3333
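
To cover the "loop through your file" part, here is a minimal sketch that applies a row pattern to every line and keeps only the lines that match. The `ROW` pattern and the `parse_rows` helper are illustrative names, and the pattern assumes every data row ends in four decimal numbers, which may not hold for the real file.

```python
import re

# Assumed row shape: a free-form name followed by four decimal numbers.
ROW = re.compile(
    r'^\s*(.*?)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s*$'
)


def parse_rows(lines):
    """Return (name, nominal, density, expansion, correction) for each matching line."""
    rows = []
    for line in lines:
        m = ROW.match(line)
        if m:  # headers, blank lines, and other text are skipped
            name = m.group(1)
            values = [float(v) for v in m.groups()[1:]]
            rows.append((name, *values))
    return rows


sample = [
    'Some header text that is not a table row',
    'b 100g 1dot   100.0000    5.63334 0.0000002   -339.3333',
    '',
]
rows = parse_rows(sample)
print(rows)
```

Because `parse_rows` accepts any iterable of strings, an open file object can be passed in directly, for example `with open(path) as f: rows = parse_rows(f)`, where `path` is whatever your output file is called. Every table in the file is picked up in one pass, as long as its rows fit the assumed pattern.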
tanuki505