1

I am trying to read some data from a flat file and display it in another application using Python. The flat file has 12,000 lines, and I do not need all of the data. One chunk of lines has 00 alongside other data, and another chunk has 10 alongside other data. I want to filter out all the lines with 10 and keep only the lines with 00.

Below is the updated sample file. I want to filter out all the lines with 10. It's just a sample; my actual flat file has 12,000 lines.

I just updated my flat file. Here, I only want to read lines that have $ at the start, LOB after the $, and 00 at the end before the &. I want to filter out everything else in the flat file.

$90TM020516 19002200&
$90LOB  0   0   0   7 10  &
$90LOB 25   0   0   6 10  &
$90LOB 57   0   0   6 10  &
$90LOB353   0   0   5 10  &
$90LOB 36   0   0   5 10  &
$90GPSA8   0   38281168  -77448376&
$90LOB276   0   0   5 10  &
$90LOB185   0   0   6 10  &
$90LOB197   0   0   6 00  &
$90LOB198   0 254   6 00  &
$90LOB197   0 254   6 00  &
RSSI $90LOB201   0 254   5 00  &
$90TM020516 19002300&
$90LOB194   0 254   5 00  &
$90LOB190   0 254   5 00  &
$90LOB185   0 254   5 00  &
$90LOB181   0 254   5 00  &
$90LOB187   0 254   5 00  &
$90LOB192   0 254   5 00  &
$90LOB195   0 254   5 00  &
$90LOB195   0 254   5 00  &
$90LOB191   0 254   5 00  &
$90LOB184   0 254   5 00  &
$90LOB177   0 254   5 00  &

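Below is the code I am using for reading data:

for line in lines:
    # Keep only lines that start with '$', have 'LOB' at indices 3-5,
    # and have '00' at indices 22-23
    if line.startswith('$') and line[3:6] == 'LOB' and line[22:24] == '00':
        print(line)  # placeholder: handle the matching line here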

I can send you the whole flat file if you want. This is just an extract from the file.

Muscles
  • if you want code review, you can do that at http://codereview.stackexchange.com/, otherwise please state your problem. – muratgu Jul 18 '16 at 15:56
  • i like your name for starters. So, *Muscles* please post a sample of the file you are trying to parse containing an example of a line you want to keep and one of a line you want to neglect. Secondly, posting the code you wrote for this task would be nice. – Ma0 Jul 18 '16 at 15:58

2 Answers

1

If I understand your question correctly (and I'm not sure that I do) you have a file with lines that look like this:

@45   0 0   5 10  *
@45   0 0   5 10  *
@45   0 0   5 10  *
@45   0 0   6 10  *
@45   0 0   6 00  *
@45   0 0   6 00  *
@45   0 0   6 00  *
@45   0 0   5 00  *

... and you only want to read the lines that have a 00 and ignore the ones that have a 10.

Here is a sample of code that would accomplish this:

# List containing all the lines you want to save
lines_you_want = []

# Open the file with 12,000 lines (text mode, so each line is a str)
with open('some.file', 'r') as infile:

    # Go through the file one line at a time
    for line in infile:

        # Check if the 15th character (index 14) is a '0' instead of a '1'
        if line[14:15] == '0':
            lines_you_want.append(line)

# Do something with lines_you_want

This assumes that the 00 or 10 is always in the same position in each line (characters 15 and 16) and that these two values are the only ones that can appear there (i.e. not 01, 11, 12, 29 or whatever); otherwise you will have to change this.
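
If the column position cannot be relied on, one possible alternative is to split each line on whitespace and compare the second-to-last field instead of indexing fixed characters. This is only a minimal sketch: the helper name `has_flag` is hypothetical, and it assumes the flag is always the field just before the closing marker, which the sample lines suggest but the question does not confirm.

```python
def has_flag(line, flag="00"):
    """Return True if the second-to-last whitespace-separated field equals flag.

    Assumption: the flag is always the field just before the closing
    marker ('*' or '&'), as in the sample lines.
    """
    fields = line.split()
    return len(fields) >= 2 and fields[-2] == flag


sample = [
    "@45   0 0   5 10  *",
    "@45   0 0   6 00  *",
    "@45 0 0 6 00 *",  # different spacing, same fields
]
kept = [line for line in sample if has_flag(line)]
print(kept)
```

Because the check works on fields rather than character offsets, it tolerates lines whose spacing differs, at the cost of depending on the field order staying the same.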

You could alternatively do something with the lines as you go instead of creating a list, depending on your application. Both ways work.
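
Processing the lines as you go could be sketched with a generator, so the 12,000 lines are never all held in memory at once. The function name `matching_lines`, its parameters, and the temporary file used for the demonstration are illustrative assumptions; the sketch also assumes the flag sits at the 15th and 16th characters (indices 14 and 15), as stated above.

```python
import os
import tempfile


def matching_lines(path, flag="00", start=14):
    """Yield each line whose characters at start..start+len(flag) equal flag."""
    with open(path, "r") as infile:
        for line in infile:
            if line[start:start + len(flag)] == flag:
                yield line.rstrip("\n")


# Small demonstration: a temporary file stands in for the real flat file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("@45   0 0   5 10  *\n@45   0 0   6 00  *\n")

processed = []
for line in matching_lines(tmp.name):
    processed.append(line)  # replace with whatever per-line handling you need

os.remove(tmp.name)
print(processed)
```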

If I have made a wrong assumption please comment and I will edit my answer.

ClydeTheGhost
1
import re
filename = "<path to file>"  # replace with the path to your flat file
with open(filename) as infile:
    lines = [line.strip() for line in infile if re.match(r'^\$.*LOB.*00  &$', line)]

A regex101 example

Regex explained:

The ^ indicates the start of a line. The literal $ (escaped as \$) comes immediately after the start of the line. Any number of characters can follow until the parser reaches LOB, and the same happens again before 00. The pattern then requires two spaces and & at the end of the line (the final, unescaped $). If any of those pieces is missing, the regex does not match that line.

So the end result is lines with $ at the start, LOB after the $, and 00 at the end before the &. Everything else in the file is filtered out.

It is stored as a list of strings, each string representing a line.
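
To see how the pattern treats individual lines, here is a small sketch that runs it against four lines copied from the question's sample. The variable names `pattern`, `samples`, and `results` are only illustrative.

```python
import re

# Same pattern as above, compiled once so it can be reused for every line
pattern = re.compile(r'^\$.*LOB.*00  &$')

samples = [
    "$90TM020516 19002200&",             # no LOB
    "$90LOB 25   0   0   6 10  &",       # ends in 10, not 00
    "$90LOB197   0   0   6 00  &",       # starts with $, has LOB, ends in 00
    "RSSI $90LOB201   0 254   5 00  &",  # does not start with $
]

results = [bool(pattern.match(line)) for line in samples]
print(results)
```

Only the third sample matches. The line beginning with RSSI is rejected because the pattern is anchored at the start of the line with ^.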

Bonus: If you are outputting this to another file, then you can do this:

import re
with open("FOO", 'w') as outfile, open('BAR', 'r') as infile:
    for line in infile:
        if re.match(r'^\$.*LOB.*00  &$', line):
            outfile.write(line)

This yields:

$90LOB197   0   0   6 00  &
$90LOB198   0 254   6 00  &
$90LOB197   0 254   6 00  &
$90LOB194   0 254   5 00  &
$90LOB190   0 254   5 00  &
$90LOB185   0 254   5 00  &
$90LOB181   0 254   5 00  &
$90LOB187   0 254   5 00  &
$90LOB192   0 254   5 00  &
$90LOB195   0 254   5 00  &
$90LOB195   0 254   5 00  &
$90LOB191   0 254   5 00  &
$90LOB184   0 254   5 00  &
$90LOB177   0 254   5 00  &

From your sample data.

Bryce Drew