Extract lines between headings repeated through file

Question

I am trying to modify a text file with ~43k lines. Wherever the command *Nset appears in the file, I need to extract and save all of the lines that follow it, stopping when I reach the next *command. The number of lines, and the number of characters per line, varies after each command. For instance, here's a sample part of the file:

*Nset

1, 2, 3, 4, 5, 6, 7,

12, 13, 14, 15, 16,

17, 52, 75, 86, 92,

90, 91, 92 93, 94, 95....

*NEXT COMMAND

 blah blah blah

*Nset

 numbers

*Nset

 numbers

*Command

 irrelevant text

The code I currently have works when the numbers I need are not between two *Nset commands. When one *Nset immediately follows another set's numbers, the code skips that second command and the lines after it altogether, and I can't figure out why. When the next command is not *Nset, it finds the next *Nset and pulls out the data without any problem.

import re

# read in the input deck
deck_name = 'master.txt'
deck = open(deck_name, 'r')

# initialize variables
nset_data = []
matched_nset_lines = []
nset_count = 0

for line in deck:
    # loop to extract all nset names and node numbers
    important_line = re.search(r'\*Nset,.*', line)
    if important_line:
        line_value = important_line.group()  # name for nset
        matched_nset_lines.insert(nset_count, line_value)  # name for nset
        temp = []

        # read lines from the found match up until the next *command
        for line_x in deck:
            if not re.match(r'\*', line_x):
                temp.append(line_x)
            else:
                break

        nset_data.append(temp)

    nset_count = nset_count + 1

I'm using Python 3.5. Thanks for any help.
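The skipping can be reproduced in isolation. Both loops above draw from the same file iterator `deck`, so when the inner loop hits the next `*` line and `break`s, that line has already been consumed and the outer loop never sees it. If that line is another `*Nset`, its header and data are lost. The sketch below uses a hypothetical in-memory list of lines in place of `master.txt` and `startswith` in place of the regex, purely for brevity:

```python
# Minimal reproduction of the nested-loop bug, using a hypothetical
# in-memory sample in place of master.txt.
lines = iter([
    "*Nset, nset=A\n",
    "1, 2, 3,\n",
    "*Nset, nset=B\n",  # consumed by the inner loop's break, never seen outside
    "4, 5, 6,\n",
    "*Other\n",
    "*Nset, nset=C\n",
    "7, 8,\n",
])

found = []
for line in lines:
    if line.startswith("*Nset"):
        block = []
        # The inner loop advances the SAME iterator as the outer loop.
        for line_x in lines:
            if line_x.startswith("*"):
                break  # line_x (the next header) has already been consumed
            block.append(line_x.strip())
        found.append((line.strip(), block))

print(found)  # set B is missing; set C (after *Other) is still found
```

This matches the behaviour described: a set that follows another set's numbers disappears, while a set that follows a different command is found. Avoiding the nested loop and using a single loop with a state flag sidesteps the problem.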

Is some command *always* at the beginning of a line, starting with a `"*"`? — juanpa.arrivillaga, Jul 05 '17 at 19:29
@juanpa.arrivillaga, Yes. There are a variety of commands, but each one is immediately preceded by "*". And then the next line(s) are numbers. — K. Gibboney, Jul 05 '17 at 19:36
Could this be at all related? https://stackoverflow.com/questions/25943000/finding-a-word-between-two-words-that-will-not-match-if-the-closing-word-occurs — kayleeFrye_onDeck, Jul 05 '17 at 19:39

juanpa.arrivillaga · Accepted Answer · 2017-07-05T19:42:46.617

If you just want to extract the lines that follow each *Nset, up to the next command, the following approach should work:

In [5]: with open("master.txt") as f:
   ...:     data = []
   ...:     gather = False
   ...:     for line in f:
   ...:         line = line.strip()
   ...:         if line.startswith("*Nset"):
   ...:             gather = True
   ...:         elif line.startswith("*"):
   ...:             gather = False
   ...:         elif line and gather:
   ...:             data.append(line)
   ...:

In [6]: data
Out[6]:
['1, 2, 3, 4, 5, 6, 7,',
 '12, 13, 14, 15, 16,',
 '17, 52, 75, 86, 92,',
 '90, 91, 92 93, 94, 95....',
 'numbers',
 'numbers']
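The flat `data` list above mixes the lines from every *Nset together. If each set's lines should stay separate, as in the question's `nset_data`, the same single-loop idea can start a new sub-list at every header. This is a sketch, not part of the original answer: the helper name `collect_nsets` is hypothetical, and it takes any iterable of lines so it works on an open file or, for illustration, an `io.StringIO`. It avoids f-strings, so it should also run on Python 3.5.

```python
import io


def collect_nsets(lines):
    """Return a list of (header, data_lines) pairs, one per *Nset block.

    Hypothetical helper: a single loop with no nested iteration, so a
    header that ends one block is still seen as the start of the next.
    """
    blocks = []
    current = None  # data list of the block being gathered, or None
    for line in lines:
        line = line.strip()
        if line.startswith("*Nset"):
            current = []  # start a new block for this header
            blocks.append((line, current))
        elif line.startswith("*"):
            current = None  # any other command ends gathering
        elif line and current is not None:
            current.append(line)  # non-blank data line inside a block
    return blocks


sample = io.StringIO(
    "*Nset\n\n1, 2, 3,\n\n12, 13,\n\n*NEXT COMMAND\n\n blah blah blah\n\n"
    "*Nset\n\n numbers\n\n*Nset\n\n numbers\n\n*Command\n\n irrelevant text\n"
)
for header, data in collect_nsets(sample):
    print(header, data)
```

With the real file this would be `with open("master.txt") as f: blocks = collect_nsets(f)`.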

And if you also want the additional information (where each *Nset header occurs and how many there are), it is simple enough to extend the above:

In [7]: with open("master.txt") as f:
   ...:     nset_lines = []
   ...:     nset_count = 0
   ...:     data = []
   ...:     gather = False
   ...:     for i, line in enumerate(f):
   ...:         line = line.strip()
   ...:         if line.startswith("*Nset"):
   ...:             gather = True
   ...:             nset_lines.append(i)
   ...:             nset_count += 1
   ...:         elif line.startswith("*"):
   ...:             gather = False
   ...:         elif line and gather:
   ...:             data.append(line)
   ...:

In [8]: nset_lines
Out[8]: [0, 14, 18]

In [9]: nset_count
Out[9]: 3

In [10]: data
Out[10]:
['1, 2, 3, 4, 5, 6, 7,',
 '12, 13, 14, 15, 16,',
 '17, 52, 75, 86, 92,',
 '90, 91, 92 93, 94, 95....',
 'numbers',
 'numbers']
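The values in `nset_lines` are zero-based indices from `enumerate`, and blank lines are counted, which is why the sample gives `[0, 14, 18]`. If 1-based line numbers (as a text editor shows them) and the header text itself are wanted, similar to the question's `matched_nset_lines`, `enumerate` accepts a `start` argument. A minimal sketch, using a hypothetical in-memory copy of the question's sample:

```python
import io

# Hypothetical stand-in for master.txt, with the same layout as the
# question's sample (blank lines between entries).
sample = io.StringIO(
    "*Nset\n\n1, 2, 3, 4, 5, 6, 7,\n\n12, 13, 14, 15, 16,\n\n"
    "17, 52, 75, 86, 92,\n\n90, 91, 92 93, 94, 95....\n\n"
    "*NEXT COMMAND\n\n blah blah blah\n\n"
    "*Nset\n\n numbers\n\n*Nset\n\n numbers\n\n*Command\n\n irrelevant text\n"
)

headers = []  # (1-based line number, header text) for each *Nset
for lineno, line in enumerate(sample, start=1):
    line = line.strip()
    if line.startswith("*Nset"):
        headers.append((lineno, line))

print(headers)  # [(1, '*Nset'), (15, '*Nset'), (19, '*Nset')]
```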
