
I have the following script to search a master file for a list of headers.

Example of the header file (header.txt):

header1
header2
header3

I also have a master file that contains the headers along with lots of other data in different formats.

Master file extract:

header1
line1
line2
line3
line4

header2
line1
line2

header3
line1

When I find a header, I want to report every line that follows it up to the next blank line, in the same format as the master file extract above.

At present, the script below finds the header in the master file, but I am unable to report the lines that follow it.

All my attempts so far have been unsuccessful, and I would be grateful to know whether what I am attempting is possible.

# Read the list of headers to look for, one per line.
with open("header.txt") as list_file:
    search_words = [word.strip() for word in list_file]

matches = []

with open("master_file.txt") as master_file:
    for line in master_file:
        current_line = line.split()

        # Keep the line if any of its words is one of the headers.
        for search_word in search_words:
            if search_word in current_line:
                matches.append(line)
                break
sheaph

1 Answer


Would this work? It extracts every line of the blocks whose headers are listed in header.txt.

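# Read the list of headers to look for, ignoring blank lines.
with open("header.txt") as list_file:
    search_words = {word.strip() for word in list_file if word.strip()}

matches = []

store = False  # True while inside a block whose header is wanted
with open("master_file.txt") as master_file:
    for line in master_file:
        stripped = line.strip()
        # Exact match, so a line like "header" does not match "header1".
        if not store and stripped in search_words:
            store = True   # found a wanted header: start collecting
        if not stripped:
            store = False  # a blank line ends the block

        if store:
            matches.append(line)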

This does, however, assume that your file is structured exactly as you posted. Edge cases such as a missing blank line, or a header word also appearing as one of the data lines, are not handled.
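To soften those two caveats, here is a minimal sketch (not tried against your real data) that compares each stripped line to the header list exactly, starts a new block whenever a listed header appears even if no blank line came before it, and keeps each block as its own list instead of one flat `matches` list. The function name `extract_blocks` is hypothetical, and it assumes header lines match the entries in header.txt exactly and that data lines never equal a header. A block for an unlisted header that is not preceded by a blank line will still be absorbed into the previous block, because the code has no way to recognize it.

```python
def extract_blocks(lines, headers):
    """Return a list of blocks; each block is [header, line, line, ...].

    Assumption: a block starts at a line that exactly matches one of `headers`
    (ignoring surrounding whitespace) and ends at a blank line or at the next
    listed header.
    """
    wanted = {h.strip() for h in headers if h.strip()}  # ignore blank entries
    blocks = []
    current = None  # block being collected, or None when outside a wanted block
    for raw in lines:
        key = raw.strip()
        if key in wanted:
            # A listed header starts a new block, even without a preceding blank line.
            current = [key]
            blocks.append(current)
        elif not key:
            current = None  # a blank line ends the current block
        elif current is not None:
            current.append(raw.rstrip("\n"))  # keep the data line as written
    return blocks


# Example with in-memory lines; with real files, pass the open master file
# object as `lines` and the stripped contents of header.txt as `headers`.
sample = [
    "header1\n", "line1\n",
    "header2\n", "line1\n", "line2\n",  # no blank line before header2
    "\n",
    "header3\n", "line1\n",
]
blocks = extract_blocks(sample, ["header1", "header2"])
for block in blocks:
    print("\n".join(block))
    print()  # blank line between blocks, like the master file
```

Returning a list of lists (rather than a dict keyed by header) keeps repeated headers and preserves the order in which blocks appear in the master file.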

Simone Zandara