Below is a solution using a Python generator, split_into_chunks(f), that extracts each section (as a list of strings), squelches empty lines, and detects missing @headers and EOF. The generator approach is neat because it lets you wrap each chunk further, e.g. with a CSV reader that handles space-separated values (such as pandas read_csv):
with open('your.ssv') as f:
    for chunk in split_into_chunks(f):
        # Do stuff on chunk. Presumably, wrap a reader e.g. pandas read_csv
        print(chunk)
The code is below. I also parameterized the value demarcator='@header' for you. Note that we have to iterate with line = inputstream.readline() and while line, instead of the usual for line in f, since if we see the @header of the next section we need to push it back with seek()/tell(), and tell() cannot be called on a text file while it is being iterated with for line in f; see this and this for an explanation why. And if you want to modify the generator to yield the chunk header and body separately (e.g. as a list of two items), that's trivial.
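To illustrate the pushback point, here is a minimal sketch, assuming CPython 3 and a file opened in text mode. The temporary file and its contents are made up for the demo. It shows tell() failing inside a for loop, and the readline()/tell()/seek() pattern re-reading a line successfully.

```python
import os
import tempfile

# Write a small throwaway file (contents are made up for this demo)
with tempfile.NamedTemporaryFile('w', suffix='.ssv', delete=False) as tmp:
    tmp.write("@header one\n1 2 3\n@header two\n4 5 6\n")
    path = tmp.name

try:
    # 1) Iterating with `for line in f` disables tell() on text files
    tell_error = None
    with open(path) as f:
        for line in f:
            try:
                f.tell()
            except OSError as exc:
                tell_error = exc
                break

    # 2) With readline(), tell()/seek() work, so a line can be pushed back
    with open(path) as f:
        first = f.readline()
        pos = f.tell()          # remember where the second line starts
        second = f.readline()
        f.seek(pos)             # push the second line back
        again = f.readline()    # reads the same line again
finally:
    os.remove(path)

print("tell() inside for-loop raised:", type(tell_error).__name__)
print("re-read after seek matches:", second == again)
```

This is why the generator reads with readline() throughout: it needs a reliable position to seek back to when it meets the next header.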
def split_into_chunks(inputstream, demarcator='@header'):
    """Utility generator to get sections from file, demarcated by '@header'"""
    while True:
        chunk = []
        line = inputstream.readline()
        # At EOF?
        if not line:
            break
        # Expect that each chunk starts with one header line
        if not line.startswith(demarcator):
            raise RuntimeError(f"Bad chunk, missing {demarcator}")
        chunk.append(line.rstrip('\n'))
        # Can't use `for line in inputstream:` since we may need to pushback
        while line:
            # Remember our file-pointer position in case we need to pushback a header row
            last_pos = inputstream.tell()
            line = inputstream.readline()
            # Saw next chunk's header line? Pushback the header line, then yield the current chunk
            if line.startswith(demarcator):
                inputstream.seek(last_pos)
                break
            # Ignore blank or whitespace-only lines. Test with strip() rather than
            # reassigning `line`, so a blank line doesn't end the `while line` loop early.
            if line.strip():
                chunk.append(line.rstrip('\n'))
        yield chunk
with open('your.ssv') as f:
    for chunk in split_into_chunks(f):
        # Do stuff on chunk. Presumably, wrap it with a reader which handles space-separated values, e.g. pandas read_csv
        print(chunk)
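As a rough sketch of the two follow-ups mentioned earlier (yielding header and body separately, and wrapping each chunk in a reader), something like the following could work. The name split_into_header_and_body and the sample data are hypothetical, and the standard-library csv.reader is used here as a stand-in for pandas read_csv so that the example has no third-party dependency.

```python
import csv
import io


def split_into_header_and_body(inputstream, demarcator='@header'):
    """Hypothetical variant: yield (header, body_lines) for each section."""
    while True:
        line = inputstream.readline()
        if not line:  # EOF
            break
        if not line.startswith(demarcator):
            raise RuntimeError(f"Bad chunk, missing {demarcator}")
        header = line.rstrip('\n')
        body = []
        while line:
            last_pos = inputstream.tell()
            line = inputstream.readline()
            if line.startswith(demarcator):
                inputstream.seek(last_pos)  # push the next header back
                break
            if line.strip():  # skip blank or whitespace-only lines
                body.append(line.rstrip('\n'))
        yield header, body


# Made-up sample data standing in for 'your.ssv'
sample = io.StringIO(
    "@header first\n"
    "a b c\n"
    "1 2 3\n"
    "\n"
    "@header second\n"
    "x y\n"
    "7 8\n"
)

sections = []
for header, body in split_into_header_and_body(sample):
    # csv.reader accepts any iterable of strings, so the body list works directly
    rows = list(csv.reader(body, delimiter=' ', skipinitialspace=True))
    sections.append((header, rows))
    print(header, rows)
```

Keeping the header out of the body means the reader only ever sees data rows. With pandas, the body lines could presumably be joined with newlines and passed through io.StringIO to read_csv instead.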