The following approach reads your file in and gives you a list of the non-border lines in each block:
from itertools import groupby

with open('input.txt') as f_input:
    for k, g in groupby(f_input, lambda x: not x.startswith('-BORDER-')):
        if k:
            print([line.strip() for line in g])
So if your input file was:
-BORDER-
text
-BORDER-
text
-BORDER-
this is some text
with words
on different lines
-BORDER-
It would display the following output:
['text']
['text']
['this is some text', 'with words', 'on different lines']
This works by reading your file in line by line and using Python's groupby function to group consecutive lines that give the same result for a given test. In this case the test is whether or not the line starts with -BORDER-. Each group contains all of the following lines that return the same result. The k is the test result, and the g is an iterator over the group of matching lines. So if the test result is True, it means the lines in that group did not start with -BORDER-.
Next, as each of your lines ends with a newline, a list comprehension is used to strip it (along with any other leading or trailing whitespace) from each of the returned lines.
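To see what groupby yields at each step, here is a small sketch that applies the same test to an in-memory list rather than a file. The sample lines and the helper name is_content are made up for illustration:

```python
from itertools import groupby

# Hypothetical sample data standing in for the lines of input.txt
lines = ['-BORDER-\n', 'text\n', '-BORDER-\n', 'one\n', 'two\n', '-BORDER-\n']

def is_content(line):
    """Return True for lines that are not border lines."""
    return not line.startswith('-BORDER-')

# Keep every (key, group) pair, including the border groups where k is False
groups = [(k, [line.strip() for line in g]) for k, g in groupby(lines, is_content)]

for k, group in groups:
    print(k, group)
```

Each run of consecutive lines with the same test result becomes one group, which is why the border lines show up as their own groups with a key of False and are skipped by the if k check.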
If you wanted to count the words in each block (assuming they are delimited by whitespace) then you could do the following:
from itertools import groupby

with open('input.txt') as f_input:
    for k, g in groupby(f_input, lambda x: not x.startswith('-BORDER-')):
        if k:
            lines = list(g)
            word_count = sum(len(line.split()) for line in lines)
            print("{} words in {}".format(word_count, lines))
Giving you:
1 words in ['text\n']
1 words in ['text\n']
9 words in ['this is some text\n', 'with words \n', 'on different lines\n']
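If you would rather not have the trailing newlines in the output, or want to reuse the counts elsewhere, one possible variation is to wrap the same logic in a function. This is only a sketch: the name count_words_per_block and its border parameter are assumptions for illustration, and the sample list stands in for the open file.

```python
from itertools import groupby

def count_words_per_block(lines, border='-BORDER-'):
    """Return a list of (word_count, stripped_lines) tuples, one per block."""
    results = []
    for is_text, group in groupby(lines, lambda line: not line.startswith(border)):
        if is_text:
            # Strip newlines and surrounding whitespace before counting
            block = [line.strip() for line in group]
            word_count = sum(len(line.split()) for line in block)
            results.append((word_count, block))
    return results

# Hypothetical sample data; an open file object could be passed instead
sample = ['-BORDER-\n', 'text\n', '-BORDER-\n',
          'this is some text\n', 'with words\n', 'on different lines\n',
          '-BORDER-\n']

for word_count, block in count_words_per_block(sample):
    print("{} words in {}".format(word_count, block))
```

Because the function accepts any iterable of lines, it should work the same way when called as count_words_per_block(f_input) inside the with block.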