I would like to achieve the following in Python: from a directory (with subdirectories) full of HTML files, I want to extract the 10 lines before and after each line containing the word "apple". I want to write these lines to a CSV file. Ideally, the CSV file will contain two columns: 1) the HTML filename and 2) the 10 lines before and after the word "apple".
UPDATE: I was able to extract the lines and print them to the console with the following code:
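import collections
import glob
import itertools
import sys

# Search the current directory and all subdirectories for HTML files.
for filepath in glob.glob('**/*.html', recursive=True):
    with open(filepath) as f:
        # Rolling buffer that holds the 10 most recently read lines.
        before = collections.deque(maxlen=10)
        for line in f:
            if 'apple' in line:
                sys.stdout.writelines(before)  # the 10 lines before the match
                sys.stdout.write(line)  # the matching line itself
                sys.stdout.writelines(itertools.islice(f, 10))  # the 10 lines after
                break  # only the first match in each file is reported
            # No match yet: remember this line (append() returns None,
            # so its result is not worth storing or printing).
            before.append(line)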
I will look into the CSV step, but any help would be appreciated.
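For the CSV step, here is a minimal sketch of one way it might be done, not a tested solution for my real data. The function names `extract_context` and `write_contexts`, the column names `filename` and `context`, and the choice of UTF-8 with `errors="replace"` are all my own assumptions. Like the code above, it only keeps the first match in each file, and it skips files that never mention the keyword.

```python
import collections
import csv
import glob
import itertools
import os


def extract_context(filepath, keyword="apple", n=10):
    """Return the n lines before and after the first line containing keyword.

    Returns None if the keyword never appears in the file.
    (Hypothetical helper; the name and defaults are assumptions.)
    """
    before = collections.deque(maxlen=n)  # rolling buffer of the last n lines
    # Assumption: files are UTF-8; undecodable bytes are replaced, not fatal.
    with open(filepath, encoding="utf-8", errors="replace") as f:
        for line in f:
            if keyword in line:
                after = list(itertools.islice(f, n))  # next n lines, if any
                return "".join([*before, line, *after])
            before.append(line)
    return None


def write_contexts(root, out_path, keyword="apple", n=10):
    """Write one CSV row (filename, context) per matching HTML file under root."""
    pattern = os.path.join(root, "**", "*.html")
    # newline="" lets the csv module control line endings itself.
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["filename", "context"])  # header row
        for filepath in sorted(glob.glob(pattern, recursive=True)):
            context = extract_context(filepath, keyword, n)
            if context is not None:
                writer.writerow([filepath, context])
```

Calling `write_contexts(".", "apple_context.csv")` would scan the current directory. The whole block of lines goes into a single quoted CSV cell, so the embedded line breaks are kept when the file is read back with `csv.reader`.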