How to loop through directory with HTML files and identify lines before and after a string then print to CSV?

Question

I would like to do the following in Python: extract the 10 lines before and after the word "apple" from a directory (including its subdirectories) full of HTML files, and write those lines to a CSV file. Ideally, the CSV file will have two columns: 1) the HTML filename and 2) the 10 lines before and after the word "apple".
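One way to make "10 lines before and after" precise is to treat it as a window over a file's list of lines. This is a minimal sketch under stated assumptions: it returns one window per line containing the keyword (not only the first match), it clips windows at the start and end of the file, and the function name `context_windows` is hypothetical.

```python
from typing import List


def context_windows(lines: List[str], keyword: str, n: int = 10) -> List[str]:
    """Return one text block per line that contains `keyword`.

    Each block holds up to `n` lines before the match, the matching line,
    and up to `n` lines after it.
    """
    windows = []
    for i, line in enumerate(lines):
        if keyword in line:
            # Clamp the start at 0: a negative index would wrap around.
            # The end needs no clamp because slices stop at the list's end.
            windows.append("".join(lines[max(0, i - n): i + n + 1]))
    return windows
```

For a file already read with `f.readlines()`, `context_windows(lines, "apple")` gives the text blocks that would go into the second CSV column.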

UPDATE: I was able to extract the lines with the following code:

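```python
import collections
import glob
import itertools
import sys

# Recursively find every .html file under the current directory.
for filepath in glob.glob('**/*.html', recursive=True):
    with open(filepath) as f:
        # Sliding window that keeps only the 10 most recent lines.
        before = collections.deque(maxlen=10)
        for line in f:
            if 'apple' in line:
                sys.stdout.writelines(before)                   # 10 lines before
                sys.stdout.write(line)                          # the matching line
                sys.stdout.writelines(itertools.islice(f, 10))  # 10 lines after
                break  # stop at the first match in this file
            # deque.append() returns None, so there is nothing useful to print here.
            before.append(line)
```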

I will look into the CSV step myself, but any help would be appreciated.
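For the CSV step, the standard-library `csv` module can write one row per HTML file with the two requested columns, reusing the deque-plus-`islice` approach from the loop in the question. This is a sketch, not a definitive solution; it assumes only the first match per file is wanted, that the files are UTF-8 (undecodable bytes are replaced), and the function name `write_context_csv` and output name `apple_context.csv` are hypothetical.

```python
import collections
import csv
import glob
import itertools
import os


def write_context_csv(root: str, keyword: str, out_path: str, n: int = 10) -> int:
    """Write a CSV with columns (filename, context); return the number of data rows.

    Only the first line containing `keyword` in each file is used.
    """
    rows = 0
    # newline='' lets the csv module control line endings itself.
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["filename", "context"])
        # '**' with recursive=True matches root itself and all subdirectories.
        pattern = os.path.join(root, "**", "*.html")
        for filepath in sorted(glob.glob(pattern, recursive=True)):
            with open(filepath, encoding="utf-8", errors="replace") as f:
                before = collections.deque(maxlen=n)  # last n lines seen
                for line in f:
                    if keyword in line:
                        after = list(itertools.islice(f, n))  # next n lines
                        context = "".join([*before, line, *after])
                        writer.writerow([filepath, context])
                        rows += 1
                        break
                    before.append(line)
    return rows
```

Calling `write_context_csv(".", "apple", "apple_context.csv")` would scan the current directory tree. Because the context contains newlines, `csv.writer` quotes that field, so each context stays in a single cell when the file is read back with `csv.reader`.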

What is the question? What part of your solution are you trying to fix? This isn't a discussion forum, please take the time to read [How to Ask](https://stackoverflow.com/help/how-to-ask) and the other links found on that page. — wwii, Aug 05 '19 at 02:56
`open('names.csv', 'wb') as f` - you opened a file for writing then you tried to read from it - `for row in f:`. That's why it throws an error. — wwii, Aug 05 '19 at 02:59
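The point of that comment, sketched in code: a handle opened for writing cannot be iterated, so the file has to be closed and reopened in read mode. In Python 3 the `csv` module expects text mode with `newline=''`; `'wb'` is the Python 2 idiom and makes `csv.writer` raise `TypeError`. The function name `roundtrip` and the sample rows are hypothetical.

```python
import csv
from typing import List


def roundtrip(path: str, rows: List[List[str]]) -> List[List[str]]:
    # Write in text mode 'w' with newline=''; with 'wb', csv.writer raises
    # TypeError because it writes str, not bytes.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    # Reopen in 'r' mode to read; iterating a 'w' handle raises
    # io.UnsupportedOperation ("not readable").
    with open(path, "r", newline="", encoding="utf-8") as f:
        return [row for row in csv.reader(f)]
```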
I have edited the question according to the How to Ask page and I hope I am much clearer now. Thank you. — hy9fesh, Aug 05 '19 at 05:42
Possible duplicate: [How can I iterate over files in a given directory?](https://stackoverflow.com/questions/10377998/how-can-i-iterate-over-files-in-a-given-directory) — wwii, Aug 05 '19 at 14:14
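As an alternative to `glob.glob('**/*.html', recursive=True)`, `pathlib.Path.rglob` also walks every subdirectory. A minimal sketch; the function name `html_files` is hypothetical.

```python
from pathlib import Path
from typing import List


def html_files(root: str) -> List[Path]:
    """Return every *.html file under `root`, including subdirectories."""
    # rglob(pattern) behaves like glob('**/' + pattern), i.e. it recurses.
    return sorted(p for p in Path(root).rglob("*.html") if p.is_file())
```

Each returned `Path` can be passed straight to `open()`, so it can replace the `glob` call in the loop without other changes.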
I have edited the post. I will post the CSV portion once I figure it out. — hy9fesh, Aug 05 '19 at 18:07
