I am trying to extract certain text based on the surrounding words/patterns and write the results to a file called sample.csv.
For example, I have a directory of files:
file1.html file2.html file3.html
Each file contains the following structure. For example, file1.html:
<strong>Hello world</strong>
<p><strong>Name:</strong> John Smith</p>
<p>Some text</p>
<p><strong>Location</strong></p>
<blockquote>
<p>122 Main Street & City, ST 12345 ></p>
</blockquote>
<p>Some text</p>
Based on the above HTML structure, I want to output it to a sample.csv file that looks like this:
filename,name,location
file1.html,John Smith,122 Main Street
file2.html,Mary Smith,123 North Road
file3.html,Kate Lee,90 Winter Lane
I have the following Python code:
import os
import csv
import re

csv_cont = []
directory = os.getcwd()
for root,dir,files in os.walk(directory):
    for file in files:
        if file.endswith(".html"):
            f = open(file, 'r')
            name = re.search('<p><strong>Name:</strong>(.*)</p>', f)
            location = re.search('<p><strong>Location</strong></p><blockquote><p>(.*)&', f)
            tmp = []
            tmp.append(file)
            tmp.append(name)
            tmp.append(location)
            csv_cont.append(tmp)
            f.close()

#Change name of test.csv to whatever you want
with open("sample.csv", 'w', newline='') as myfile:
    wr = csv.DictWriter(myfile, fieldnames = ["filename", "name", "location"], delimiter = ',')
    wr.writeheader()
    wr = csv.writer(myfile)
    wr.writerows(csv_cont)
I am getting the following error:
return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object
What is the issue here?
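For reference, the traceback is consistent with `re.search` being handed the file object `f` rather than the file's text, since `re.search` expects a string or bytes as its second argument. Below is a minimal sketch of one way around that, under a few assumptions that are not in the code above: the helper names (`extract_fields`, `collect_rows`, `write_csv`) are hypothetical, the files are assumed to be UTF-8, and `\s*` is added between tags because the sample HTML puts `<blockquote>` and its `<p>` on separate lines. It also takes `.group(1)` from each match, since `re.search` returns a match object rather than the captured text, and joins `root` with the filename so files in subdirectories can be opened.

```python
import csv
import os
import re

# Patterns modelled on the sample HTML; \s* tolerates newlines between tags.
NAME_RE = re.compile(r"<p><strong>Name:</strong>\s*(.*?)\s*</p>")
LOCATION_RE = re.compile(
    r"<p><strong>Location</strong></p>\s*<blockquote>\s*<p>(.*?)\s*&"
)


def extract_fields(html):
    """Return (name, location) from one page's text; '' where a pattern is absent."""
    name = NAME_RE.search(html)
    location = LOCATION_RE.search(html)
    return (
        name.group(1) if name else "",
        location.group(1) if location else "",
    )


def collect_rows(directory):
    """Walk the directory and build one [filename, name, location] row per .html file."""
    rows = []
    for root, _dirs, files in os.walk(directory):
        for filename in sorted(files):
            if filename.endswith(".html"):
                # Read the file's contents into a string before searching it.
                with open(os.path.join(root, filename), "r", encoding="utf-8") as f:
                    html = f.read()
                rows.append([filename, *extract_fields(html)])
    return rows


def write_csv(rows, out_path="sample.csv"):
    """Write the header row followed by the collected rows."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["filename", "name", "location"])
        writer.writerows(rows)
```

With these helpers, `write_csv(collect_rows(os.getcwd()))` would produce sample.csv for the current directory. A single `csv.writer` is used for both the header and the rows, so the separate `DictWriter` is not needed in this sketch.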