I'm trying to extract information between fixed tags with BeautifulSoup by using the model suggested here enter link description here
I have a lot of .html
files in my folder and I want to save results obtained with a BeautifulSoup script into another folder in the form of individual .txt
files. These .txt
files should have the same name as original files but would contain only extracted content. The script I wrote (see below) processes files successfully but does not write extracted bits out to individual files.
import os
import glob
from bs4 import BeautifulSoup
dir_path = "C:My_folder\\tmp\\"
for file_name in glob.glob(os.path.join(dir_path, "*.html")):
my_data = (file_name)
soup = BeautifulSoup(open(my_data, "r").read())
for i in soup.select('font[color="#FF0000"]'):
print(i.text)
file_path = os.path.join(dir_path, file_name)
text = open(file_path, mode='r').read()
results = i.text
results_dir = "C:\\My_folder\\tmp\\working"
results_file = file_name[:-4] + 'txt'
file_path = os.path.join(results_dir, results_file)
open(file_path, mode='w', encoding='UTF-8').write(results)