I am writing a Python script in which a loop searches multiple directories for HTML files whose names end with '_CriteriaOutput.html'. Each directory contains many HTML files, 4-5 of which match that suffix. I want to read these '_CriteriaOutput.html' files and consolidate them into a single, separate HTML file. My code so far is below. It prints the HTML source of each file, which is useless to me. I want only the text (if any) contained in each HTML file.

    import os

    NightlyLogs = r'C:/Users/<user>/Desktop/Nightly_Logs/2015_07_16-0940'

    # Sub-folders of NightlyLogs (one per changelist), in sorted order
    folders = sorted(fol for fol in os.listdir(NightlyLogs)
                     if os.path.isdir(os.path.join(NightlyLogs, fol)))

    for folder in folders:
        HtmlLoc = os.path.join(NightlyLogs, folder)
        # Keep only the files whose names end with '_CriteriaOutput.html'
        abc = [name for name in os.listdir(HtmlLoc) if name.endswith('_CriteriaOutput.html')]
        for one in abc:
            HtmlFile = os.path.join(HtmlLoc, one)
            # This prints the raw HTML source, not just the text
            with open(HtmlFile, 'r') as open_file:
                print(open_file.read())

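To make the "text only" requirement concrete, here is a minimal sketch of one possible approach, using only the standard library's `html.parser`. It assumes Python 3 (in Python 2 the module is named `HTMLParser`). The names `TextExtractor` and `html_to_text` are hypothetical, introduced only for illustration, and the choice to skip `<script>` and `<style>` contents is an assumption about what counts as "text".

```python
from html.parser import HTMLParser  # Python 3; the Python 2 module is HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []        # text fragments in document order
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Hypothetical helper: return the text content of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return ' '.join(parser.parts)
```

With something like this, the loop could call `html_to_text(open_file.read())` instead of printing the raw source. A third-party parser may cope better with badly formed HTML, but that would be an extra dependency.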
NightlyLogs is a location that contains folders named after CLs (changelists), e.g. 876564 or 865664. Each HTML file, e.g. A_CriteriaOutput.html or B_CriteriaOutput.html, contains information for a specific series (say A, B, or C), and each CL folder contains a similar set of _CriteriaOutput.html files with information for that CL only. I want to build a table with the CLs as columns and the series A, B, C, D, E as rows, where each cell contains the info for that series in that CL. I have tried to be specific, but if you think some information is missing, please let me know and I'll try to provide as much info as I can. Thanks.
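The table described above could be sketched roughly as follows, assuming Python 3. This is only an illustration of the intended layout, not a finished solution: `collect`, `build_table`, and the `to_text` parameter are hypothetical names, and `to_text` stands for whatever function turns raw HTML into plain text. The series name is assumed to be the part of the file name before '_CriteriaOutput.html'.

```python
import os
from html import escape

SUFFIX = '_CriteriaOutput.html'


def collect(root, to_text):
    """Return {series: {cl: text}} for every *_CriteriaOutput.html under root.

    `to_text` is any callable that converts raw HTML into plain text
    (an assumption: supply your own extractor).
    """
    data = {}
    for cl in sorted(os.listdir(root)):
        cl_dir = os.path.join(root, cl)
        if not os.path.isdir(cl_dir):
            continue  # ignore stray files next to the CL folders
        for name in sorted(os.listdir(cl_dir)):
            if name.endswith(SUFFIX):
                series = name[:-len(SUFFIX)]  # 'A_CriteriaOutput.html' -> 'A'
                with open(os.path.join(cl_dir, name), 'r') as fh:
                    data.setdefault(series, {})[cl] = to_text(fh.read())
    return data


def build_table(data):
    """Render {series: {cl: text}} as HTML: CLs as columns, series as rows."""
    cls = sorted({cl for per_cl in data.values() for cl in per_cl})
    header = ''.join('<th>%s</th>' % escape(cl) for cl in cls)
    rows = ['<tr><th>Series</th>%s</tr>' % header]
    for series in sorted(data):
        # A series missing from a CL folder gets an empty cell
        cells = ''.join('<td>%s</td>' % escape(data[series].get(cl, ''))
                        for cl in cls)
        rows.append('<tr><th>%s</th>%s</tr>' % (escape(series), cells))
    return '<table border="1">\n' + '\n'.join(rows) + '\n</table>'
```

The string returned by `build_table(collect(NightlyLogs, to_text))` could then be written to the consolidated HTML file. Cell text is escaped so that characters such as `<` in the extracted text do not break the table markup.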