I have tens of thousands of HTML documents saved on my computer, and I need to parse all of them with BeautifulSoup, looking for the same consistent tags in each document.
Currently, I iterate through my folder of HTML files, opening, parsing, and closing each one in turn, but this open/parse/close cycle takes too long. I tried saving several HTML documents into one text file and "redoing" the opening and closing HTML tags, but I'm not totally sure how parsing works, so I wasn't sure whether rearranging the document would break the parsing process.
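For reference, the per-file loop described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the actual code from this question: `parse_folder`, the `*.html` filename pattern, and the default `span` tag with class `item` are hypothetical placeholders for whatever consistent tags the real documents share.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup


def parse_folder(folder, tag_name="span", css_class="item"):
    """Open, parse, and close every .html file in `folder` one at a time.

    Hypothetical: `tag_name` and `css_class` stand in for the real tags.
    Returns {filename: [text of each matching tag]}.
    """
    results = {}
    for path in sorted(Path(folder).glob("*.html")):
        # The `with` block closes the file as soon as parsing is done.
        with path.open(encoding="utf-8", errors="replace") as f:
            soup = BeautifulSoup(f, "html.parser")
        # Collect the text of every matching tag (up to ~100 per document).
        results[path.name] = [
            tag.get_text(strip=True)
            for tag in soup.find_all(tag_name, class_=css_class)
        ]
    return results


if __name__ == "__main__":
    # Tiny demo on a throwaway folder with one hypothetical document.
    with tempfile.TemporaryDirectory() as d:
        Path(d, "a.html").write_text(
            '<html><body><span class="item">x</span></body></html>',
            encoding="utf-8",
        )
        print(parse_folder(d))
```

The sketch uses Python's built-in `"html.parser"` so it has no dependency beyond `bs4`. If `lxml` is installed, passing `"lxml"` as the parser name is generally faster, which may matter at this scale.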
Is there any sort of standardized method for doing this? If I could combine as many HTML documents as possible into one text file, I think this portion of the process would go much faster.
EDIT: There are at most 100 individual 'items' that I am looking for in each HTML document, so I can only parse up to 100 items at a time. It's not that I'm trying to parse each document any quicker; instead, I want to save as many HTML documents as possible into one text file, in the hope of parsing 1,000 items at a time, or many more if possible.