Although I have read the earlier questions (e.g. How to rename a file using Python), it is still not clear to me how to rename all the HTML files in folder x based on the H1 inside this div of each HTML file:
<div id="page_header" class="page_header_email_alerts">
<h1>
<span itemprop="headline">Redhill Biopharma's (RDHL) CEO Dror Ben Asher on Q4 2014 Results - Earnings Call Transcript</span>
</h1>
</div>
Does anyone have a suggestion? I have put together a solution with bs4, but it does not loop through all my HTML files:
import os
from bs4 import BeautifulSoup
import textwrap

directory = 'C:/Research syntheses - Meta analysis/SeekingAlpha/test/'
for filename in os.listdir(directory):
    if filename.endswith('.html'):
        fname = os.path.join(directory, filename)
        with open(fname, 'r') as f:
            soup = BeautifulSoup(f.read(), 'html.parser')
            headline = soup.find(itemprop='headline').text
            os.rename(filename, headline+'.html')
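
A likely cause, as far as I can tell: os.rename is given the bare filename, which is resolved against the current working directory rather than directory, and the rename may also be attempted while the file is still open, which Windows refuses. Below is a minimal sketch of one way around both, not a definitive fix. It swaps bs4 for the standard library's html.parser so it runs without extra packages (soup.find(itemprop='headline') should do the same job). The names HeadlineParser, safe_filename and rename_by_headline are hypothetical helpers made up for this sketch, and the UTF-8 encoding is an assumption about the files.

```python
import os
import re
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not change the depth.
VOID_TAGS = {'br', 'hr', 'img', 'input', 'link', 'meta', 'wbr'}


class HeadlineParser(HTMLParser):
    """Collects the text of the first element with itemprop="headline"."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while we are inside the headline element
        self.done = False   # True once the first headline has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get('itemprop') == 'headline':
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def safe_filename(title):
    """Replace characters Windows forbids in file names and tidy whitespace."""
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', title)
    return ' '.join(cleaned.split()).rstrip('. ')


def rename_by_headline(directory):
    """Rename every .html file in `directory` after its headline text."""
    renamed = []
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith('.html'):
            continue
        src = os.path.join(directory, filename)
        parser = HeadlineParser()
        # Assumption: the files are UTF-8; undecodable bytes are replaced.
        with open(src, 'r', encoding='utf-8', errors='replace') as f:
            parser.feed(f.read())
        parser.close()
        # The file is closed at this point, so the rename is not blocked.
        title = safe_filename(''.join(parser.parts))
        if not title:
            continue  # no headline found: leave the file as it is
        dst = os.path.join(directory, title + '.html')
        if os.path.exists(dst):
            continue  # never overwrite a file that already has this name
        os.rename(src, dst)  # full paths on both sides
        renamed.append((filename, title + '.html'))
    return renamed
```

Calling rename_by_headline(directory) with the folder above would return a list of (old name, new name) pairs for the files it renamed. Files without a headline, or whose target name already exists, are skipped rather than overwritten.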