import glob
from bs4 import BeautifulSoup

links_with_text = []
textfile = open("a_file.txt", "w")
for filename in glob.iglob('*.html'):
    with open(filename) as f:
        soup = BeautifulSoup(f, "html.parser")
        links_with_text = [a['href'] for a in soup.find_all('a', href=True) if a.text]
        print(links_with_text)
        for element in links_with_text:
            textfile.write(element + "\n")
textfile.close()
Desired output (everything in one text file):
file name:
- link1
- link2
- link3
file name2:
- link1
- link2
- link3
file name3:
- link1
- link2
- link3
I found a post somewhat related to mine, but there the output is printed into multiple text files; here I would like to have the file names together with their links in one text file:
BeautifulSoup on multiple .html files
Please suggest. Thank you in advance.
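One way to sketch this, sticking with glob and BeautifulSoup from the code above: write the file name as a header line before writing that file's links, so every file's section lands in the same output file. The helper name `collect_links` and the `"- "` bullet prefix are my own choices to match the desired output shown above, not anything from an existing API.

```python
import glob
from bs4 import BeautifulSoup

def collect_links(pattern, outpath):
    # Hypothetical helper: for each file matching `pattern`, write the
    # file name as a header, then one "- link" line per anchor that has
    # visible text, all into a single output file.
    with open(outpath, "w") as textfile:
        for filename in sorted(glob.iglob(pattern)):
            with open(filename) as f:
                soup = BeautifulSoup(f, "html.parser")
            textfile.write(filename + ":\n")
            for a in soup.find_all('a', href=True):
                if a.text:  # skip anchors with no visible text
                    textfile.write("- " + a['href'] + "\n")

collect_links("*.html", "a_file.txt")
```

Opening the output file once, outside the loop over HTML files, is what keeps everything in a single text file; the `with` blocks also make sure both the input and output files are closed properly.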