This is my third day with Python and I am sure that something simple is being overlooked.
I am trying to index into a list of HTML file names, store the selected file name in a variable, and then open that file. The plan is to loop through the list of file names.
Unfortunately, the variable is not being read as a file but as a name.
I thought this would be an easy question to answer but I am just not finding it.
So, what am I doing wrong? Any help will be highly appreciated.
Here is my code:
file_list = []
for root, dirs, files in os.walk(r'C:\Aptana\Beautiful'):
    for file in files:
        if file.endswith('.html'):
            file_list.append(file)
input_file = file_list[0]
orig_file = open(input_file, 'w')
I know that I am missing something simple, but it's driving me nuts!
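For anyone following along, here is a minimal sketch of the difference between the bare names that os.walk yields in `files` and full paths built with os.path.join. The temporary directory, the `docs` subfolder, and `page.html` are made up for illustration; they only stand in for the real folder.

```python
import os
import tempfile

# Hypothetical throwaway directory standing in for the real folder.
with tempfile.TemporaryDirectory() as top:
    sub = os.path.join(top, "docs")
    os.makedirs(sub)
    with open(os.path.join(sub, "page.html"), "w") as f:
        f.write("<title>demo</title>")

    bare_names = []
    full_paths = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if name.endswith(".html"):
                bare_names.append(name)                      # file name only
                full_paths.append(os.path.join(root, name))  # directory + file name

    # A bare name is resolved against the current working directory,
    # so it is only found if the script runs from the file's own folder.
    bare_exists = os.path.isfile(bare_names[0])
    # The joined path points at the file wherever the script runs from.
    full_exists = os.path.isfile(full_paths[0])
```

Here `bare_names` holds only `page.html`, while `full_paths` holds a path that open() can find regardless of the working directory.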
Update:
file_list = []
for root, dirs, files in os.walk(r'C:\Aptana\Beautiful'):
    for file in files:
        if file.endswith('.html'):
            file_list.append(os.path.join(root, file))
input_file = file_list[0]
orig_file = open(input_file, 'w')
soup = BeautifulSoup(orig_file)
title = soup.find('title')
main_txt = soup.findAll(id='main')[0]
toc_txt = soup.findAll(class_='toc-indentation')[0]
And then the crash:
Traceback (most recent call last):
  File "C:\Aptana\beautiful\B-1.py", line 47, in <module>
    soup = BeautifulSoup(orig_file)
  File "C:\Python33\lib\site-packages\bs4\__init__.py", line 161, in __init__
    markup = markup.read()
io.UnsupportedOperation: not readable
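As far as I can tell, the "not readable" error comes from the mode passed to open(): 'w' opens the file for writing only and truncates whatever was in it, so anything that calls .read() on the handle fails. A minimal sketch, assuming a made-up temporary file instead of the real HTML files and a plain .read() call instead of BeautifulSoup:

```python
import io
import os
import tempfile

# Hypothetical temporary file standing in for one of the HTML files.
with tempfile.TemporaryDirectory() as top:
    path = os.path.join(top, "page.html")
    with open(path, "w") as f:
        f.write("<title>demo</title>")

    # Read mode: the handle can be read.
    with open(path, "r") as handle:
        text_read = handle.read()

    # Write mode: the existing contents are truncated on open,
    # and calling read() raises io.UnsupportedOperation.
    with open(path, "w") as handle:
        try:
            handle.read()
            write_mode_error = None
        except io.UnsupportedOperation as exc:
            write_mode_error = str(exc)

    # The file is now empty because 'w' truncated it.
    size_after = os.path.getsize(path)
```

Note that the second open() empties the file, so running the 'w' version against real HTML files would wipe their contents; it is worth testing on copies.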
Thanks adsmith! Please let me know if you have any other questions.
orig_file is being printed as: <_io.TextIOWrapper name='C:\Aptana\Beautiful\Administration+Guide.html' mode='r' encoding='cp1252'>