
This is my third day with Python and I am sure that something simple is being overlooked.

I am trying to index into a list of html file names, setting the indexed html file name into a var, and then trying to open that file. The plan is to loop through the list of file names.

Unfortunately, the var is not being read as a file but is being read as a name.

I thought this would be an easy question to answer but I am just not finding it.

So, what am I doing wrong? Any help will be highly appreciated.

Here is my code:

file_list = []
for root, dirs, files in os.walk(r'C:\Aptana\Beautiful'):
    for file in files:
        if file.endswith('.html'):
            file_list.append(file)
input_file = file_list[0]
orig_file = open(input_file, 'w')

I know that I am missing something simple, but it's driving me nuts!

Update:

file_list = []
for root, dirs, files in os.walk(r'C:\Aptana\Beautiful'):
    for file in files:
        if file.endswith('.html'):
            file_list.append(os.path.join(root,file))
input_file = file_list[0]
orig_file = open(input_file, 'w')
soup = BeautifulSoup(orig_file)
title = soup.find('title')
main_txt = soup.findAll(id='main')[0]
toc_txt = soup.findAll(class_='toc-indentation')[0]

And then the crash:

Traceback (most recent call last):
  File "C:\Aptana\beautiful\B-1.py", line 47, in <module>
    soup = BeautifulSoup(orig_file)
  File "C:\Python33\lib\site-packages\bs4\__init__.py", line 161, in __init__
    markup = markup.read()
io.UnsupportedOperation: not readable

Thanks adsmith! Please let me know if you have any other questions.

orig_file is being printed as: <_io.TextIOWrapper name='C:\Aptana\Beautiful\Administration+Guide.html' mode='r' encoding='cp1252'>

veblen
  • This code looks correct at a glance. What do you mean by "not being read as a file but is being read as a name"? What is the program's behavior, and what did you expect it to do instead? – Tim Pierce Dec 11 '13 at 22:30

2 Answers


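Looks to me like your current working directory is not the directory you're walking. os.walk gives you bare file names in "files", so open() looks for them relative to wherever your script is running. Join each name with "root" to get a full path instead: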

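import os

file_list = []
for root, dirs, files in os.walk(r'C:\Aptana\Beautiful'):
    for file in files:
        if file.endswith('.html'):
            #join with root so the path is valid from any working directory
            file_list.append(os.path.join(root, file))
input_file = file_list[0]
orig_file = open(input_file, 'r') #'r' to read; 'w' would blank the file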
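
To see why the join matters, here is a minimal, self-contained sketch. It builds a throwaway directory tree with tempfile (the folder and file names are made up for illustration, they are not your real C:\Aptana\Beautiful folder) and collects full paths with os.walk, then loops over all of them rather than just index 0. The helper name find_html_files is hypothetical.

```python
import os
import tempfile


def find_html_files(top):
    """Return the full path of every .html file found under top."""
    paths = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if name.endswith('.html'):
                # name alone is just 'a.html'; root supplies the folder part.
                paths.append(os.path.join(root, name))
    return paths


# Hypothetical layout, created only for this demo.
demo_dir = tempfile.mkdtemp()
sub_dir = os.path.join(demo_dir, 'sub')
os.mkdir(sub_dir)
for folder, name in [(demo_dir, 'a.html'), (sub_dir, 'b.html'), (demo_dir, 'notes.txt')]:
    with open(os.path.join(folder, name), 'w') as f:
        f.write('<html><title>%s</title></html>' % name)

html_paths = find_html_files(demo_dir)

# Loop over every file instead of only file_list[0].
for path in html_paths:
    with open(path, 'r') as f:
        print(path, len(f.read()))
```

Because each entry is a full path, the loop works no matter which directory the script is started from.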

Also, I strongly recommend using the "with" statement (a context manager) rather than pairing orig_file = open(file) with orig_file.close(). Implement it as follows:

#walk through your directory as you're doing already
input_file = file_list[0] #you know this is only for the first file, right?
with open(input_file,'r') as orig_file:
    contents = orig_file.read() #do stuff to the file
#once you're out of the block, the file automagically closes, even if an
#error or exception is raised inside the block.
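
A small sketch of that closing behaviour, assuming nothing beyond the standard library. The temporary file and the deliberately raised ValueError are invented for the demo; the point is only that the file is closed once the block is left, even when the block fails.

```python
import os
import tempfile

# Throwaway file, created only for this demo.
fd, demo_path = tempfile.mkstemp(suffix='.html')
os.close(fd)

try:
    with open(demo_path, 'r') as orig_file:
        # Simulate something going wrong in the middle of the block.
        raise ValueError('something went wrong mid-block')
except ValueError:
    pass

# The with statement closed the file on the way out, despite the exception.
print(orig_file.closed)

os.remove(demo_path)
```

With a bare open() and a later close(), the same exception would skip the close() call unless you wrapped it in try/finally yourself.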

Looks like your issue is that you're opening the file with the "write" flag instead of the "read" flag, which is why the traceback ends in "not readable". I don't actually know what BeautifulSoup does, but a quick Google makes it look like an HTML parser, so it needs to read the file. Open the orig_file as 'r' instead of 'w'.

orig_file = open(input_file,'r') #your way
#or the better way ;)
with open(input_file,'r') as orig_file:
    contents = orig_file.read() #do stuff to it in the block

That's better anyway, since opening a file as 'w' blanks the file :)
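
Both effects of 'w' can be reproduced without BeautifulSoup. The sketch below uses a temporary file with made-up contents (not your Administration+Guide.html) and only the standard library: it shows the same "not readable" error from the traceback in the question, shows that 'w' empties the file, and then reads it normally with 'r'.

```python
import io
import os
import tempfile

SAMPLE = '<html><title>Demo</title></html>'  # made-up contents for the demo

fd, demo_path = tempfile.mkstemp(suffix='.html')
os.close(fd)
with open(demo_path, 'w') as f:
    f.write(SAMPLE)

# Opened for writing: read() is refused, just like in the traceback.
error_message = None
with open(demo_path, 'w') as write_only:
    try:
        write_only.read()
    except io.UnsupportedOperation as exc:
        error_message = str(exc)

# Opening with 'w' also truncated the file to zero bytes.
size_after_w = os.path.getsize(demo_path)

# Put the contents back, then open for reading: read() now works.
with open(demo_path, 'w') as f:
    f.write(SAMPLE)
with open(demo_path, 'r') as readable:
    markup = readable.read()

print(error_message, size_after_w, markup)
os.remove(demo_path)
```

If the real file was already opened once with 'w', it is now empty on disk, so a parser reading it afterwards will find nothing until the file is restored.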

Adam Smith
  • First of all, Thanks adsmith! – veblen Dec 11 '13 at 22:53
  • I tried your code and all seemed to work until the next section of code where I use Beautiful Soup and it breaks. Here is the returned code: <_io.TextIOWrapper name='C:\\Aptana\\Beautiful\\Administration+Guide.html' mode='w' encoding='cp1252'> Traceback (most recent call last): File "C:\Aptana\beautiful\B-1.py", line 47, in soup = BeautifulSoup(orig_file) File "C:\Python33\lib\site-packages\bs4\__init__.py", line 161, in __init__ markup = markup.read() io.UnsupportedOperation: not readable Any Ideas? – veblen Dec 11 '13 at 22:54
  • Show me the code and we'll find out why :). Sounds like you're probably trying to use file_list as both a list of the file names and a list of the file paths. Please edit your question with the code it's now failing on. – Adam Smith Dec 11 '13 at 22:56
  • You're opening the file to be written, not read. Try `open(input_file,'r')` if you need to read it instead of write to it. – Adam Smith Dec 11 '13 at 22:58
  • I tried the 'r' which caused index issues with the [0], along with returning nothing from soup. – veblen Dec 11 '13 at 23:12
  • Since I don't know BeautifulSoup at all, it's hard for me to help you with that. It looks like you're trying to open the file for reading, then using some HTML scraper (BeautifulSoup) to pull contents from it. You DEFINITELY must use `open(filename,'r')` not `open(filename,'w')` to do that. I'm also betting that you should `orig_file.close()` after you `soup = BeautifulSoup(orig_file)`. Past that I'll need to see tracebacks. – Adam Smith Dec 11 '13 at 23:17

I believe a similar question can be found here: How to read file attributes in directory?

The answer, possibly, has the information you're seeking (using os.stat or os.path to provide the actual path to the file.)

Omnivore
  • Thanks Omnivore! I did not see that and I did some searching. I will try to do better going forward. – veblen Dec 11 '13 at 22:57