I'm using Beautiful Soup 4 to extract text from HTML files. With get_text() I can easily extract just the text, but when I try to write that text to a plain text file, I get the message "416". Here's the code I'm using:

from bs4 import BeautifulSoup
markup = open("example1.html")
soup = BeautifulSoup(markup)
f = open("example.txt", "w")
f.write(soup.get_text())

And the output to the console is 416 but nothing gets written to the text file. Where have I gone wrong?

  • you need to close the file – mechanical_meat Apr 26 '13 at 16:51
  • alternatively you can use, in 2.5+, the `with` statement to have that handled for you – mechanical_meat Apr 26 '13 at 16:52
  • Have you tried inspecting `soup` and `soup.get_text()`? – Colonel Panic Apr 26 '13 at 17:04
  • right, I wasn't closing the file - rookie mistake –  Apr 26 '13 at 17:05
  • 416 can be the value returned by `f.write()` (the number of characters written). Writes are buffered by default: flush the application buffers (`f.flush()`) or close the file (`f.close()`, or a `with` statement that does it for you) to see the content in the file from outside the Python process. Note: this doesn't guarantee that the data is physically saved to disk; depending on your OS, filesystem, and drive it may take a while (usually this doesn't matter unless there is a power failure). `os.fsync()` can flush the OS buffers ([usage example](http://stackoverflow.com/a/12012813/4279)). – jfs Apr 26 '13 at 17:49
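
A minimal sketch of the buffering behaviour described in the last comment, using only the standard library. The temporary directory and the string standing in for `soup.get_text()` are placeholders, not part of the original question. Whether unflushed data is visible to a second reader depends on the buffer size, so the sketch only reads the file back after `flush()`.

```python
import os
import tempfile

# Placeholder text standing in for soup.get_text().
text = "Hello, world\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")

    f = open(path, "w")
    written = f.write(text)   # returns the number of characters written
    f.flush()                 # push Python's own buffer to the OS
    os.fsync(f.fileno())      # ask the OS to commit its buffers to disk

    # A second handle can now see the data even though f is still open.
    with open(path) as reader:
        seen = reader.read()

    f.close()
```

In an interactive session the value returned by `f.write()` is what gets displayed, which would explain a bare number appearing on the console.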

1 Answer

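Try passing the text to BeautifulSoup with markup.read() (it also accepts an open file handle), and make sure you close the output file, otherwise the buffered text may never reach the disk: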

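from bs4 import BeautifulSoup

markup = open("example1.html")
# Naming the parser explicitly avoids bs4's "no parser specified" warning.
soup = BeautifulSoup(markup.read(), "html.parser")
markup.close()
f = open("example.txt", "w")
f.write(soup.get_text())
f.close()  # closing flushes the buffered text to the file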

And in a more Pythonic style, using `with` so the files are closed automatically:

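from bs4 import BeautifulSoup

# The input file is closed as soon as the block ends.
with open("example1.html") as markup:
    soup = BeautifulSoup(markup.read(), "html.parser")

# Leaving this block closes the output file and flushes the text to it.
with open("example.txt", "w") as f:
    f.write(soup.get_text())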

as @bernie suggested
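
Another option, not from the original answer, is `pathlib.Path.write_text`, which opens, writes, and closes the file in a single call, so there is no handle left to forget. This is only a sketch: the temporary directory is a placeholder, and the `text` string stands in for `soup.get_text()`.

```python
import tempfile
from pathlib import Path

# Placeholder text standing in for soup.get_text().
text = "Example page\nSome body text\n"

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "example.txt"

    # write_text opens the file, writes the string, and closes it again.
    count = out_path.write_text(text, encoding="utf-8")

    # The data is readable immediately because the file is already closed.
    saved = out_path.read_text(encoding="utf-8")
```

Like `f.write()`, `write_text` returns the number of characters written.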

danodonovan