import urllib2
import html2text

print 'Please input URL with the text you want to analyze: '
url = raw_input()
page = urllib2.urlopen(url)
html_content = page.read()
print html_content
rendered_content = html2text.html2text(html_content)
f = open('file_text.txt', 'w')
f.write(rendered_content)
f.close()

I was trying to save the text to a file on my computer, but it fails. I know the problem has something to do with ASCII encoding.

  • On which line do you get the exception? Also, which python version do you use? – omri_saadon Jun 07 '17 at 16:09
  • Don't forget to share what you are inputting in the `raw_input` call. – Bonifacio2 Jun 07 '17 at 16:17
  • Possible duplicate of [UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)](https://stackoverflow.com/questions/9942594/unicodeencodeerror-ascii-codec-cant-encode-character-u-xa0-in-position-20) – Mani Jun 07 '17 at 16:38
  • @omri_saadon I'm using version 2.7.12 – Valerian V. Jun 07 '17 at 17:10
  • @Bonifacio2 I was just using a random URL, like https://www.w3schools.com/html/tryit.asp?filename=tryhtml_basic_document – Valerian V. Jun 07 '17 at 17:11
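The linked duplicate is about the ASCII codec failing on u'\xa0' (a no-break space, which is what &nbsp; becomes). Below is a minimal sketch of that failure mode, assuming the question's error is the same UnicodeEncodeError (the traceback was not posted). try_ascii is a hypothetical helper for illustration only; the code runs on both Python 2.7 and Python 3.

```python
def try_ascii(text):
    """Hypothetical helper: return None if text encodes as ASCII,
    otherwise return the UnicodeEncodeError that was raised."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError as exc:
        return exc
    return None


# U+00E9 (e with acute accent) and U+00A0 (no-break space) are outside ASCII.
sample = u'caf\xe9\xa0menu'
error = try_ascii(sample)  # a UnicodeEncodeError; error.start is 3

# Encoding explicitly as UTF-8 succeeds for any unicode text.
utf8_bytes = sample.encode('utf-8')
```

In Python 2 the same implicit ASCII encode happens when a unicode string is passed to write() on a file opened with the built-in open(), which is why the failure tends to show up at f.write().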

1 Answer


Change from

html_content = page.read()

to

html_content = page.read().decode('utf-8', errors='replace')

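See the Python documentation for str.decode. The decoded text (and html2text's output) is then a unicode object, so the file also needs an explicit encoding when it is written, for example f = io.open('file_text.txt', 'w', encoding='utf-8'). Otherwise Python 2's built-in open() will try to encode it with the ASCII codec on f.write() and fail on the same characters.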
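A slightly fuller sketch of the same fix, covering both the read side and the write side. It assumes the page is UTF-8 encoded (a real page may declare a different charset in its Content-Type header), and decode_page and save_text are hypothetical helper names, not part of urllib2 or html2text. It runs on Python 2.7 and Python 3.

```python
import io


# Hypothetical helpers for illustration; not part of urllib2 or html2text.
def decode_page(raw_bytes, encoding='utf-8'):
    """Turn the raw bytes returned by page.read() into unicode text.

    Assumes UTF-8 unless told otherwise; undecodable bytes become
    U+FFFD instead of raising UnicodeDecodeError.
    """
    return raw_bytes.decode(encoding, 'replace')


def save_text(text, path):
    """Write unicode text to path, encoding it explicitly as UTF-8.

    io.open with an encoding avoids the implicit ASCII encode that
    Python 2's built-in open() applies to unicode strings.
    """
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(text)
```

In the question's script, page.read() would go through decode_page, the result would be passed to html2text.html2text, and its output would be handed to save_text instead of the open()/write()/close() lines.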

stovfl