
I'm encountering a strange encoding error in Python 3.5. I'm reading a string containing HTML data and handling it like this:

def parseHtml(self,url):
        r  = requests.get(self.makeUrl())
        data = r.text.encode('utf-8').decode('ascii', 'ignore')
        self.soup = BeautifulSoup(data,'lxml')

The error happens when I'm trying to print the following:

def extractTable(self):
        table = self.soup.findAll("table", { "class" : "messageTable" })
        print(table)

I have checked my locale and tried various combinations of encode / decode, as suggested in previous similar posts on SO. The strangest thing (for me) is that the script works flawlessly on a different machine and on my laptop. But on my Windows machine (using Cygwin to connect to a remote server) and on my Ubuntu install it simply won't run and gives me:

UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 1273: ordinal not in range(128)
chriskvik

1 Answer


Okay, so I moved the file from the remote server to my local machine and it executed perfectly. I then checked my sys.stdout.encoding:

>>> import sys
>>> sys.stdout.encoding
'ANSI_X3.4-1968'

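'ANSI_X3.4-1968' is another name for plain ASCII, so print() was trying to encode the non-breaking space ('\xa0') as ASCII and failing. I ended up exporting: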

export PYTHONIOENCODING=utf-8

And voilà!
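To see why the stdout encoding matters, here is a minimal sketch that reproduces the failure without touching the real terminal. It writes a string containing '\xa0' through an in-memory text stream, once with ASCII and once with UTF-8. The helper name write_with_encoding and the sample string are made up for illustration; they are not from the original script.

```python
import codecs
import io


def write_with_encoding(text, encoding):
    """Write text through a text stream using the given encoding; return the raw bytes."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding)
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


# 'ANSI_X3.4-1968' is registered in Python as an alias of the ASCII codec.
codec_name = codecs.lookup("ANSI_X3.4-1968").name

# Hypothetical sample: a non-breaking space, as often found in parsed HTML tables.
sample = "cell\xa0value"

# An ASCII-encoded stream behaves like the misconfigured stdout and raises.
try:
    write_with_encoding(sample, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# A UTF-8 stream, which is what PYTHONIOENCODING=utf-8 asks for, accepts it.
utf8_bytes = write_with_encoding(sample, "utf-8")

print(codec_name, ascii_failed, utf8_bytes)
```

PYTHONIOENCODING has to be set before the interpreter starts, which is why exporting it in the shell works here.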

chriskvik