The following code works fine when run from a Jupyter (IPython) notebook:
from bs4 import BeautifulSoup
xml_file_path = "<Path to XML file>"
s = BeautifulSoup(open(xml_file_path), "xml")
But when run from Eclipse/PyDev (which uses the same Python interpreter), it fails while creating the soup:
Traceback (most recent call last):
File "~/parser/scratch.py", line 3, in <module>
s = BeautifulSoup(open(xml_file), "xml")
File "/anaconda/lib/python3.5/site-packages/bs4/__init__.py", line 175, in __init__
markup = markup.read()
File "/anaconda/lib/python3.5/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1812: ordinal not in range(128)
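The last frame shows the file being decoded with the ASCII codec inside `markup.read()`. In Python 3, `open()` without an `encoding=` argument uses `locale.getpreferredencoding(False)`, which depends on the locale environment variables the process was launched with. The diagnostic sketch below is an illustration, not part of the original setup: running it once from the notebook and once from Eclipse/PyDev would show whether the two environments pick different default encodings.

```python
import locale
import os
import sys

# The encoding open() uses when no encoding= argument is given (Python 3).
# The traceback above suggests this resolves to an ASCII codec under Eclipse.
print("preferred encoding:", locale.getpreferredencoding(False))

# Locale environment variables that influence the value above.
for name in ("LC_ALL", "LC_CTYPE", "LANG"):
    print(name, "=", os.environ.get(name))

# Encoding used for printing to the console (may differ between environments).
print("stdout encoding:", sys.stdout.encoding)
```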
- Python version: 3.5.2 (Anaconda 4.1.1)
- BeautifulSoup: version 4
- IPython Notebook version: 4.2.1
- Eclipse version: Mars.2 Release (4.5.2)
- PyDev version: 5.1.2.20160623256
- Mac OS X: El Capitan 10.11.6
UPDATE:
The character in the file that causes the issue in Eclipse is �, but it causes no issues in the IPython notebook! If I remove this character from the XML file, the code works fine in Eclipse as well. Is there a setting in Eclipse I need to change so that the code won't fail on this character (and possibly others like it)?
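One way to make the script independent of whichever locale Eclipse or Jupyter launches it with is to pass the encoding explicitly rather than relying on the default. The sketch below assumes the XML file is UTF-8 encoded; the sample content and file name are hypothetical. It writes a small file containing U+FFFD (whose UTF-8 bytes `ef bf bd` start with the 0xef from the traceback), reads it back with `encoding="utf-8"`, and shows that forcing ASCII reproduces the error. Applied to the original script, the same idea would be `BeautifulSoup(open(xml_file_path, encoding="utf-8"), "xml")`, or opening the file in binary mode (`"rb"`) so that BeautifulSoup receives bytes and detects the encoding itself.

```python
import os
import tempfile

# Hypothetical sample XML containing U+FFFD; in UTF-8 it begins with byte 0xef.
sample = '<?xml version="1.0" encoding="UTF-8"?>\n<root>\ufffd</root>\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xml")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    # Explicit encoding: works regardless of locale.getpreferredencoding().
    with open(path, encoding="utf-8") as f:
        text = f.read()

    # Forcing ASCII mimics the failing Eclipse run.
    ascii_error = None
    try:
        with open(path, encoding="ascii") as f:
            f.read()
    except UnicodeDecodeError as exc:
        ascii_error = exc

# ascii() keeps the output printable even on an ASCII-only console.
print(ascii(text))
print(ascii_error)
```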