I have the following code, but Python 3 does not seem to recognize the vertical pipe '|' as a Unicode character:
import pandas as pd

m_cols = ['movie_id', 'title', 'release_date',
          'video_release_date', 'imdb_url']
movies = pd.read_csv(
    'http://files.grouplens.org/datasets/movielens/ml-100k/u.item',
    sep='|', names=m_cols, usecols=range(5))
movies.head()
I get the following error:
UnicodeDecodeError                        Traceback (most recent call last)
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens (pandas\_libs\parsers.c:14858)()
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype (pandas\_libs\parsers.c:17119)()
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._string_convert (pandas\_libs\parsers.c:17347)()
pandas\_libs\parsers.pyx in pandas._libs.parsers._string_box_utf8 (pandas\_libs\parsers.c:23041)()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 3: invalid continuation byte
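The message points at the file's bytes rather than at the separator: '|' is plain ASCII (byte 0x7c), so it is the same byte in UTF-8 and in Latin-1. The failing byte 0xe9 is 'é' in Latin-1, but in UTF-8 it announces a three-byte sequence, so when the following byte is not a continuation byte the decoder stops. A minimal standard-library reproduction, using a made-up byte string (hypothetical, not taken from u.item):

```python
# Hypothetical sample bytes: "Mis", then 0xe9 ('é' in Latin-1), then "rables|1995".
raw = b"Mis\xe9rables|1995"

# The pipe separator is a single ASCII byte, identical in UTF-8 and Latin-1.
print("|".encode("utf-8"), "|".encode("latin-1"))

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xe9 starts a 3-byte UTF-8 sequence, but the next byte 'r' (0x72) is not
    # a continuation byte (0x80-0xBF), so decoding fails at index 3.
    print(exc)

# Decoding the same bytes as Latin-1 succeeds: every byte maps to one character.
print(raw.decode("latin-1"))
```

Note that the reported position (3) is relative to the string pandas was converting, so it lines up with this toy example only by construction.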

During handling of the above exception, another exception occurred:

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-15-72a8222212c1> in <module>()
      4 movies = pd.read_csv(
      5     'http://files.grouplens.org/datasets/movielens/ml-100k/u.item',
----> 6     sep='|', names=m_cols, usecols=range(5))
      7 
      8 movies.head()

What could be causing this, and how can I fix it?
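One likely fix, assuming u.item is Latin-1 (ISO-8859-1) encoded rather than UTF-8 (which the 0xe9 byte suggests), is to tell the reader which encoding to use; with pandas that means adding encoding='latin-1' to the pd.read_csv call alongside sep='|'. The sketch below shows the same idea using only the standard library (the csv module over an in-memory byte stream). The rows are made-up sample data, and read_pipe_file is a hypothetical helper, not a pandas API:

```python
import csv
import io
from typing import Dict, List

m_cols = ['movie_id', 'title', 'release_date', 'video_release_date', 'imdb_url']

# Hypothetical pipe-delimited rows encoded as Latin-1 (not real u.item data).
raw = (
    "1|Misérables (1995)|01-Jan-1995||http://example.com/1|0|1\n"
    "2|Café (1996)|01-Jan-1996||http://example.com/2|1|0\n"
).encode("latin-1")


def read_pipe_file(data: bytes, encoding: str) -> List[Dict[str, str]]:
    """Decode bytes with the given encoding and keep only the first five fields."""
    text = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    reader = csv.reader(text, delimiter="|")
    # Mirrors usecols=range(5): pair the first five fields with the column names.
    return [dict(zip(m_cols, row[:5])) for row in reader]


movies = read_pipe_file(raw, "latin-1")
print(movies[0]["title"])

try:
    read_pipe_file(raw, "utf-8")
except UnicodeDecodeError as exc:
    # Same failure mode as pandas: the bytes are not valid UTF-8.
    print("utf-8 fails:", exc.reason)
```

The delimiter is handled identically in both runs; only the encoding argument decides whether decoding succeeds.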