python 3 not recognizing vertical bar character

Question

I have the following code, but Python 3 does not seem to recognize the vertical pipe as a Unicode character.

    import pandas as pd

    m_cols = ['movie_id', 'title', 'release_date',
              'video_release_date', 'imdb_url']

    movies = pd.read_csv(
        'http://files.grouplens.org/datasets/movielens/ml-100k/u.item',
        sep='|', names=m_cols, usecols=range(5))

    movies.head()

and I get the following error:

    UnicodeDecodeError                        Traceback (most recent call last)
    pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens (pandas\_libs\parsers.c:14858)()

    pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype (pandas\_libs\parsers.c:17119)()

    pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._string_convert (pandas\_libs\parsers.c:17347)()

    pandas\_libs\parsers.pyx in pandas._libs.parsers._string_box_utf8 (pandas\_libs\parsers.c:23041)()

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 3: invalid continuation byte

    During handling of the above exception, another exception occurred:

    UnicodeDecodeError                        Traceback (most recent call last)
    <ipython-input-15-72a8222212c1> in <module>()
          4 movies = pd.read_csv(
          5     'http://files.grouplens.org/datasets/movielens/ml-100k/u.item',
    ----> 6     sep='|', names=m_cols, usecols=range(5))
          7
          8 movies.head()

What is causing this, and how can I fix it?

Possibly related https://stackoverflow.com/questions/28947607/ascii-codec-cant-decode-byte-0xe9 — svgrafov, Nov 08 '17 at 13:07
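The error message points at a byte (0xe9), not at the `|` separator. Here is a minimal standard-library sketch of what the UTF-8 codec is complaining about. The sample record below is hypothetical, made up for illustration and not copied from `u.item`.

```python
# Hypothetical sample: one pipe-separated record whose title contains "é",
# encoded as Latin-1 (where "é" is the single byte 0xe9).
raw = b"1|Caf\xe9 (1995)|01-Jan-1995"

# The "|" separator is plain ASCII, so its byte is the same in UTF-8 and Latin-1.
assert "|".encode("utf-8") == "|".encode("latin-1") == b"|"

utf8_error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # In UTF-8, 0xe9 announces a 3-byte sequence, so the next byte must be a
    # continuation byte (0x80-0xbf); here it is a space (0x20), so decoding fails.
    utf8_error = exc
    print(exc)

# Latin-1 maps every byte to exactly one character, so this decode succeeds.
text = raw.decode("latin-1")
print(text)
```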

Mohamed Ali JAMAOUI · Accepted Answer · 2017-11-08T16:42:55.737

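The `|` separator is not the problem. pandas decodes the file as UTF-8 by default, but `u.item` contains byte 0xe9, which is `é` in Latin-1 (ISO-8859-1) and is not valid UTF-8 at that position. In Python 3, pass `encoding="latin-1"`: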

    In [9]: movies = pd.read_csv(
                'http://files.grouplens.org/datasets/movielens/ml-100k/u.item',
                sep='|', names=m_cols, usecols=range(5), header=None, encoding="latin-1")

    In [10]: movies.head()
    Out[10]:
       movie_id              title release_date  video_release_date  \
    0         1   Toy Story (1995)  01-Jan-1995                 NaN
    1         2   GoldenEye (1995)  01-Jan-1995                 NaN
    2         3  Four Rooms (1995)  01-Jan-1995                 NaN
    3         4  Get Shorty (1995)  01-Jan-1995                 NaN
    4         5     Copycat (1995)  01-Jan-1995                 NaN

                                                imdb_url
    0  http://us.imdb.com/M/title-exact?Toy%20Story%2...
    1  http://us.imdb.com/M/title-exact?GoldenEye%20(...
    2  http://us.imdb.com/M/title-exact?Four%20Rooms%...
    3  http://us.imdb.com/M/title-exact?Get%20Shorty%...
    4  http://us.imdb.com/M/title-exact?Copycat%20(1995)
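Here is a self-contained sketch of the same fix that runs without network access, assuming pandas is installed. `sample` and `cols` are hypothetical stand-ins for the downloaded file and `m_cols`. The second record is invented to include an `é`.

```python
import io

import pandas as pd

# Hypothetical stand-in for u.item: two pipe-separated records encoded as
# Latin-1; the second title contains "é" (byte 0xe9). Not real MovieLens data.
sample = "1|Toy Story (1995)|01-Jan-1995\n2|Caf\u00e9 (1995)|01-Jan-1995\n".encode("latin-1")
cols = ["movie_id", "title", "release_date"]

# With the default UTF-8 decoding, the 0xe9 byte raises UnicodeDecodeError.
try:
    pd.read_csv(io.BytesIO(sample), sep="|", names=cols)
    utf8_failed = False
except UnicodeDecodeError as exc:
    utf8_failed = True
    print("utf-8 failed:", exc)

# Declaring the file's real encoding lets pandas decode every byte.
movies = pd.read_csv(io.BytesIO(sample), sep="|", names=cols, encoding="latin-1")
print(movies)
```

Latin-1 can decode any byte sequence without raising an error. If a file were actually in a different single-byte encoding (cp1252, for example, which differs from Latin-1 in the 0x80 to 0x9f range), reading it as Latin-1 would silently produce wrong characters instead of failing. It is worth spot-checking a few non-ASCII titles.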
