UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128) with python

Question

I have a problem when testing data mining from Twitter, where I search for tweets by keyword.

The code below raises UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

retweet = "-filter:retweets"
query = "#Thailand" + retweet 

df = pd.DataFrame(columns = ["create_at","user","location","text", "retweet_count", "favourite_count","hashtag","follower","source"])
for tweet in tweepy.Cursor(api.search, q = query,result_type="recent", tweet_mode='extended').items(100):
     
    entity_hashtag = tweet.entities.get('hashtags')
    hashtag = ""
    for i in range(0, len(entity_hashtag)):
        hashtag = hashtag + "/" + entity_hashtag[i]["text"]
    re_count = tweet.retweet_count
    create_at = tweet.created_at
    user = tweet.user.screen_name
    source = tweet.source
    location = tweet.user.location
    follower = tweet.user.followers_count

    try:
        text = tweet.retweeted_status.full_text
        fav_count = tweet.retweeted_status.favorite_count 

    except:     
        text = tweet.full_text
        fav_count = tweet.favorite_count  
    new_column = pd.Series([create_at,user,location,text, re_count, fav_count,hashtag,follower,source], index = df.columns)
    df = df.append(new_column, ignore_index = True)

df.to_csv(date_time+".csv")

Why does this problem occur?

Always put the full error message (starting at the word "Traceback") in the question (not in a comment) as text (not a screenshot). It contains other useful information. — furas, Apr 14 '20 at 03:52
Which line causes the problem? Add it to the question (not in a comment). — furas, Apr 14 '20 at 03:53
Usually the problem is that the text has some native characters but the system tries to convert it to `ascii` instead of `utf-8`, `latin1` or `cp1250`, and you have to add this option manually (e.g. `encode="utf-8"`), if possible, to the function that causes the problem. — furas, Apr 14 '20 at 03:56
If you think the problem is `to_csv()`, then find the documentation for `to_csv()` and check whether it has an option for setting `utf-8`, `latin1` or `cp1250`. — furas, Apr 14 '20 at 03:58
File "wu.py", line 46, in df.to_csv(date_time+".csv") File "/usr/lib64/python2.7/site-packages/pandas/core/generic.py", line 3020, in to_csv formatter.save() File "/usr/lib64/python2.7/site-packages/pandas/io/formats/csvs.py", line 172, in save self._save() File "/usr/lib64/python2.7/site-packages/pandas/io/formats/csvs.py", line 288, in _save — Arifeen Kundee, Apr 14 '20 at 04:00
Did you read the previous comments? Always put the error message in the QUESTION, not in a comment. It will be more readable and more people will see it. — furas, Apr 14 '20 at 04:01
The error shows that the problem is in `to_csv()`, so now find the documentation for `to_csv()` to see all available options. It should have an option named `encode` or `encoding` or something similar. — furas, Apr 14 '20 at 04:03
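
To illustrate what the comments above describe, here is a minimal sketch, assuming Python 3 and a hypothetical three-character Thai string standing in for the tweet text (the variable names are made up for this example). It shows that encoding such text as `ascii` fails with the same error class as in the question, while naming `utf-8` explicitly succeeds.

```python
# Hypothetical sample text: three Thai characters written as escapes,
# none of which fall inside the ASCII range (0-127).
text = u"\u0e44\u0e17\u0e22"

error = None
try:
    # ASCII cannot represent these characters, so this raises.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc

# Naming an encoding that covers the characters works.
encoded = text.encode("utf-8")
```

If `to_csv()` is indeed where the conversion happens, pandas documents an `encoding` parameter for it, so the call in the question would presumably become `df.to_csv(date_time + ".csv", encoding="utf-8")`.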

score 2 · Accepted Answer · answered Apr 14 '20 at 04:04

Try setting the system default encoding to utf-8 at the start of your script. The following should set the default encoding to utf-8 (this works on Python 2 only):

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

answered by zealous

It works. Thank you. – Arifeen Kundee Apr 14 '20 at 04:08
@ArifeenKundee Please accept my answer by clicking on the tick mark. Thanks. – zealous Apr 14 '20 at 04:09

score 0 · Answer 2 · answered Apr 14 '20 at 04:10

You don't mention which version of Python you are using but I would look in Python's documentation on the subject here: https://www.python.org/dev/peps/pep-0263/ (for Python 2)

From there:

To define a source code encoding, a magic comment must be placed into the source files either as first or second line in the file, such as:

# coding=<encoding name>

or (using formats recognized by popular editors):

#!/usr/bin/python
# -*- coding: <encoding name> -*-

or:

#!/usr/bin/python
# vim: set fileencoding=<encoding name> :

I have used this version in certain cases:

#!/usr/bin/python
# -*- coding: <encoding name> -*-

That said, some functions, especially str(), should not be used with unicode. Prefer unicode() instead. When using third-party libraries you will have to check their documentation, and possibly look at their source if their docs are limited.

Also, take a look at this answer https://stackoverflow.com/questions/3828723/why-should-we-not-use-sys-setdefaultencodingutf-8-in-a-py-script which discourages the use of setdefaultencoding. — devlpr, Apr 16 '20 at 18:19
You probably just need to export PYTHONIOENCODING="UTF-8" in your shell. — devlpr, Apr 16 '20 at 18:22
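
As an alternative to changing the process-wide default, one option is to name the encoding at the point where text is written out. This is only a sketch, assuming the standard library's `io.open` and a hypothetical CSV line and file name (neither comes from the question):

```python
import io
import os
import tempfile

# Hypothetical CSV line containing three Thai characters (written as escapes).
line = u"some_user,\u0e44\u0e17\u0e22\n"

# Hypothetical output file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "tweets.csv")

# Name the encoding where text becomes bytes instead of relying on a default.
with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(line)

# Read the file back with the same encoding to check the round trip.
with io.open(path, "r", encoding="utf-8") as handle:
    round_tripped = handle.read()
```

This keeps the choice of encoding local to the file being written, so other code in the same process is not affected.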
