1

I have a problem when I test data mining from Twitter by searching for tweets by keyword.

This code raises UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128):

retweet = "-filter:retweets"
query = "#Thailand" + retweet 

df = pd.DataFrame(columns = ["create_at","user","location","text", "retweet_count", "favourite_count","hashtag","follower","source"])
for tweet in tweepy.Cursor(api.search, q = query,result_type="recent", tweet_mode='extended').items(100):
     
    entity_hashtag = tweet.entities.get('hashtags')
    hashtag = ""
    for i in range(0, len(entity_hashtag)):
        hashtag = hashtag + "/" + entity_hashtag[i]["text"]
    re_count = tweet.retweet_count
    create_at = tweet.created_at
    user = tweet.user.screen_name
    source = tweet.source
    location = tweet.user.location
    follower = tweet.user.followers_count

    try:
        text = tweet.retweeted_status.full_text
        fav_count = tweet.retweeted_status.favorite_count 

    except:     
        text = tweet.full_text
        fav_count = tweet.favorite_count  
    new_column = pd.Series([create_at,user,location,text, re_count, fav_count,hashtag,follower,source], index = df.columns)
    df = df.append(new_column, ignore_index = True)

df.to_csv(date_time+".csv")

Why does this problem occur?

zealous
Arifeen Kundee
  • Always put the full error message (starting at the word "Traceback") in the question (not in a comment) as text (not as a screenshot). It contains other useful information. – furas Apr 14 '20 at 03:52
  • Which line causes the problem? Add it to the question (not in a comment). – furas Apr 14 '20 at 03:53
  • Usually the problem is that the text has some native characters but the system tries to convert it to `ascii` instead of `utf-8`, `latin1` or `cp1250`, and you have to add this option manually (e.g. `encoding="utf-8"`), if possible, to the function that causes the problem. – furas Apr 14 '20 at 03:56
  • I think the problem is this line: df.to_csv(date_time+".csv") – Arifeen Kundee Apr 14 '20 at 03:56
  • Better to show the full error message. – furas Apr 14 '20 at 03:57
  • If you think the problem is `to_csv()`, then find the documentation for `to_csv()` and check whether it has an option for setting `utf-8`, `latin1` or `cp1250`. – furas Apr 14 '20 at 03:58
  • File "wu.py", line 46, in df.to_csv(date_time+".csv") File "/usr/lib64/python2.7/site-packages/pandas/core/generic.py", line 3020, in to_csv formatter.save() File "/usr/lib64/python2.7/site-packages/pandas/io/formats/csvs.py", line 172, in save self._save() File "/usr/lib64/python2.7/site-packages/pandas/io/formats/csvs.py", line 288, in _save – Arifeen Kundee Apr 14 '20 at 04:00
  • Did you read the previous comments? Always put the error message in the QUESTION, not in a comment. It will be more readable and more people will see it. – furas Apr 14 '20 at 04:01
  • The error shows the problem is in `to_csv()`, so now find the documentation for `to_csv()` to see all available options. It should have an option named `encode` or `encoding` or something similar. – furas Apr 14 '20 at 04:03
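
The comments above point at an encoding option on `to_csv()`. A minimal sketch of that idea follows, assuming the pandas `encoding` parameter of `DataFrame.to_csv`; the sample row and the output path are hypothetical stand-ins for the tweets and the `date_time` file name in the question, and this has not been checked against the asker's exact pandas version.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

import pandas as pd

# Hypothetical sample row standing in for the tweets collected in the question.
df = pd.DataFrame(
    [[u"somchai", u"สวัสดี #Thailand"]],
    columns=["user", "text"],
)

# Hypothetical output path; the question builds its file name from date_time.
out_path = os.path.join(tempfile.mkdtemp(), "tweets.csv")

# Name the encoding explicitly so pandas does not fall back to ASCII
# when it writes non-ASCII text (the Python 2 behaviour seen in the traceback).
df.to_csv(out_path, encoding="utf-8", index=False)

# Read the file back as UTF-8 to confirm the text was written intact.
with io.open(out_path, encoding="utf-8") as fh:
    content = fh.read()
```

If the file is meant to be opened in a spreadsheet program, a different encoding may suit better; that choice depends on the consumer of the file and is not covered by the question.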

2 Answers

2

Try setting the system default encoding to utf-8 at the start of your script. The following should do that (it works on Python 2 only):

import sys
reload(sys)
sys.setdefaultencoding('utf-8')
zealous
0

You don't mention which version of Python you are using but I would look in Python's documentation on the subject here: https://www.python.org/dev/peps/pep-0263/ (for Python 2)

From there:

To define a source code encoding, a magic comment must be placed into the source files either as first or second line in the file, such as:

# coding=<encoding name>

or (using formats recognized by popular editors):

#!/usr/bin/python
# -*- coding: <encoding name> -*-

or:

#!/usr/bin/python
# vim: set fileencoding=<encoding name> :

I have used this version in certain cases:

#!/usr/bin/python
# -*- coding: <encoding name> -*-

That said, some functions, especially str(), should not be used with unicode text in Python 2, because str() implicitly encodes with the ASCII codec. Prefer unicode() instead. When using third-party libraries you will have to check their documentation, and possibly look at their source if their docs are limited.
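
To illustrate why an implicit ASCII conversion fails, here is a small sketch using only built-in string methods. The sample string is a hypothetical three-character Thai word, not data from the question. It shows that encoding to ASCII raises the same kind of error, while an explicit UTF-8 encoding succeeds.

```python
# -*- coding: utf-8 -*-
text = u"ไทย"  # hypothetical sample: three non-ASCII characters

reason = None
try:
    # The ASCII codec only covers code points 0-127, so this raises.
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    reason = str(exc)

# Encoding explicitly as UTF-8 works and can be reversed without loss.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

print(ascii_ok, len(utf8_bytes), round_trip == text)
```

The same principle applies to library calls that write text: wherever an encoding can be passed explicitly, doing so avoids depending on the interpreter's default.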

devlpr
  • Also, take a look at this answer https://stackoverflow.com/questions/3828723/why-should-we-not-use-sys-setdefaultencodingutf-8-in-a-py-script which discourages the use of setdefaultencoding. – devlpr Apr 16 '20 at 18:19
  • You probably just need to export PYTHONIOENCODING="UTF-8" in your shell. – devlpr Apr 16 '20 at 18:22