I'm trying to get Arabic tweets using the tweepy library in Python 3.6. With English it works perfectly, but with Arabic I ran into many problems. The problem with my latest code (below) is that tweets written in Arabic characters appear as "\u0635\u0648\u0651\u062a\u0648\u0627 ".
I tried several solutions from the internet, but none of them solved my problem. Most of them extract only the tweet's "text" field, so they can fix the encoding on that text directly. I want to keep the whole tweet as JSON.
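The `\uXXXX` sequences are ordinary JSON string escapes, so one way to keep the whole tweet while making the Arabic readable is to parse the full payload and re-serialize it with `ensure_ascii=False`. This is a minimal sketch, assuming `data` is the raw JSON string tweepy passes to `on_data`; `to_readable_json` is a hypothetical helper name, and the sample payload is shortened by hand using the escapes from the question.

```python
import json


def to_readable_json(raw: str) -> str:
    # json.loads turns the \uXXXX escapes back into real characters for
    # every field in the tweet, not just "text".
    tweet = json.loads(raw)
    # ensure_ascii=False tells json.dumps to write non-ASCII characters
    # as-is instead of escaping them again.
    return json.dumps(tweet, ensure_ascii=False)


# Shortened, hand-written payload (hypothetical) reusing the question's escapes.
raw = '{"id_str": "1085872428432195585", "text": "\\u0635\\u0648\\u0651\\u062a\\u0648\\u0627"}'
print(to_readable_json(raw))
# prints: {"id_str": "1085872428432195585", "text": "صوّتوا"}
```

Inside `on_data`, this would mean printing `to_readable_json(data)` instead of `data.encode("UTF-8")`; the result is still the complete tweet JSON, only without the escapes.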
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import json

access_token = '-'
access_token_secret = '-'
consumer_key = '-'
consumer_secret = '-'


class StdOutListener(StreamListener):
    def on_data(self, data):
        print(data.encode("UTF-8"))
        return True

    def on_error(self, status):
        print(status)


if __name__ == '__main__':
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    stream.filter(track=["عربي"])
I run it and redirect the output to a file:
$ python file.py > file2.txt
The result, both in the text file and in the terminal:
{"created_at":"Thu Jan 17 12:12:16 +0000 2019","id":1085872428432195585,"id_str":"1085872428432195585","text":"RT @MALHACHIMI: \u0642\u0627\u062f\u0629 \u062d\u0631\u0643\u0629 \u0627\u0644\u0646\u0647\u0636\u0629 \u0635\u0648\u0651\u062a\u0648\u0627 \u0636\u062f \u0627\u0639\u062a\....etc}
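Because the output is redirected to a file, writing the file explicitly may be more reliable than relying on shell redirection: when stdout is redirected, Python uses the platform's default encoding, which on some systems (Windows in particular) may be a legacy code page that cannot represent Arabic. A minimal sketch, assuming one JSON object per line is an acceptable file format; `append_tweet` and the file name are hypothetical.

```python
import json


def append_tweet(raw: str, path: str) -> None:
    # Decode the raw payload so \uXXXX escapes become real characters.
    tweet = json.loads(raw)
    # Open with an explicit UTF-8 encoding instead of the platform default.
    with open(path, "a", encoding="utf-8") as f:
        # One JSON object per line; ensure_ascii=False keeps Arabic readable.
        f.write(json.dumps(tweet, ensure_ascii=False) + "\n")
```

In the listener, `on_data` could call `append_tweet(data, "tweets.jsonl")` and return `True`, so the script no longer needs `> file2.txt` at all.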