Unable to Save Arabic Decoded Unicode to CSV File Using Python

Question

I am working with a Twitter streaming package for Python. I am currently searching for tweets that contain a keyword written in Unicode, and I then use Python to save the tweets into a CSV file that serves as a database. However, I want the tweets to appear as Arabic characters again when I save them in the CSV file.

The errors I am receiving are all similar to "error ondata the ASCII characters in position ___ are not within the range of 128."

Here is my code:

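from datetime import datetime
from tweepy.streaming import StreamListener  # assumption: the streaming package is tweepy


class listener(StreamListener):
    def on_data(self, data):
        try:
            #print data

            tweet = (str((data.split(',"text":"')[1].split('","source')[0]))).encode('utf-8')
            now = datetime.now()
            tweetsymbols = tweet.encode('utf-8')
            print tweetsymbols

            saveThis = str(now) + ':::' + tweetsymbols.decode('utf-8')
            saveFile = open('rawtwitterdata.csv','a')
            saveFile.write(saveThis)
            saveFile.write('\n')
            saveFile.close()
            return True
        except BaseException as e:
            # handler assumed (not in the posted snippet); it matches the
            # "error ondata ..." messages described above
            print 'error ondata', str(e)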
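One possible reason the file shows escape codes instead of Arabic: the raw stream message is JSON, and JSON may carry non-ASCII characters as \uXXXX escapes. Cutting the "text" field out with split() keeps those escapes as literal text, whereas a JSON parser turns them back into characters. A minimal Python 3 sketch (the question uses Python 2); the payload below is made up and only assumes the message is shaped like the one the question splits:

```python
import json

# Hypothetical raw message: the Arabic word is carried as \uXXXX escapes,
# so the string itself is pure ASCII (the backslashes are literal here).
data = ('{"created_at":"Wed Jan 20 16:30:00 +0000 2016",'
        '"text":"\\u0625\\u0639\\u0644\\u0627\\u0646","source":"web"}')

# Cutting the field out with split() keeps the escapes as plain text ...
split_text = data.split(',"text":"')[1].split('","source')[0]
print(split_text)        # \u0625\u0639\u0644\u0627\u0646
print(len(split_text))   # 30

# ... while a JSON parser turns each escape into one Arabic letter.
parsed_text = json.loads(data)["text"]
print(len(parsed_text))  # 5
print(parsed_text == "\u0625\u0639\u0644\u0627\u0646")  # True
```

The 30 characters of escape text collapse into the 5 letters of the actual word once the JSON is parsed.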

I used .encode('utf-8') because it was used in another question similar to mine, but it did not work. — Joseph P Nardone, Jan 20 '16 at 16:30
Specify the encoding when opening the file. For Python 2, use the codecs module; for Python 3, pass it directly to the open function. — Ali SAID OMAR, Jan 20 '16 at 16:39
Could you demonstrate this with code? I don't understand what you are suggesting. I want to be able to open the file and see the Arabic symbols instead of the Unicode. — Joseph P Nardone, Jan 20 '16 at 16:43
See http://stackoverflow.com/questions/17093260/arabic-unicode-and-files-in-python — Ali SAID OMAR, Jan 20 '16 at 16:52
This did not solve the problem; I had already consulted this post. — Joseph P Nardone, Jan 20 '16 at 17:00
I do not think you understand what I am attempting to do. I want my file to display Arabic when I open it in Excel. — Joseph P Nardone, Jan 20 '16 at 17:17

score 7 · Answer 1 · edited May 23 '17 at 10:27

Excel requires a Unicode BOM character written to the beginning of a UTF-8 file to view it properly. Without it, Excel assumes "ANSI" encoding, which is OS locale-dependent.
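To make the BOM concrete: in Python 3 (the thread itself uses Python 2), the character U+FEFF encodes to the three bytes EF BB BF, and the standard 'utf-8-sig' codec adds or strips them automatically. A small sketch; the sample word is arbitrary:

```python
import codecs

# U+FEFF (the byte-order mark) encoded as UTF-8 is this three-byte signature.
bom = "\ufeff".encode("utf-8")
print(bom)                                    # b'\xef\xbb\xbf'
print(bom == codecs.BOM_UTF8)                 # True

# The "utf-8-sig" codec prepends the signature when encoding ...
encoded = "عربي".encode("utf-8-sig")
print(encoded[:3] == codecs.BOM_UTF8)         # True
# ... and removes it again when decoding.
print(encoded.decode("utf-8-sig") == "عربي")  # True
```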

This writes a 3-row, 3-column CSV file with Arabic:

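#!python2
#coding:utf8
import io

# 'utf-8-sig' encodes as UTF-8 and writes the BOM (U+FEFF) first,
# which is what lets Excel detect the encoding.
with io.open('arabic.csv','w',encoding='utf-8-sig') as f:
    s = u'إعلان يونيو وبالرغم تم. المتحدة'
    s = u','.join([s,s,s]) + u'\n'   # one row of three identical columns
    f.write(s)
    f.write(s)
    f.write(s)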

For your specific example, just make sure to write a BOM character, u'\ufeff', as the first character of your file, encoded in UTF-8. In the example above, the 'utf-8-sig' codec ensures that a BOM is written.
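Because the question's listener appends to rawtwitterdata.csv on every tweet, the BOM should be written only once, when the file is first created. A minimal Python 3 sketch under that assumption; append_tweet is a hypothetical helper, not part of any library, and it keeps the question's ':::' separator:

```python
import os
from datetime import datetime


def append_tweet(path, tweet_text):
    """Append one timestamped line of tweet text (a str) to path as UTF-8.

    Hypothetical helper: writes U+FEFF only if the file is missing or empty,
    so the BOM ends up as the first character and is never repeated.
    """
    needs_bom = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", encoding="utf-8") as f:
        if needs_bom:
            f.write("\ufeff")
        f.write(str(datetime.now()) + ":::" + tweet_text + "\n")
```

Usage would be append_tweet("rawtwitterdata.csv", text) from inside on_data, where text is already a decoded str rather than bytes.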

Also consult this answer, which shows how to wrap the csv module to support Unicode, or get the third-party unicodecsv module.
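In Python 3, the standard csv module works with str directly, so neither a wrapper nor unicodecsv is needed there. A minimal sketch, assuming Python 3; write_tweets_csv is a hypothetical helper name and the column headers are illustrative:

```python
import csv


def write_tweets_csv(path, rows):
    """Write (timestamp, text) rows to an Excel-friendly UTF-8 CSV file."""
    # newline="" is what the csv documentation recommends for files handed
    # to csv.writer; "utf-8-sig" prepends the BOM so Excel detects UTF-8.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["time", "text"])
        writer.writerows(rows)
```

For example, write_tweets_csv("arabic.csv", [["2016-01-20 16:30:00", "إعلان, المتحدة"]]) writes one data row; since the text field contains a comma, csv.writer quotes it, which a hand-built ':::'-joined line would not do.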

Thanks a lot! This should be the best answer! – Yassine Akermi May 13 '19 at 22:55

score 0 · Accepted Answer · answered Jan 20 '16 at 19:23

Here is a snippet that writes Arabic text to a file:

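# coding=utf-8
import codecs
from datetime import datetime


class listener(object):

    def on_data(self, tweetsymbols):
        # Python 2: tweetsymbols is a UTF-8 encoded byte string (str)
        # tweet = (str((data.split(',"text":"')[1].split('","source')[0]))).encode('utf-8')
        now = datetime.now()
        # work with unicode objects internally: decode the bytes once
        saveThis = unicode(now) + u':::' + tweetsymbols.decode('utf-8')
        # codecs.open encodes unicode to UTF-8 on write; the with-block closes
        # the file even if a write fails (the old finally-block raised a
        # NameError when codecs.open itself failed)
        with codecs.open('rawtwitterdata.csv', 'a', encoding='utf8') as saveFile:
            saveFile.write(saveThis)
            saveFile.write(u'\n')
        # mirrors the question's on_data, which returns True
        return True


listener().on_data("إعلان يونيو وبالرغم تم. المتحدة")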

Everything you need to know about encoding: https://pythonhosted.org/kitchen/unicode-frustrations.html
