I'm getting this error message:

UnicodeEncodeError at /new_app/
'charmap' codec can't encode character '\x87' in position 127: character maps to <undefined>

I get data from a Teradata query using pandas and the Python teradata module. After creating the connection, I do this:

In [10]: df = pd.read_sql(sqlStr, self.session)

In [11]: df
Out[11]:
          Ref      Client    Order           Group           Start Date    Item Date   Value
0  2020-04-30  01234567890  SECURITIZA�øO  45678901234     1995-11-13    2014-08-23  21031.96
1  2020-04-30  01234567890  SECURITIZA�øO  45678901234     1995-11-13    2014-08-23  21031.96

There is a problem in the word "SECURITIZA�øO".

Then I tried the following:

In [12]: from django.core.files.base import ContentFile

In [13]: from new_app.models import QSet

In [14]: content = df.to_csv(index=False, encoding='utf-8')

In [15]: csvname = "test_name"

In [16]: csf = ContentFile(content, csvname)

In [17]: q = QSet.objects.create(owner = "somebody", csv_file = csf)

Then I got the error above.
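
One plausible reading of the traceback (an assumption, not confirmed in this thread): `to_csv` called without a path returns a `str`, so the text is only encoded later, when something writes it out in text mode with the platform default codec (cp1252 on this Windows machine). The sketch below reproduces that kind of failure with the standard library only and shows that writing pre-encoded bytes avoids it. `save_text`, `save_bytes`, and the sample CSV text are hypothetical stand-ins, not Django or pandas code.

```python
import os
import tempfile

# Stand-in for the CSV text; "\x87" is the character named in the error message.
csv_text = "Order\nSECURITIZA\x87O\n"


def save_text(path, text, encoding):
    # Text mode: Python encodes the str on write using `encoding`.
    with open(path, "w", encoding=encoding, newline="") as fh:
        fh.write(text)


def save_bytes(path, payload):
    # Binary mode: the bytes are written unchanged, no codec is involved.
    with open(path, "wb") as fh:
        fh.write(payload)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_name.csv")

    # Passing "cp1252" explicitly simulates the Windows default encoding.
    try:
        save_text(path, csv_text, "cp1252")
        text_mode_failed = False
    except UnicodeEncodeError:
        text_mode_failed = True

    # Encoding to UTF-8 first and writing bytes does not hit the codec limit.
    save_bytes(path, csv_text.encode("utf-8"))
    with open(path, "rb") as fh:
        round_tripped = fh.read().decode("utf-8")

print(text_mode_failed, round_tripped == csv_text)
```

If that is what happens here, passing `content.encode('utf-8')` to `ContentFile` instead of the `str` might sidestep the default codec; this is untested against the setup in the question.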

This is my model config:

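from django.db import models
from django.utils.timezone import now  # assumed source of `now`; the import is not shown here

class QSet(models.Model):
    date_query = models.DateField(default=now, blank=True, null=True)
    owner = models.CharField(max_length=25)
    csv_file = models.FileField(blank=True, upload_to='new_app')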

I don't know if it matters, but here are some of my project settings:

LANGUAGE_CODE = 'en-us'

TIME_ZONE = 'UTC'

USE_I18N = True

USE_L10N = True

USE_TZ = True

I am struggling with this error, and I think I am misunderstanding some of the encoding steps. I would appreciate any ideas.

O Pardal
  • Are you on Windows? This usually comes up when trying to output to the console in Windows. The filename in the full traceback should give you a clue as to the actual encoding generating the error. – Kevin Christopher Henry Aug 01 '20 at 09:50
  • Yes, I am on Windows. I didn't try to output to the console, at least not intentionally. I have bypassed the problem by ignoring the error after building the CSV: `content = content.encode('ascii', 'ignore').decode('utf-8')`. I followed this answer: https://stackoverflow.com/a/25402141/5838551, but I'm losing data, so it's not a good option. – O Pardal Aug 01 '20 at 12:21
  • `SECURITIZAÃ?øO` what does this look like if you view it in the database directly? – snakecharmerb Aug 01 '20 at 12:43
  • Yeah, don't do that. Since Windows uses weird default encodings, the first thing I would try is setting the console encoding to UTF-8. You can find answers on how to do that, but basically: `set PYTHONIOENCODING=UTF-8` (or `export PYTHONIOENCODING=UTF-8` in a Unix-style shell). Also, what version of Python and Django are you using? – Kevin Christopher Henry Aug 01 '20 at 13:01
  • @KevinChristopherHenry, the full traceback says that the encoding is 'cp1252'. Here is the full traceback: `\Anaconda3\lib\encodings\cp1252.py in encode(self, input, final) 17 class IncrementalEncoder(codecs.IncrementalEncoder): 18 def encode(self, input, final=False): -> 19 return codecs.charmap_encode(input,self.errors,encoding_table)[0] 20 21 class IncrementalDecoder(codecs.IncrementalDecoder): UnicodeEncodeError: 'charmap' codec can't encode character '\x87' in position 127: character maps to <undefined>` – O Pardal Aug 01 '20 at 17:26
  • @snakecharmerb, the original word is 'SECURITIZAÇÃO'. It is a Portuguese word. But in the database it is already damaged and is represented as 'SECURITIZAÃ?øO'. I have already accepted the idea of having a damaged word, but Django was not allowing the object creation. – O Pardal Aug 01 '20 at 17:34
  • Django version is 2.2.3 and Python version is 3.7.3 – O Pardal Aug 01 '20 at 17:37
  • Yes, that's a standard [Windows encoding](https://en.wikipedia.org/wiki/Windows-1252). Since it has a limited character set (unlike UTF-8), there are certain characters it can't encode, thus the error. It's still not clear whether it's triggered by writing to the file or something else. Did you try my suggestion to set the `PYTHONIOENCODING` environment variable? You can also try `set PYTHONUTF8=1` to force UTF-8 mode since you're using Python 3.7. – Kevin Christopher Henry Aug 02 '20 at 05:03
  • @KevinChristopherHenry, I have just tried your suggestions. Both of them avoid the UnicodeEncodeError, and Django now creates the model object properly. Unfortunately, the word is still damaged: "SECURITIZAÇøO". I don't know if it is a problem in the original data encoding. The fact is that I am not able to configure the database. If you have any ideas for dealing with this issue, I'll be very happy. Otherwise, your suggestions have already solved my main problem. Thanks – O Pardal Aug 03 '20 at 12:48
  • Are you viewing the file in an application that can understand UTF-8? If you're just outputting to the console you may be running into its limited character set. (That could also be why the word looks wrong in the database output above, which is why the other commenter asked how you were viewing it.) In any case, it's hard to give any more advice. The key to avoiding these problems is to make sure that at every point in the process you're using an encoding that can represent all of Unicode (e.g. UTF-8). You'll need to go through each step in your processing chain to figure it out. – Kevin Christopher Henry Aug 04 '20 at 08:54
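
The points made in the comments above can be checked with a short standard-library sketch: cp1252 has no mapping for the character `'\x87'` from the error message, UTF-8 can encode it, and the `encode('ascii', 'ignore')` workaround silently drops every non-ASCII character. The helper `can_encode` is a hypothetical name introduced for illustration, and the word used is the intended spelling quoted in the comments, not the damaged value stored in the database.

```python
bad_char = "\x87"        # character reported in the UnicodeEncodeError
word = "SECURITIZAÇÃO"   # intended Portuguese word, as quoted in the comments


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` without errors."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# cp1252 has a limited character set; U+0087 is not part of it.
cp1252_ok = can_encode(bad_char, "cp1252")

# UTF-8 can represent every Unicode code point, including U+0087.
utf8_ok = can_encode(bad_char, "utf-8")

# The correctly spelled word itself fits in cp1252.
word_cp1252_ok = can_encode(word, "cp1252")

# The 'ignore' workaround removes the non-ASCII characters entirely.
lossy = word.encode("ascii", "ignore").decode("utf-8")

print(cp1252_ok, utf8_ok, word_cp1252_ok, lossy)
```

This only shows why the error is raised and why the workaround loses data. It says nothing about where the word was damaged in the first place, which would still have to be traced through each step of the pipeline.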