
I am using SQL Server with SlashDB, and it has been working fine. Today an error appeared that suggests I have the wrong database charset defined:

ValueError | 'utf8' codec can't decode byte 0xa0 in position 183: invalid start byte

Some queries work and others don't. From my research, I see that SQL Server does not support UTF-8, which is what I have the charset set to. How can I tell what the database charset should be in the SlashDB database setup?

All I know is that the database collation is set to SQL_Latin1_General_CP1_CI_AS.

marc_s
David B.

1 Answer


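Explanation: It looks like a non-breaking space character was written into a plain-text column (type VARCHAR or CHAR). Your database's default encoding is Latin 1 (the SQL_Latin1_General_CP1_CI_AS collation stores such columns in code page 1252), but SlashDB is configured to decode those strings into Unicode using the UTF-8 codec. That fails because the UTF-8 encoding of this character is two bytes, C2 A0, whereas in Latin 1 it is the single byte A0. A lone A0 can only be a continuation byte in UTF-8, never the first byte of a character, hence "invalid start byte". This also explains why only some queries fail: rows that contain nothing but 7-bit ASCII decode identically under both codecs.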

In [38]: u'\xa0'.encode('utf-8')
Out[38]: '\xc2\xa0'

In [39]: u'\xa0'.encode('latin-1')
Out[39]: '\xa0'

In [40]: print '\xa0'.decode('latin-1')


In [41]: print '\xa0'.decode('utf-8')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-41-018a073d68b8> in <module>()
----> 1 print '\xa0'.decode('utf-8')

D:\Victor\Anaconda2\lib\encodings\utf_8.pyc in decode(input, errors)
     14
     15 def decode(input, errors='strict'):
---> 16     return codecs.utf_8_decode(input, errors, True)
     17
     18 class IncrementalEncoder(codecs.IncrementalEncoder):

UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 0: invalid start byte
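The session above is Python 2. For readers on Python 3, here is a minimal sketch of the same failure. The `raw` value is a made-up sample (it is not data from the question), built so that the offending byte sits at offset 183 as in the reported error.

```python
# Python 3 sketch of the same failure.
# `raw` is a hypothetical sample, not data taken from the question.
raw = b"x" * 183 + b"\xa0" + b"tail"  # single byte 0xA0 at offset 183

try:
    raw.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    # A lone 0xA0 is not a valid first byte of a UTF-8 sequence.
    error_message = str(exc)

print(error_message)

# Decoding with the codec that matches the stored bytes succeeds.
text = raw.decode("latin-1")
print(repr(text[183]))  # the non-breaking space, U+00A0
```

Note that Python 3 spells the codec name as 'utf-8' in the message, while the Python 2 traceback shows 'utf8'.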

Fix: Match SlashDB's configuration to the encoding of your database. To do that, edit your database connection under Configure -> Databases and change the character encoding to "latin-1". Then cycle the connection.
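If you are not sure which encoding the stored bytes are in, one rough check is to try a few candidate codecs against a sample of the raw bytes. The sketch below is only an illustration: `decodable_with` is a hypothetical helper, `sample` is invented data, and the candidate list is an assumption, not a list of values SlashDB accepts.

```python
def decodable_with(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the candidate codecs that decode `raw` without raising.

    Hypothetical helper for illustration only.
    """
    ok = []
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this codec cannot represent the byte sequence
        ok.append(name)
    return ok


# Invented sample: ASCII text with a single-byte non-breaking space.
sample = b"Price:\xa0100"
print(decodable_with(sample))
```

Treat the result as a hint rather than proof. The latin-1 codec maps every possible byte to a character, so it never fails, and passing it only tells you the bytes are not valid UTF-8. The column's collation remains the more reliable indicator.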

Recommendation: For columns where you can expect international characters or non-printable control codes outside regular 7-bit ASCII, consider using a Unicode data type such as NCHAR or NVARCHAR in your database table. Those values don't need to be decoded, because the database driver sends them to SlashDB as Unicode.

Victor Olex