
I'm using Django with the following config in settings.py:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': '-',
        'USER': '-',
        'PASSWORD': '-',
        'HOST': '-',
        'PORT': '-',
        'OPTIONS': {'charset': 'utf8mb4'}
    }
}

The database server is running on AWS RDS. I have two EC2 instances. One of them runs this exact code and fetches the same data without problems, while the second one fails with this error:

    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/home/ubuntu/.virtualenvs/python39/lib/python3.9/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/home/ubuntu/.virtualenvs/python39/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/home/ubuntu/.virtualenvs/python39/lib/python3.9/site-packages/django/db/backends/mysql/base.py", line 73, in execute
    return self.cursor.execute(query, args)
  File "/home/ubuntu/.virtualenvs/python39/lib/python3.9/site-packages/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/home/ubuntu/.virtualenvs/python39/lib/python3.9/site-packages/MySQLdb/cursors.py", line 321, in _query
    self._post_get_result()
  File "/home/ubuntu/.virtualenvs/python39/lib/python3.9/site-packages/MySQLdb/cursors.py", line 355, in _post_get_result
    self._rows = self._fetch_row(0)
  File "/home/ubuntu/.virtualenvs/python39/lib/python3.9/site-packages/MySQLdb/cursors.py", line 328, in _fetch_row
    return self._result.fetch_row(size, self._fetch_type)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 180: invalid start byte

The Django HTML error page additionally shows this:

Unicode error hint
The string that could not be encoded/decoded was: \n<p>�</p>\

The code that throws the error is:

    exp = MyModel.objects.all()
    for e in exp:  # <-- this line throws the error
        pass  # do something

Versions on both servers:

EC2-1st has:

Ubuntu 16.04.4

Django==1.11.2
mysqlclient==1.3.10
django-mysql==2.1.0

python3 --version
Python 3.5.2

mysql --version
mysql  Ver 14.14 Distrib 5.7.22, for Linux (x86_64) using  EditLine wrapper

while EC2-2nd is a replica of EC2-1st with updates applied:

Ubuntu 20.04.3

Django==3.2.6
mysqlclient==2.0.3
django-mysql==3.10.0

python3 --version
Python 3.9.5

mysql --version
mysql  Ver 14.14 Distrib 5.7.35, for Linux (x86_64) using  EditLine wrapper

My local machine also runs the code without errors. To debug the issue I imported the RDS database locally and used a config close to the production one, with these tool versions:

Mac OS 11.5.2
Django==3.2.6
mysqlclient==2.0.3
django-mysql==3.10.0

Python 3.9.6

mysql  Ver 8.0.25 for macos11.3 on x86_64 (Homebrew)

What should I try?

Atul Goyal
  • See "black diamond" in https://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored – Rick James Nov 22 '21 at 01:06

1 Answer


The MySQL documentation section "Connect-Time Error Handling" describes a problem that occurs when a MySQL 8.0 client library connects to a MySQL 5.7 server with the utf8mb4 charset. The 8.0 client requests the utf8mb4_0900_ai_ci collation, which the 5.7 server does not recognize, so the server silently falls back to the latin1 charset with the latin1_swedish_ci collation. From then on the server sends latin1 result sets while the client still assumes utf8mb4, and decoding those bytes eventually raises a UnicodeDecodeError. The workaround is to run SET NAMES utf8mb4 explicitly. I opened mysqlclient#504 to ask that the Python client do this on every connection.
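
The decode failure itself can be reproduced without a database. The sketch below assumes the offending column holds a non-breaking space (U+00A0) that the server sends as the single latin1 byte 0xa0. That is consistent with the error message in the question, but it is a guess about the data, and try_decode is a hypothetical helper written only for this illustration.

```python
# Assumed sample: a latin1-encoded fragment containing a non-breaking space.
# In latin1, U+00A0 is the single byte 0xa0.
latin1_payload = "\n<p>\u00a0</p>".encode("latin-1")


def try_decode(data: bytes, encoding: str) -> str:
    """Decode data, or return the codec's failure reason instead of raising."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason}"


# Decoding with the charset the server actually used works.
as_latin1 = try_decode(latin1_payload, "latin-1")

# Decoding with the charset the client believes it negotiated fails,
# because 0xa0 cannot start a UTF-8 sequence.
as_utf8 = try_decode(latin1_payload, "utf-8")

print(repr(as_latin1))
print(as_utf8)
```

The same bytes are valid or invalid depending only on which charset the reader assumes, which is why the identical query can succeed from one client and fail from another.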

To confirm that the charset is wrong after connecting, check the server's values of character_set_client (the charset that statements are interpreted in), character_set_connection (the charset that statements are converted to), and character_set_results (the charset that result sets are sent as). If they are latin1 even though the client connected as utf8mb4, this bug was probably triggered.

with con.cursor() as c:
    c.execute("show variables like 'character_set_%'")
    for row in c:
        print(row)

On an affected connection this prints:

(b'character_set_client', b'latin1')
(b'character_set_connection', b'latin1')
(b'character_set_database', b'latin1')
(b'character_set_filesystem', b'binary')
(b'character_set_results', b'latin1')
(b'character_set_server', b'latin1')
(b'character_set_system', b'utf8')
(b'character_sets_dir', b'/usr/share/mysql/charsets/')
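
If you want to automate that comparison, a small helper can flag the three connection-level variables. This is only a sketch: charset_mismatches and CONNECTION_VARS are names I made up, and the rows are copied from the output above rather than fetched from a live server.

```python
# The three variables that describe the charset of the current connection.
CONNECTION_VARS = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)


def _to_text(value):
    """The rows above contain bytes; accept str as well."""
    return value.decode("ascii") if isinstance(value, bytes) else value


def charset_mismatches(rows, expected="utf8mb4"):
    """Return {variable: actual value} for connection variables that differ from expected."""
    found = {_to_text(name): _to_text(value) for name, value in rows}
    return {
        name: found.get(name)
        for name in CONNECTION_VARS
        if found.get(name) != expected
    }


# Rows as printed by the SHOW VARIABLES query above.
rows = [
    (b"character_set_client", b"latin1"),
    (b"character_set_connection", b"latin1"),
    (b"character_set_database", b"latin1"),
    (b"character_set_filesystem", b"binary"),
    (b"character_set_results", b"latin1"),
    (b"character_set_server", b"latin1"),
    (b"character_set_system", b"utf8"),
    (b"character_sets_dir", b"/usr/share/mysql/charsets/"),
]

mismatches = charset_mismatches(rows)
print(mismatches)
```

An empty dict means the connection really is using the expected charset. Server-wide variables such as character_set_server are ignored on purpose, since they do not decide how result sets are encoded for this connection.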

I believe you can work around the issue by running the following after connecting:

# explicitly set connection charset to the same as MySQLdb.connect()
con.query("SET NAMES utf8mb4")
con.store_result()
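
In a Django project you normally do not hold the raw MySQLdb connection, so one way to apply the same idea is the init_command connection option, which mysqlclient runs once each time a connection is created. The settings below are a sketch based on the config in the question (the '-' placeholders are kept as they were); I have not verified it against the failing EC2 instance.

```python
# settings.py (sketch): same config as in the question, plus init_command.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': '-',
        'USER': '-',
        'PASSWORD': '-',
        'HOST': '-',
        'PORT': '-',
        'OPTIONS': {
            'charset': 'utf8mb4',
            # Assumption: forcing the charset again right after connecting
            # overrides the server's silent fallback to latin1.
            'init_command': 'SET NAMES utf8mb4',
        },
    }
}
```

After changing the settings, rerun the SHOW VARIABLES check on a fresh connection to see whether the three connection variables now report utf8mb4.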
yonran