
I want to retrieve some data from a MySQL database. All the tables in it have the utf8_general_ci collation.

By the way, this is a .cgi file, so it is executed by means of an Ajax call.

I'm doing this to make the connection:

#!/home/mike/python_venvs/test_venv369/bin/python
...

conn = mysql.connector.connect( host='', database='test_kernel',
                                user='root', password='root',
                                charset='utf8', use_unicode=True )
...
query = ("SELECT * from invoices limit 2")
cursor.execute( query )

for x in cursor:
    print( type( x  )) # is a tuple, i.e. the row
    for y in x:
        print( type( y ) ) # the problem field prints "str"
        if type( y ) == 'str':
            y = y.encode( 'utf-8')
        print( y )

On the encoding line above I get:

<class 'UnicodeEncodeError'> 'ascii' codec can't encode character '\xa3' in position 0: ordinal not in range(128)

With all the permutations I've tried I get the same thing. '\xa3', by the way, is the '£' character, non-ASCII.

I've tried many different approaches, found mainly here in SO: encode, decode, ... Nothing seems to work. I thought the str type was Python 2... but this is definitely a Python3 program, something which I actually checked with sys.version_info[ 0 ]!
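A side note on the loop above, offered as a minimal sketch rather than a diagnosis: `type( y ) == 'str'` compares a class with a string, so it is always False and the `encode` line is never reached. That suggests the exception is raised by `print( y )` itself. The value below is a hypothetical stand-in for the problem field.

```python
# Hypothetical stand-in for the text field returned by the query.
value = "£5.00"

# Comparing the class object with the string 'str' is always False ...
compares_to_name = (type(value) == "str")

# ... whereas isinstance() is the idiomatic way to test for text.
is_text = isinstance(value, str)

# Encoding a str to UTF-8 never fails for valid text; it yields bytes.
encoded = value.encode("utf-8")

print(compares_to_name, is_text, encoded)
```

If that reading is right, the thing to fix is the encoding of the output stream rather than the string.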

mike rodent
  • What's the output of `python -c "import locale;print(locale.getpreferredencoding())"`? – snakecharmerb May 16 '20 at 15:02
  • Thanks. I assume this is to be run in the virtual environment stipulated in the shebang at the start of this file, right? It gives "UTF-8". Adding shebang line in the question. – mike rodent May 16 '20 at 15:08
  • If this is .cgi, then it's being executed by your webserver? So maybe `su` to the webserver's user (or perhaps root, if root starts the webserver process) and execute the above command as that user. – snakecharmerb May 16 '20 at 15:13
  • Thanks again. Same output for SU. I assume the Apache server process's owner is the SU... but how do I check that? To start and stop I do `sudo systemctl start apache2`. – mike rodent May 16 '20 at 15:19
  • Maybe just print `locale.getpreferredencoding()` before you iterate over the query result – snakecharmerb May 16 '20 at 15:26
  • Blow me down with a feather: "pref encoding ANSI_X3.4-1968". What in flip's name is that all about??? – mike rodent May 16 '20 at 15:31
  • It's a fancy way of saying ASCII*. I guess you need to try the solutions [here](https://stackoverflow.com/questions/913869/how-to-change-the-default-encoding-to-utf-8-for-apache). *Technically it's a revision of the ASCII standard – snakecharmerb May 16 '20 at 15:52
  • I tried those suggestions, specifically uncommenting `#AddDefaultCharset UTF-8` in /etc/apache2/conf-available/charset.conf. Didn't work. Still getting that preferred encoding. – mike rodent May 16 '20 at 16:06
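To illustrate the point made in the comments that ANSI_X3.4-1968 is just another name for ASCII, here is a small sketch. It uses only the standard `codecs` module, and the `'£'` literal stands in for the problem field.

```python
import codecs

# The name reported by locale.getpreferredencoding() inside the CGI script.
reported_name = "ANSI_X3.4-1968"

# Python resolves this alias to its built-in ASCII codec.
codec_name = codecs.lookup(reported_name).name

# Encoding a non-ASCII character with it reproduces the error in the question.
try:
    "£".encode(reported_name)
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(codec_name)
print(error_message)
```

The message matches the traceback in the question, which is consistent with `print()` encoding its output with the ASCII codec.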

1 Answer


Thanks to the help of snakecharmerb's comments, which then led me to this answer, I found a solution which works:

import codecs
import sys
# Re-wrap the binary stdout stream so that text is always written as UTF-8
sys.stdout = codecs.getwriter('utf-8')(sys.stdout.buffer)

I think this constitutes a workaround, and it'd be great if anyone could explain how this setting for locale.getpreferredencoding() gets to be set at ASCII/ANSI_X3.4-1968 ... even better if they could then say how to set it to something else.
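For what it's worth, the same workaround can be written with `io.TextIOWrapper`. The sketch below uses in-memory `io.BytesIO` objects in place of `sys.stdout.buffer` so that it runs anywhere. The helper name `make_text_stream` is hypothetical.

```python
import io


def make_text_stream(binary_stream, encoding):
    """Wrap a binary stream in a text layer that uses the given encoding."""
    return io.TextIOWrapper(binary_stream, encoding=encoding, write_through=True)


# An ASCII text layer, like the stdout the CGI script inherits: writing fails.
ascii_sink = io.BytesIO()
ascii_out = make_text_stream(ascii_sink, "ascii")
try:
    ascii_out.write("£")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# A UTF-8 text layer over the same kind of binary stream: writing succeeds.
utf8_sink = io.BytesIO()
utf8_out = make_text_stream(utf8_sink, "utf-8")
utf8_out.write("£")
utf8_out.flush()
utf8_bytes = utf8_sink.getvalue()

print(ascii_failed, utf8_bytes)

# In the CGI script itself this would presumably become:
#   sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")
```

On Python 3.7 and later, `sys.stdout.reconfigure(encoding="utf-8")` should have the same effect. Setting the `PYTHONIOENCODING` environment variable for the process is another option, assuming the web server passes it through to the script.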

The culprit is probably Apache, though I'm far from sure.

The question referenced by snakecharmerb unfortunately did not provide a solution for me: I added (or rather uncommented) the following line in /etc/apache2/conf-enabled/charset.conf

AddDefaultCharset UTF-8

... and restarted Apache. No change.

Edit
Output from various settings for su which might be involved:

M17A ~ # locale
LANG=en_GB.UTF-8
LANGUAGE=en_GB:en
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
M17A ~ # echo $LANG
en_GB.UTF-8
M17A ~ # locale charmap
UTF-8

I believe it is su/root which is indeed running the Apache process.

Edit 2
I thought I'd look into the ownership of the processes on my machine. So I ran ps aux. Some possibly relevant processes came up which are not owned by me or by root:

USER # i.e. owner
...
mysql     1413  0.0  0.1 1419400 16760 ?       Ssl  May15   0:50 /usr/sbin/mysqld
...
www-data  5825  0.0  0.0 143296  5536 ?        S    07:35   0:00 /usr/sbin/apache2 -k start
www-data  5826  0.0  0.1 298492 21900 ?        S    07:35   0:00 /usr/sbin/apache2 -k start
www-data  5827  0.0  0.1 298096 18700 ?        S    07:35   0:00 /usr/sbin/apache2 -k start
www-data  5828  0.0  0.0 296044 15872 ?        S    07:35   0:00 /usr/sbin/apache2 -k start
www-data  5829  0.0  0.1 296040 16876 ?        S    07:35   0:00 /usr/sbin/apache2 -k start
www-data  5830  0.0  0.0 296052  7972 ?        S    07:35   0:00 /usr/sbin/apache2 -k start
...
www-data  9636  0.0  0.0 296052  7856 ?        S    08:16   0:00 /usr/sbin/apache2 -k start
www-data  9639  0.0  0.0 295572  6324 ?        S    08:16   0:00 /usr/sbin/apache2 -k start
www-data  9640  0.0  0.0 295572  6324 ?        S    08:16   0:00 /usr/sbin/apache2 -k start
www-data  9641  0.0  0.0 295572  6324 ?        S    08:16   0:00 /usr/sbin/apache2 -k start

Maybe one of these owners is using this ASCII encoding? I wonder how I might find out...
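One way to find out might be to have the script report its own environment, since a CGI process inherits its locale from the Apache worker that launches it, not from an interactive shell. The sketch below is a hypothetical diagnostic and its function name is made up. On Debian-style installs the server's locale is often set in /etc/apache2/envvars, but that is an assumption about this machine.

```python
import locale
import os
import sys


def describe_encoding_environment():
    """Collect the settings that decide how print() encodes its output."""
    names = ("LANG", "LC_ALL", "LC_CTYPE", "PYTHONIOENCODING")
    # Environment variables as seen by this process (None if unset).
    report = {name: os.environ.get(name) for name in names}
    report["preferred_encoding"] = locale.getpreferredencoding()
    report["stdout_encoding"] = getattr(sys.stdout, "encoding", None)
    return report


for key, value in describe_encoding_environment().items():
    print(f"{key}: {value}")
```

Running this once from a shell and once through the web server should show whether the two environments differ.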

mike rodent
  • The root cause is likely that the user that is running the apache process has an ASCII locale (try `su` to that user and execute the `locale` command, also `echo $LANG` and `locale charmap`). So the solution is probably to change their default locale (but I'm not strong on linux admin, so I don't want to suggest this as a hard and fast solution) – snakecharmerb May 16 '20 at 19:13
  • I'm not strong on it either. I'll have a go, but all indications are that it is root (i.e. `su`) which runs the apache process. I'm adding the output from those commands to my answer. – mike rodent May 16 '20 at 19:18