
I have a problem with my Python 2.7 script. The script is declared as UTF-8, the data I get from the Google Search Console API is Unicode, and the database where I want to store it is UTF-8 too (utf8mb4, utf8mb4_general_ci).

If I launch the script on my Mac (and store the data on my physical Ubuntu server), there is no problem at all. If I launch the same script directly on the server, or from my Mac through SSH, I get a latin-1 codec error.

I triple-checked the locale variables on both the server and my Mac; I get exactly the same values, which are:

LANG=fr_FR.UTF-8
LANGUAGE=
LC_CTYPE="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_PAPER="fr_FR.UTF-8"
LC_NAME="fr_FR.UTF-8"
LC_ADDRESS="fr_FR.UTF-8"
LC_TELEPHONE="fr_FR.UTF-8"
LC_MEASUREMENT="fr_FR.UTF-8"
LC_IDENTIFICATION="fr_FR.UTF-8"
LC_ALL=fr_FR.UTF-8

The encoding of the Ubuntu terminal is UTF-8 too.

I have no idea why my script wants to store the data in latin-1, since the data is Unicode and the database is in UTF-8. Also, if I specify data.encode('utf-8'), the script works but the data is not correctly encoded.

Any ideas?

For information, I use the "Dataset" library to make MySQL requests, so I don't think I can specify a charset anywhere.

Quentin

1 Answer


MySQLdb probably doesn't know that it is supposed to encode to utf8, so it falls back to its default latin1 charset. Pass charset='utf8' as a parameter when you open the connection.

import MySQLdb

# charset tells the driver which encoding to use on the connection
connection = MySQLdb.connect(user='username', db='database', charset='utf8')
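To illustrate why a latin1 fallback would produce the error from the question, here is a minimal sketch that runs without a database. The sample string and the `try_encode` helper are made up for the illustration. It shows that text containing a character outside latin-1 cannot be encoded with that codec, and that UTF-8 bytes read back as latin-1 come out garbled, which may be what happens when the data is pre-encoded with data.encode('utf-8').

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Sample text (an assumption for this sketch): "café €5".
# The euro sign has no code point in latin-1.
text = u"caf\u00e9 \u20ac5"


def try_encode(value, codec):
    """Hypothetical helper: return the encoded bytes, or None on failure."""
    try:
        return value.encode(codec)
    except UnicodeEncodeError:
        return None


latin1_result = try_encode(text, "latin-1")  # None: the euro sign cannot be encoded
utf8_result = try_encode(text, "utf-8")      # works: UTF-8 covers all of Unicode

# UTF-8 bytes interpreted as latin-1 turn one character into two wrong ones.
mojibake = u"\u00e9".encode("utf-8").decode("latin-1")

print(repr(latin1_result), repr(utf8_result), repr(mojibake))
```

If the connection charset is set to UTF-8 instead, the driver can encode the Unicode strings itself and no manual encode() call should be needed.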

Update:

Another option is to set the character set with a database query:

db.query("SET CHARACTER SET utf8;")

Hope this helps you out.

Writing UTF-8 String to MySQL with Python

http://dataset.readthedocs.io/en/latest/api.html

Philipp Braun
  • Thanks a lot. With the "Dataset" library I haven't yet found the equivalent of `charset = 'utf8'`, because `dataset.connect('MY_URL', engine_kwargs={'encoding':'utf8'})` doesn't work. But at least your answer helps me understand where the problem is. – Quentin Dec 07 '16 at 18:49
  • @Quentin Sorry, I just assumed that you were using MySQLdb. I dug a bit deeper for you. I hope the update helps. – Philipp Braun Dec 07 '16 at 19:22
  • Thanks again for your time, Philipp. I tried what the docs say but it doesn't work. I chose Dataset because it greatly simplifies using a MySQL database from Python, and I would like to avoid learning all the boring and complicated MySQL syntax, but obviously my Linux server will not let me go this way :) – Quentin Dec 07 '16 at 21:26
  • I finally found a way to make it work with the dataset library: `dataset.connect('mysql://username:password@server/database?charset=utf8mb4')`, for those who are interested :) – Quentin Dec 09 '16 at 19:38
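
As a sketch of the approach from the last comment, the connection URL can be assembled with the charset in its query string. The `build_mysql_url` helper and the sample credentials are hypothetical, not part of the dataset library. The password is percent-encoded on the assumption that it may contain characters such as `@` that would otherwise break the URL.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2.7


def build_mysql_url(user, password, host, database, charset="utf8mb4"):
    """Hypothetical helper: build a MySQL URL that carries the charset option."""
    # safe="" percent-encodes every reserved character, including "@" and "/"
    credentials = "{0}:{1}".format(quote(user, safe=""), quote(password, safe=""))
    return "mysql://{0}@{1}/{2}?charset={3}".format(
        credentials, host, database, charset
    )


url = build_mysql_url("username", "p@ss word", "server", "database")
print(url)

# With dataset and a MySQL driver installed, the URL would then be used as in
# the comment above:
# db = dataset.connect(url)
```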