
When I try to insert a right double quotation mark (”) using Python MySQLdb, I get `UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201d' in position 0: ordinal not in range(256)`. Python MySQLdb uses the latin-1 codec by default, and the index.xml file in /usr/share/mysql/charsets/ describes latin1 as "cp1252 West European". So I assumed that latin1 would also cover the cp1252 characters. But it evidently does not; if it did, I would not get this error.

The right double quotation mark is in the cp1252 charset but not in the ISO 8859-1 (latin1) charset.
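To illustrate the mismatch, here is a minimal sketch that uses only Python's built-in codecs and does not touch MySQLdb. It is written for Python 3, and the helper name `can_encode` is made up for this example.

```python
RIGHT_DOUBLE_QUOTE = "\u201d"  # the character from the error message


def can_encode(text, codec):
    """Return True if `text` can be encoded with the named Python codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(RIGHT_DOUBLE_QUOTE, "latin-1"))  # False
print(can_encode(RIGHT_DOUBLE_QUOTE, "cp1252"))   # True
print(RIGHT_DOUBLE_QUOTE.encode("cp1252"))        # b'\x94'
```

Python's `latin-1` codec is strict ISO 8859-1, so the character fails there, while `cp1252` maps it to the single byte 0x94.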

There is no cp1252.xml file in /usr/share/mysql/charsets/. Why is the cp1252 charset missing from Python MySQLdb?

Or is latin1 the same as cp1252, as index.xml describes it?

mcv
  • *"python MySQLdb uses latin-1 codec"* This assumption is wrong. It will use any encoding you configure when you `connect()` to the database. You just did not configure anything. – Tomalak Dec 18 '18 at 08:28
  • Possible duplicate of [Python & MySql: Unicode and Encoding](https://stackoverflow.com/questions/8365660/python-mysql-unicode-and-encoding) – Tomalak Dec 18 '18 at 08:28
  • @Tomalak By default, it uses the latin1 codec. My error was `UnicodeEncodeError: 'latin-1' codec can't encode character` – mcv Dec 19 '18 at 09:28
  • @Tomalak It's not a duplicate. I already said that I don't want to use 'utf8' as the charset. Is it possible to set the charset to cp1252? – mcv Dec 19 '18 at 09:40
  • Read my first comment again. – Tomalak Dec 19 '18 at 10:31
  • @Tomalak If I don't specify a charset, which charset is selected for encoding? – mcv Dec 19 '18 at 11:38
  • @Tomalak Why did I get the error `UnicodeEncodeError: 'latin-1' codec can't encode character`? I haven't configured the latin1 codec, but the error says that MySQLdb uses it. – mcv Dec 19 '18 at 11:40

1 Answer


Do you really need cp1252 rather than utf-8? I strongly recommend using utf-8.

What you need to do:

  • Pass the `charset="utf8mb4"` option to `MySQLdb.connect()`.
  • Configure the database to use utf-8.

You can create a database that uses utf-8 with `CREATE DATABASE <your db name> DEFAULT CHARACTER SET utf8mb4`.

If you already have a database, you can change its default character set with `ALTER DATABASE <your db name> CHARACTER SET utf8mb4`. But you also need to change the character set of every existing table in the database.
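As a rough sketch of passing the charset option, assuming the `MySQLdb` module (mysqlclient) is installed: the helper names `build_connect_kwargs` and `open_connection`, the credentials, and the `notes` table are placeholders invented for this example. `charset` and `use_unicode` are keyword arguments that `MySQLdb.connect()` accepts.

```python
def build_connect_kwargs(host, user, passwd, db):
    """Collect keyword arguments for MySQLdb.connect() with an explicit charset."""
    return {
        "host": host,
        "user": user,
        "passwd": passwd,
        "db": db,
        "charset": "utf8mb4",   # set explicitly instead of relying on the default
        "use_unicode": True,    # exchange text as Unicode strings
    }


def open_connection(**kwargs):
    # Imported lazily so the helper above can be used without MySQLdb installed.
    import MySQLdb
    return MySQLdb.connect(**kwargs)


# Hypothetical usage (placeholder credentials and table name):
# conn = open_connection(**build_connect_kwargs("localhost", "app", "secret", "mydb"))
# cur = conn.cursor()
# cur.execute("INSERT INTO notes (body) VALUES (%s)", ("\u201d",))
# conn.commit()
```

With the connection charset set to utf8mb4, the driver no longer has to encode the quotation mark with latin-1, which is where the error in the question came from.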

methane
  • Yeah, that's OK. But what about cp1252? Is it available in MySQLdb? – mcv Dec 20 '18 at 10:47
  • No. MySQL and MySQLdb don't support it. latin1 is supported, but not well tested. – methane Dec 21 '18 at 11:02
  • All encodings other than ascii and utf-8 are legacy. You should use utf-8 unless there is a special reason not to. I (maintainer of PyMySQL, mysqlclient-python (aka MySQLdb), and go-sql-driver/mysql) don't test encodings other than utf-8. – methane Dec 21 '18 at 11:05
  • MySQLdb doesn't support it, but MySQL supports the cp1252 encoding. However, it is named latin1. See https://dev.mysql.com/doc/refman/8.0/en/charset-we-sets.html – mcv Dec 27 '18 at 03:40
  • Yes, I already said "latin1 is supported." cp1252 is a Windows-specific encoding. It's very similar to ISO-8859-1, but not identical. See https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html That's why I said cp1252 is not supported, but latin1 is supported. – methane Dec 28 '18 at 04:40
  • But in MySQL, latin1 is the same as the cp1252 encoding, while in MySQLdb it is the actual latin-1 encoding. Can you add that to your answer? I will accept it afterwards. – mcv Jan 03 '19 at 11:07
  • What do you mean? Do you mean latin-1 in MySQL is actually cp1252, and there is no actual latin-1 in MySQL? – methane Jan 04 '19 at 13:00
  • Yes, that's right. Please see this paragraph from the official MySQL page: `latin1 is the default character set. MySQL's latin1 is the same as the Windows cp1252 character set. This means it is the same as the official ISO 8859-1 or IANA (Internet Assigned Numbers Authority) latin1, except that IANA latin1 treats the code points between 0x80 and 0x9f as “undefined,” whereas cp1252, and therefore MySQL's latin1, assign characters for those positions. For example, 0x80 is the Euro sign.` – mcv Jan 07 '19 at 04:56
  • You can find it here: dev.mysql.com/doc/refman/8.0/en/charset-we-sets.html – mcv Jan 07 '19 at 04:57
  • Thanks. Now I understand what you said. Then the right answer is "MySQLdb doesn't support cp1252, because the maintainer didn't know latin-1 in MySQL is actually cp1252." – methane Jan 08 '19 at 06:05
  • Character sets and encodings are really difficult. I don't know even Japanese encodings well, even though I'm Japanese. "Use UTF-8 only" improves developers' QoL around the world. Now I know latin-1 in MySQL is actually cp1252, but I strongly recommend using UTF-8 to avoid such trouble. Nobody in this thread except you understood your question. That's clear evidence that everyone should use only ASCII or UTF-8, unless they love trouble. – methane Jan 08 '19 at 06:06
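The 0x80 to 0x9f difference quoted from the MySQL manual can be inspected with Python's built-in codecs. This is only a sketch for Python 3, and the function name `cp1252_extras` is invented for this example. Python's `cp1252` codec stands in for MySQL's latin1 here, which is an approximation rather than a statement about MySQL's exact table.

```python
def cp1252_extras():
    """Map each byte in 0x80-0x9f that Python's cp1252 codec defines to its character."""
    extras = {}
    for value in range(0x80, 0xA0):
        try:
            extras[value] = bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            pass  # a few positions are left undefined by Python's cp1252 codec
    return extras


extras = cp1252_extras()
print(extras[0x80] == "\u20ac")  # True: 0x80 is the Euro sign
print(extras[0x94] == "\u201d")  # True: 0x94 is the right double quotation mark

# Python's latin-1 codec maps the same byte to the control character U+0094.
print(bytes([0x94]).decode("latin-1") == "\x94")  # True
```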