0

I am using pymysql to make the following insert into my test table. First I set up the connection and specify the charset and set unicode to True:

conn=pymysql.connect(host=*ip*,user='simao_simao',password=*mypass*,database=*mydatabase*,use_unicode=True, charset="utf8")

Then I successfully insert the values into my table.

cur = conn.cursor()
sql = "Insert into test (colx, coly) Values('%s', 'not_relevant')"%(tag)
>>> sql
"Insert into test (colx, coly) Values('£26,302 - £35,225 + 20% R','not_relevant')"
>>> cur.execute(sql)
1

My table is utf-8 unicode ready

My html code includes:

meta charset="UTF-8"

Still I get the � symbol instead of the £ in my webpage. Any idea what can be wrong?

Phil
  • 157,677
  • 23
  • 242
  • 245
Prosonic
  • 39
  • 1
  • 5
  • So you get an error **AND** the � replacement character? – Alastair McCormack Sep 04 '16 at 21:36
  • I don't get any error. I just get the replacement when the results are fetched into my website. – Prosonic Sep 04 '16 at 22:53
  • Sorry, I got what you meant now. The title is referring to a previous error I had, I am going to edit it now. – Prosonic Sep 04 '16 at 23:12
  • Are you really hardcoding the data in your script or are you reading the data from an external source, like a text/csv file? – Alastair McCormack Sep 05 '16 at 05:15
  • I am pulling data from a website using pandas. But it turns out that I get the pound symbol replaced even when I manually do an insert directly in my table. – Prosonic Sep 05 '16 at 13:27
  • Ok, that's a very important piece of information. It sounds like the encoding of your data is not 'utf-8'. Can you verify? – Alastair McCormack Sep 05 '16 at 13:31
  • I validated that it is not due to the python script the following way: droped the table in my database. Created a new table with the same name as the previous one (set the encoding to utf-8 unicode) inserted new records typing each one manually. Absolutely no crawling, no python, nothing involved, just hard coded SQL script. Still I get the unwanted symbol. – Prosonic Sep 05 '16 at 14:47
  • Again, your data does not seem to be UTF-8 encoded. How are you "typing each one manually"? – Alastair McCormack Sep 05 '16 at 14:48
  • I am using my phpMyAdmin panel typing the following querry by hand, and running it: INSERT INTO `simao_test`.`table_1` (`VacId`, `DateInsert`, `JobTitle`, `Location`, `Url`, `ExternalDate`, `JobId`, `JobType`, `Salary`, `Profession`, `Sub_Profession`, `Sector`, `Description`) VALUES ('123', '2016-09-03 10:22:27', 'Sales Assistant', 'London', 'http://example.com', '2016-09-01', 'KJDKA6', 'Full time', '£23,000 - £32,000', 'Sales Assistant', 'Assistant', 'No Sctor', 'No description'); The salary keeps being shown with wrong character. – Prosonic Sep 05 '16 at 15:05
  • Does it show correctly in phpMyAdmin? – Alastair McCormack Sep 05 '16 at 15:38
  • Yes. It shows the "£" symbol perfectly there. – Prosonic Sep 05 '16 at 15:39

1 Answers1

0

https://stackoverflow.com/a/38363567/1766831

Follow the debugging tip with SELECT ... HEX... to see if the £ is correctly stored as C2A3 (utf8), not A3 (latin1) or C382C2A3 (double-encoded).

Provide SHOW CREATE TABLE to verify that the column is utf8 or utf8mb4.

Verify that the code starts with # -*- coding: utf-8 -*- so that the encoding is utf8. (That covers one of the possible causes of black-diamond; see the link.)

If that is not enough, check more Python notes here: http://mysql.rjweb.org/doc.php/charcoll#python

Community
  • 1
  • 1
Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Debugged it and it is correctly stored in C2A3. This is extracted from create table: `Salary` varchar(75) COLLATE utf8_unicode_ci NOT NULL I am not using python now I droped the table in my database. Created a new table with the same name as the previous one (set the encoding to utf-8 unicode) inserted new records typing each one manually. Absolutely no crawling, no python, nothing involved, just hard coded SQL script. This is probably a problem with the php code but I don't see why, I am using: – Prosonic Sep 05 '16 at 17:55
  • The Hex I got for this value: '£23,000 - £32,000' was: C2A332332C303030202D20C2A333322C303030 – Prosonic Sep 05 '16 at 17:58
  • C2A3 - good, so it sounds like the web browser is not rendering correctly. The meta should be right. "Manually typed"?? There has to be some kind of client. That client needs to be encoding in utf8, not latin1. That client needs to be connecting with utf8, such as using `SET NAMES utf8`. – Rick James Sep 05 '16 at 18:15