Form saves special latin characters as symbols

Question

My PHP form is submitting special latin characters as symbols.

So, Québec turns into QuÃ©bec

My form is set to UTF-8 and my database table has latin1_swedish_ci collation.

PHP: $db = new PDO('mysql:host=localhost;dbname=x;charset=utf8', 'x', 'x');

A bindParam: $sql->bindParam(":x", $_POST['x'], PDO::PARAM_STR);

I am new to PDO so I am not sure what the problem is. Thank you

*I am using phpMyAdmin

Can't you do everything in utf-8? Do you have a requirement that your database needs to have a latin1 charset? Working with multiple charsets within one application is of course possible, but requires detailed planning and attention. Convert the string to latin1 before storing it in the database, and convert it to your output charset (probably utf-8) before you print it. — Carsten, Mar 28 '14 at 20:24
I changed my DB table column to UTF8 and set a php header and meta to UTF8. Same result. How would you suggest changing all to UTF8? — DDDD, Mar 28 '14 at 20:48

score 1 · Answer 1 · answered Mar 28 '14 at 20:37

To expand a little bit more on the encoding problem...

Any time you see one character in a source turn into two (or more) characters, you should immediately suspect an encoding issue, especially if UTF-8 is involved. Here's why. (I apologize if you already know some of this, but I hope to help some future SO'ers as well.)

All characters are stored in your computer not as characters, but as bytes. Back in the olden days, space and transmission time were much more limited than now, so people tried to save every byte possible, even down to not using a full byte to store a character. Now, because we realize that we need to communicate with the whole world, we've decided it's more important to be able to represent every character in every language. That transition hasn't always been smooth, and that's what you're running up against.

Latin-1 (in various flavors) is an encoding that always uses a single 8-bit byte for a character. Which means it can only have 256 possible characters. Plenty if you only want to write English or Swedish, but not enough to add Russian and Chinese. (background on Latin-1)

UTF-8 encodes the first half of Latin-1 (the ASCII range) in exactly the same way, which is why you see most of the characters looking the same. But it doesn't always use a single byte for a character -- it can use up to four bytes on one character. (utf-8) As you discovered, it uses 2 bytes for é. But Latin-1 doesn't know that, and is doing its best to display those two bytes as two separate characters, Ã and ©.
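
To make that concrete, here is a minimal sketch that reproduces the QuÃ©bec symptom in plain PHP. It assumes the mbstring extension is available, and the variable names are only illustrative. The string is written as explicit bytes so the result does not depend on how the script file itself is saved.

```php
<?php
// "Québec" as explicit UTF-8 bytes (é is the two bytes C3 A9).
$utf8 = "Qu\xC3\xA9bec";

$eAcuteHex = bin2hex("\xC3\xA9");        // hex dump of é in UTF-8
$byteCount = strlen($utf8);              // strlen counts bytes
$charCount = mb_strlen($utf8, 'UTF-8');  // mb_strlen counts characters

// Simulate the misreading: treat every byte as one Latin-1 character,
// then re-encode the result as UTF-8 so it can be displayed.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo $eAcuteHex, PHP_EOL;                    // c3a9
echo $byteCount, ' bytes, ', $charCount, ' characters', PHP_EOL; // 7 bytes, 6 characters
echo $misread, PHP_EOL;                      // QuÃ©bec
```

Seeing Ã followed by another symbol where one accented letter should be is therefore a strong hint that UTF-8 bytes were read as Latin-1 somewhere between the form and the page that displays the data.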

The trick is to always specify the encoding for byte streams (such as data from a file, a URL, or a database), and to make sure that encoding is correct. (Sometimes that's a pain to find out, for sure.) Most modern languages, like Java and PHP, do a good job of handling the translation between encodings, as long as you've correctly specified what you're dealing with.
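
As a rough sketch of what "specifying the encoding" can look like on the PHP side, the helpers below declare the connection charset in the PDO DSN and check that incoming bytes are valid UTF-8 before they are stored. The function names are hypothetical, and `utf8mb4` is an assumption (it needs MySQL 5.5.3 or later; the question uses `utf8`). Note also that the `charset` DSN option is only honored from PHP 5.3.6 onward. `connectUtf8()` is defined but not called here because it needs a live server.

```php
<?php
// Hypothetical helper: builds a DSN that declares the connection encoding.
function buildUtf8Dsn(string $host, string $dbName): string
{
    return "mysql:host={$host};dbname={$dbName};charset=utf8mb4";
}

// Hypothetical helper: opens the connection and makes PDO throw on errors.
function connectUtf8(string $host, string $dbName, string $user, string $password): PDO
{
    return new PDO(buildUtf8Dsn($host, $dbName), $user, $password, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

// Hypothetical helper: true only if the bytes form valid UTF-8
// (requires the mbstring extension).
function isValidUtf8(string $input): bool
{
    return mb_check_encoding($input, 'UTF-8');
}

var_dump(isValidUtf8("Qu\xC3\xA9bec")); // UTF-8 bytes for Québec
var_dump(isValidUtf8("Qu\xE9bec"));     // Latin-1 bytes for Québec
```

A check like `isValidUtf8($_POST['x'])` before binding the parameter helps tell apart "the browser sent something other than UTF-8" from "the database layer re-encoded it".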

score 0 · Answer 2 · answered Mar 28 '14 at 20:23

Change your database table and column to utf8_unicode_ci.

Kristiyan

I changed the column to utf8_unicode_ci and the issue remains. – DDDD Mar 28 '14 at 20:34

score 0 · Answer 3 · answered Mar 28 '14 at 20:24

Make sure you are saving the file with UTF-8 encoding (this is often overlooked)

Set headers:

<?php header("Content-type: text/html; charset=utf-8"); ?>

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Suvash sarker

Should this be in the form.php or the action php file? – DDDD Mar 28 '14 at 20:35
I added the header and meta. Changed the DB table column to UTF8 and still same. – DDDD Mar 28 '14 at 20:46

score 0 · Accepted Answer · answered Mar 28 '14 at 20:29

You've pretty much answered your own question: you're receiving UTF-8 from the form but trying to store it in a Latin-1 column. You can either change the encoding on the column in MySQL or use the iconv function to translate between the two encodings.
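
For the iconv route, a minimal sketch might look like the following. The helper names are hypothetical, and it assumes the iconv extension is enabled and that ISO-8859-1 is close enough to the column's encoding (MySQL's `latin1` is actually Windows-1252, which agrees with ISO-8859-1 for characters such as é).

```php
<?php
// Hypothetical helper: UTF-8 in, single-byte Latin-1 out.
// iconv() returns false when a character has no Latin-1 equivalent.
function utf8ToLatin1(string $utf8): string
{
    $latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);
    if ($latin1 === false) {
        throw new RuntimeException('Input contains characters Latin-1 cannot represent');
    }
    return $latin1;
}

// Hypothetical helper: the reverse direction, for printing on a UTF-8 page.
function latin1ToUtf8(string $latin1): string
{
    $utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);
    if ($utf8 === false) {
        throw new RuntimeException('Conversion to UTF-8 failed');
    }
    return $utf8;
}

$stored = utf8ToLatin1("Qu\xC3\xA9bec"); // é becomes the single byte E9
echo bin2hex($stored), PHP_EOL;           // 5175e9626563
```

Converting by hand like this only makes sense when the connection charset matches the column; if the connection is declared as UTF-8, MySQL already converts between the connection and column encodings, and converting again in PHP would corrupt the data.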

Ken Keenan

I changed the column to utf8_unicode_ci and the issue remains. – DDDD Mar 28 '14 at 20:34
Is there a way to change the PDO to latin1 ? – DDDD Mar 28 '14 at 20:39
This may help you: http://stackoverflow.com/questions/4361459/php-pdo-charset-set-names – KathyA. Mar 28 '14 at 20:44
@DDDD: If you want to change the PDO connection to Latin1, change `charset=utf8` to `charset=latin1` in the connection string. – Ken Keenan Mar 28 '14 at 23:28
When working with PHP and MySQL, it's vital that both are in agreement about what character encoding is in use, see http://www.startupcto.com/backend-tech/going-utf-8-utf8-with-php-and-mysql – Ken Keenan Mar 28 '14 at 23:30
@KenKeenan I changed the charset in the PDO connection and the issue remains. I tried 'charset=latin1' and 'charset=latin1_swedish_ci' :( I will try setting more header and meta tomorrow. Thanks – DDDD Mar 29 '14 at 01:35
