How can I apply utf8_encode() or similar to all VARCHARs on a database? (PHP / MySQL)

Question

Suddenly I have to work with this database filled with lots of data and the whole system is full of encoding problems. Some tables are Latin1, others UTF8, mixed collations, and utf8_encode() PHP conversions all spread randomly throughout the files. It's a nightmare, and I'm trying to solve it.

Ok, I've created a PHP script to update the charset and collation of the whole schmere.

<?php
    include("vars.php");
    $servername = DB_HOST;
    $username = DB_USER;
    $password = DB_PASS;
    $dbname = DB_NAME;
    $charset = "utf8";
    $collation = "utf8_general_ci";
    $conn = new mysqli($servername, $username, $password, $dbname);
    if ($conn->connect_error) {
        die("Connection failed: " . $conn->connect_error);
    }
    $sql = "SELECT CONCAT('ALTER TABLE `', TABLE_NAME,'` CONVERT TO CHARACTER SET ".$charset." COLLATE ".$collation.";') 
            AS MYSQL FROM INFORMATION_SCHEMA.TABLES;";
    $result = $conn->query($sql);
    if ($result->num_rows > 0) {
        while ($row = $result->fetch_row()) {
            $conn->prepare($row[0])->execute();
            echo $row[0] . " EXECUTED </br>";
        }
        echo "FINISHED.";
    } else {
        echo "0 results";
    }
    $conn->close();
?>

The problem here is that lots of varchars are stored like this: "OBRAGEN ENGENHARIA E CONSTRUÃ‡Ã•ES LTDA", and the previous developer used utf8_encode() on multiple files because of that. In order to achieve a standard, I've removed all utf8_encode() conversions in order to rely only on the database. How can I "utf8_encodely" convert ALL varchars on the database? Is that possible?

score 1 · Accepted Answer · answered Dec 11 '15 at 00:43

Don't use utf8_encode().

Use mysqli_set_charset('utf8'). Then all text from MySQL will come at you as utf8. That is whether it is in a column that is CHARACTER SET latin1 or whatever. MySQL does the conversion.

Then make sure your client code thinks only utf8.

And any display mechanism uses utf8. For example, HTML:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

A quick check of what is in the tables... Find some accented character, and do

SELECT col, HEX(col) FROM tbl WHERE ...

It will show 1-byte (2 hex) for a CHARACTER SET latin1 column, or 2-byte (4 hex) for utf8.

Ã‡Ã• is Mojibake for ÇÕ. It probably means that ...

The bytes you have in the client are correctly encoded in utf8 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
The column in the tables may or may not have been CHARACTER SET utf8, but it should have been that.

If you find that a table has incorrectly stored data, we can look at that.

score 0 · Answer 2 · edited May 23 '17 at 12:23

0

I think you have to manually identify the table fields which has the utf8 encoded content in latin1 collation tables. Then update those fields like @ABS method and provided example by @Marcel Grolms in following question.

MySQL - Convert latin1 characters on a UTF8 table into UTF8

edited May 23 '17 at 12:23

Community

1
1

answered Nov 26 '15 at 02:56

Pradeep Sanjaya

1,816
1
15
23

How can I apply utf8_encode() or similar to all VARCHARs on a database? (PHP / MySQL)

2 Answers2