0

Suddenly I have to work with this database filled with lots of data and the whole system is full of encoding problems. Some tables are Latin1, others UTF8, mixed collations, and utf8_encode() PHP conversions all spread randomly throughout the files. It's a nightmare, and I'm trying to solve it.

Ok, I've created a PHP script to update the charset and collation of the whole schmere.

<?php
    include("vars.php");
    $servername = DB_HOST;
    $username = DB_USER;
    $password = DB_PASS;
    $dbname = DB_NAME;
    $charset = "utf8";
    $collation = "utf8_general_ci";
    $conn = new mysqli($servername, $username, $password, $dbname);
    if ($conn->connect_error) {
        die("Connection failed: " . $conn->connect_error);
    }
    $sql = "SELECT CONCAT('ALTER TABLE `', TABLE_NAME,'` CONVERT TO CHARACTER SET ".$charset." COLLATE ".$collation.";') 
            AS MYSQL FROM INFORMATION_SCHEMA.TABLES;";
    $result = $conn->query($sql);
    if ($result->num_rows > 0) {
        while ($row = $result->fetch_row()) {
            $conn->prepare($row[0])->execute();
            echo $row[0] . " EXECUTED </br>";
        }
        echo "FINISHED.";
    } else {
        echo "0 results";
    }
    $conn->close();
?>

The problem here is that lots of varchars are stored like this: "OBRAGEN ENGENHARIA E CONSTRUÇÕES LTDA", and the previous developer used utf8_encode() on multiple files because of that. In order to achieve a standard, I've removed all utf8_encode() conversions in order to rely only on the database. How can I "utf8_encodely" convert ALL varchars on the database? Is that possible?

Ericson Willians
  • 7,606
  • 11
  • 63
  • 114

2 Answers2

1

Don't use utf8_encode().

Use mysqli_set_charset('utf8'). Then all text from MySQL will come at you as utf8. That is whether it is in a column that is CHARACTER SET latin1 or whatever. MySQL does the conversion.

Then make sure your client code thinks only utf8.

And any display mechanism uses utf8. For example, HTML:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

A quick check of what is in the tables... Find some accented character, and do

SELECT col, HEX(col) FROM tbl WHERE ...

It will show 1-byte (2 hex) for a CHARACTER SET latin1 column, or 2-byte (4 hex) for utf8.

ÇÕ is Mojibake for ÇÕ. It probably means that ...

  • The bytes you have in the client are correctly encoded in utf8 (good).
  • You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
  • The column in the tables may or may not have been CHARACTER SET utf8, but it should have been that.

If you find that a table has incorrectly stored data, we can look at that.

Rick James
  • 135,179
  • 13
  • 127
  • 222
0

I think you have to manually identify the table fields which has the utf8 encoded content in latin1 collation tables. Then update those fields like @ABS method and provided example by @Marcel Grolms in following question.

MySQL - Convert latin1 characters on a UTF8 table into UTF8

Community
  • 1
  • 1
Pradeep Sanjaya
  • 1,816
  • 1
  • 15
  • 23