
I wrote a PHP script that creates 5 databases and inserts some data (using $mysqli->multi_query). When I tried it out, non-ASCII characters (ä ö ü ß •) were broken, so I called utf8_decode on the whole SQL query. However, this only fixed ä ö ü, not ß or •, and not the uppercase Ä Ö Ü. This really confuses me.

Output of ß: �?Ÿ

I'm not a specialist in different character encodings, but until now, I always managed to get the proper output.

All my files are encoded in UTF-8 and the meta tag

<meta charset="utf-8">

is on every page.

Can someone help me with this?

Here are parts of the SQL query (it's really, really long):

-- phpMyAdmin SQL Dump
-- version 4.6.5.2
-- https://www.phpmyadmin.net/
--
-- Host: 127.0.0.1
-- Generation Time: Mar 22, 2017 at 21:29
-- Server-Version: 10.1.21-MariaDB
-- PHP-Version: 5.6.30

SET SQL_MODE = "NO_AUTO_VALUE_ON_ZERO";
SET time_zone = "+00:00";


/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8mb4 */;


CREATE TABLE `forum` (
  `id` int(10) UNSIGNED NOT NULL,
  `autor` varchar(50) NOT NULL,
  `text` varchar(2000) NOT NULL,
  `datum` int(10) UNSIGNED NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPACT;

. . .

/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
Aloso

1 Answer

  1. Do not ever use the PHP utf8_decode() function. All it does is convert UTF-8 data to ISO-8859-1, irrecoverably replacing every character that ISO-8859-1 cannot represent (such as •) with a question mark. Remove it from anywhere it appears in your application.

  2. Your table has DEFAULT CHARSET=latin1. Change that to utf8mb4 to store Unicode text:

    ALTER TABLE `forum` CONVERT TO CHARACTER SET utf8mb4;
    

    If and only if your MySQL install is too old to support utf8mb4, use utf8 instead. MySQL's utf8 character set stores at most three bytes per character, so it covers only a subset of Unicode (in particular, it does not support emoji), but it will at least support a wider selection of characters than latin1.
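
To make the first point concrete, here is a minimal sketch of what a UTF-8 to ISO-8859-1 conversion does to the bytes. It assumes PHP 7 or later with the mbstring extension; `toLatin1()` is a hypothetical helper that uses `mb_convert_encoding()` as a stand-in for `utf8_decode()`, since the latter is deprecated in recent PHP versions.

```php
<?php
// Hypothetical helper: the same UTF-8 -> ISO-8859-1 conversion that
// utf8_decode() performs, done with mbstring (assumes ext-mbstring).
function toLatin1(string $utf8): string
{
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

$umlaut = "\u{00E4}"; // ä, which exists in ISO-8859-1
$bullet = "\u{2022}"; // •, which does not exist in ISO-8859-1

// Print the bytes before and after the conversion as hex.
echo bin2hex($umlaut), ' -> ', bin2hex(toLatin1($umlaut)), PHP_EOL; // c3a4 -> e4
echo bin2hex($bullet), ' -> ', bin2hex(toLatin1($bullet)), PHP_EOL; // e280a2 -> 3f
```

The bullet comes out as `3f`, a plain question mark, so the original character cannot be recovered afterwards.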

  • So, now it is displayed correctly in phpMyAdmin, but not on the website. Every non-ascii character is a �. – Aloso Mar 25 '17 at 20:47
  • I figured it out. I have to call `$mysqli->set_charset("utf8mb4");` every time I create a connection. Finally it works :) – Aloso Mar 25 '17 at 20:59
  • @Aloso which is exactly what I had in mind when I posted my comment up there ;-) That was probably all you needed to do. – Funk Forty Niner Mar 25 '17 at 20:59
  • Do not blindly use `ALTER .. CONVERT TO` -- that applies in one scenario, but not all. If you have utf8 bytes in a latin1 column, you need two `ALTERs`, neither of which uses `CONVERT TO`. – Rick James Mar 30 '17 at 03:38
  • @RickJames Right. This advice was given under the assumption that there wasn't any meaningful Unicode data in the table yet. –  Mar 30 '17 at 04:42
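
The `set_charset` fix from the comments could be wrapped up roughly as follows. This is only a sketch: `connectUtf8mb4()` is a hypothetical helper, and the host, user, password, and database name in the usage lines are placeholders, not values from the question.

```php
<?php
// Hypothetical helper: open a mysqli connection and switch it to utf8mb4.
function connectUtf8mb4(string $host, string $user, string $password, string $database): mysqli
{
    // Throw exceptions on errors instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $mysqli = new mysqli($host, $user, $password, $database);

    // Tell the server that this connection sends and expects utf8mb4.
    $mysqli->set_charset('utf8mb4');

    return $mysqli;
}

// Usage (needs a running server, so it is left commented out;
// all four arguments are placeholders):
// $mysqli = connectUtf8mb4('127.0.0.1', 'forum_user', 'secret', 'forum_db');
// $mysqli->multi_query($sql);
```

Routing every connection through one helper like this means the charset call cannot be forgotten when a new connection is created.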