4

I am having problems displaying foreign characters (characters with accents like: é à ù ç and so on)

The row in the database is like this:

    Name  | Datatype     | Charset
    title | varchar(255) | utf8_general_ci

I store it like this:

function inputFilter($var)
{
    $var = trim(htmlentities(strip_tags($var)));

    if (get_magic_quotes_gpc())
        $var = stripslashes($var);

    $var = mysql_real_escape_string($var);

    return $var;
}
$title = inputFilter($_POST['title']);

and I print it like this:

print $getfromdb['title'];

This is how it's printed out:

PortuguÃªs // Should be: Português

I have tried adding: htmlspecialchars, utf8_decode/encode and htmlentities to the print, although nothing helps!

I've added this to the header:

<meta charset="utf-8">

What am I doing wrong?

oliverbj
  • I assume you are printing this on a webpage. Is it also encoded as UTF-8? – Patrick Manser Oct 23 '13 at 16:08
  • It surely is. I've added this to the header: `<meta charset="utf-8">` – oliverbj Oct 23 '13 at 16:09
  • You likely haven't correctly set the [character set of your database connection](http://dev.mysql.com/doc/en/charset-connection.html). See [UTF-8 all the way through](http://stackoverflow.com/q/279170). – eggyal Oct 23 '13 at 16:09
  • Note also, for your future reference, that `utf8_general_ci` is a *collation* (used for performing comparisons/sorting) and not a *character set*. The character set is `utf8`. – eggyal Oct 23 '13 at 16:10
  • Have you tried: 1) setting the db connection to utf-8 - `mysql_query("SET character_set_results = 'utf8', character_set_client = 'utf8', character_set_connection = 'utf8', character_set_database = 'utf8', character_set_server = 'utf8'", $conn);` 2) have you specified the charset in the meta tag? 3) is the encoding of your file UTF-8? (*edit: this one is overkill, never mind this one) – eithed Oct 23 '13 at 16:11
  • Furthermore, one should normally perform display-related escaping (such as the call to PHP's `htmlentities()` function) upon display and not upon storage. Suppose in the future that you wish to use some other output medium: one would have to undo the HTML-specific encoding stored within the database and then apply whatever media-specific encoding is required for the new output. Far better to store data in a neutral, unescaped form and then perform the requisite escaping after retrieving the data from the database. – eggyal Oct 23 '13 at 16:13
  • @eithed: It is preferable to use API-specific methods of setting the connection character set (if available - which is not the case with the deprecated mysql extension), or else use `SET NAMES`. – eggyal Oct 23 '13 at 16:14
  • `$var = trim(htmlentities(strip_tags($var)));` - in my view, this whole line (except for `trim`) isn't really necessary. I'm not sure, but unless your PHP version is >= 5.4, `htmlentities` can produce unexpected output without the encoding argument. – sofl Oct 23 '13 at 16:14
  • @eggyal - I'm not assuming anything about the environment (besides that it's PHP/mysql). Whatever libraries the OP is using is his choice, and the code I posted should work regardless. `SET NAMES` doesn't work as you'd expect it to (or at least, it didn't for me - that's why I set all of the DB encodings) – eithed Oct 23 '13 at 16:19
  • @eithed: As documented under [Connection Character Sets and Collations](http://dev.mysql.com/doc/en/charset-connection.html), "*A `SET NAMES 'charset_name'` statement is equivalent to these three statements: `SET character_set_client = charset_name; SET character_set_results = charset_name; SET character_set_connection = charset_name;`*". Therefore, the only difference with your version is that it does not set `character_set_database` or `character_set_server`, but these affect only DDL statements (they set the default values for new objects) - they do not affect the results of DML statements. – eggyal Oct 23 '13 at 16:22
  • @eggyal - I've had these sort of issues quite long ago when everything was running different encodings (DB - utf8, table - latin_1, column - utf8/latin_1, adapter - ???, expected output - ISO-8859-2); given command became my silver bullet, even though `SET NAMES` **should** work. Now I use zend + have control on everything. – eithed Oct 23 '13 at 16:32
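
The two comments above about `htmlentities()` (escape on display rather than on storage, and pass the encoding argument) can be illustrated with a minimal sketch. The helper name `escapeForHtml` is hypothetical and not from the thread, and the string is written with explicit UTF-8 bytes so the result does not depend on how the file is saved.

```php
<?php
// Hypothetical helper (not from the thread): escape for HTML at output time only.
function escapeForHtml(string $value): string
{
    // Name the encoding explicitly instead of relying on the PHP default,
    // which was ISO-8859-1 before PHP 5.4 and UTF-8 from 5.4 on.
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// "\xC3\xAA" is the UTF-8 byte sequence for "ê".
$title = "Portugu\xC3\xAAs";

// Treating the UTF-8 bytes as ISO-8859-1 turns each byte into its own entity.
$misencoded = htmlentities($title, ENT_QUOTES, 'ISO-8859-1');

// Escaping as UTF-8 leaves the accented character intact.
$escaped = escapeForHtml($title . ' <b>');

echo $misencoded, "\n"; // Portugu&Atilde;&ordf;s
echo $escaped, "\n";    // Português &lt;b&gt;
```

If the first form is what ended up in the database, a browser would render the stored entities as `PortuguÃªs`, which would match the symptom in the question; whether that is what happened here depends on the PHP version in use.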

5 Answers

2

Steps to Follow:

Use the meta tag for UTF8.

<meta http-equiv="content-type" content="text/html; charset=utf-8" />

Set your PHP to use UTF8.

mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');
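// mb_http_input() only reports the detected HTTP input encoding; its argument selects the input type ("G", "P", "C", ...), so it cannot set UTF-8.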

For mysql, you want to convert your table to UTF8.

ALTER TABLE table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci

Also you can run:

 SET NAMES UTF8

as the first query after establishing a connection which will convert your DB connection to UTF8.
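
To see why the connection character set matters, here is a rough simulation that needs no database. It assumes the connection is treated as ISO-8859-1 while PHP actually sends UTF-8 bytes; the function name `simulateLatin1Connection` is made up for illustration, and MySQL's own `latin1` differs from ISO-8859-1 for a few bytes that do not occur in this example.

```php
<?php
// Hypothetical helper: mimic a server that believes the client's bytes are
// ISO-8859-1 and converts them to UTF-8 for storage in a utf8 column.
function simulateLatin1Connection(string $utf8Bytes): string
{
    return mb_convert_encoding($utf8Bytes, 'UTF-8', 'ISO-8859-1');
}

// "\xC3\xAA" is the UTF-8 byte sequence for "ê".
$original = "Portugu\xC3\xAAs";

// Each byte of "ê" is converted separately, giving "Ã" followed by "ª".
$garbled = simulateLatin1Connection($original);
echo $garbled, "\n"; // PortuguÃªs

// Reversing the conversion recovers the original bytes.
$repaired = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');
echo $repaired, "\n"; // Português
```

Issuing `SET NAMES utf8` (or the API's own charset call) right after connecting tells the server the bytes are already UTF-8, so no such conversion takes place.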

Jenson M John
2

Call mysqli_set_charset($link, "utf8"); right after every connection you open, so that the connection itself uses UTF-8.

Minoru
1

Try this:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

<meta charset="UTF-8"> is HTML 5 (and, BTW, the canonical spelling of UTF-8 is uppercase, although browsers match charset names case-insensitively)

Since it looks like your test chain also involves a form (because of the $_POST), you must make sure that the UTF-8 charset is set for the form too.
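
One way to check what the form actually submits is to test whether the posted bytes are valid UTF-8 before storing them. This is only a sketch: `isValidUtf8` is a hypothetical helper, and the byte strings below stand in for `$_POST['title']`. On the HTML side, a form can declare `accept-charset="UTF-8"`.

```php
<?php
// Hypothetical helper: true when the submitted bytes form valid UTF-8.
function isValidUtf8(string $input): bool
{
    return mb_check_encoding($input, 'UTF-8');
}

// "ê" sent as UTF-8 (two bytes) passes the check.
var_dump(isValidUtf8("Portugu\xC3\xAAs")); // bool(true)

// "ê" sent as ISO-8859-1 (the single byte 0xEA) does not.
var_dump(isValidUtf8("Portugu\xEAs")); // bool(false)
```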

Walter Tross
  • AFAIK if the server is sending a charset HTTP header, the browser will ignore this *meta* tag. In OP's case, it would be possible that his (misconfigured) server sends a wrong charset. – ComFreek Oct 23 '13 at 16:16
  • @ComFreek: correct. But I would give it a try, since the OP may be using a shared hosting server, and in my opinion a shared hosting server shouldn't make any assumption on the desired charset. – Walter Tross Oct 23 '13 at 16:37
1

Use SET NAMES utf8 before you query/insert into the database

query("SET NAMES utf8");
Farid Movsumov
0

You should use character encoding as ISO-8859-1 instead of UTF-8 as follows:

<meta charset="ISO-8859-1">

The characters you are trying to show are latin and UTF-8 i.e. UNICODE encoding cannot interpret latin characters.

Reference

And in case of mysql you should use latin1 charset.

Rajesh Paul