
This question is solved; see my answer below for the solution.


I'm trying to store text containing accented letters in my DB. The text comes from an HTML form that submits via POST to a PHP page. The problem is that the accented letters are converted to unreadable characters.

I have this form:

<form action="page.php" method="POST">
    <input type="textarea" id="text1" name="text1" />
    <input type="submit" value="Send" />
</form>

And then, in page.php:

echo $_POST['text1'];

The problem is that if I enter àèìòù in my textarea and then submit the form, I get this output:

à èìòù

My next step would be to convert the accented letters to HTML entities with htmlentities($_POST['text1']), but I need $_POST to give me the correct letters first.

Note: in the page head I already have

<meta http-equiv = "Content-Type" content = "text/html; charset=utf-8" />

How can I fix this?

EDIT

I tried with

<form action="page.php" method="POST" accept-charset="UTF-8">
    <input type="textarea" id="text1" name="text1" />
    <input type="submit" value="Send" />
</form>

That didn't solve it.

EDIT 2

I also tried adding

<meta charset='utf-8'>

to my document's head, and it didn't work either.

EDIT 3

I tried setting the charset to UTF-8 on the second page too, and

echo $_POST['text1'];

displayed the correct result.

I saw that the problem occurs when I use htmlentities. With this code

echo htmlentities($_POST['text1']);

I get

&Atilde;&nbsp;&Atilde;&uml;&Atilde;&not;&Atilde;&sup2;&Atilde;&sup1; 

Which actually outputs

à èìòù 

even if I set the charset in both my meta tags and the header. Does anyone know how I can solve it?

BackSlash
  • You should check the encoding of your page; set it to UTF-8 and it will work. Google it for more details. – Senad Meškin Apr 19 '13 at 22:31
  • Probably should read: [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets](http://www.joelonsoftware.com/articles/Unicode.html). –  Apr 19 '13 at 22:31
  • did you set the meta tag to utf-8? – nathan hayfield Apr 19 '13 at 22:31
  • http://stackoverflow.com/questions/4696499/meta-charset-utf-8-vs-meta-http-equiv-content-type – craig1231 Apr 19 '13 at 22:32
  • @TimCooper This is the first occurrence, that I've seen, of joel's blog being referred on SO – Ejaz Apr 19 '13 at 22:36
  • Could you link to all relevant code? – marinus Apr 19 '13 at 22:42
  • What do you mean by "output"? Usually even if you ignore all encoding stuff, storing a textarea in a database and then sending it back to the browser should work. – Sven Apr 19 '13 at 22:46
  • The problem is the DB charset, not the page/form/textarea charset. – Eugen Rieck Apr 19 '13 at 22:46
  • @EugenRieck You didn't actually read the question, as i said i'm printing a **$_POST** value when i get this output – BackSlash Apr 19 '13 at 22:47
  • @Harlandraka I read it before you edited it - which changes things considerably – Eugen Rieck Apr 19 '13 at 22:49
  • @marinus yes: [demo](http://alessandroderoma.it/asd/index.htm) – BackSlash Apr 19 '13 at 22:49
  • @Harlakandra. Ah, I see the problem. You don't have an encoding on your output page. – marinus Apr 19 '13 at 22:50
  • @EugenRieck see my edits, you will see that this was already written in the first post – BackSlash Apr 19 '13 at 22:50
  • Your demo works for me, but you should really add this content type header with encoding. Otherwise on a different server, things will fail. – Sven Apr 19 '13 at 22:54
  • @Sven It works now because i tried marinus' solution and it worked, i'm gonna add this to my second file too – BackSlash Apr 19 '13 at 22:56
  • You always have to specify the encoding. Best way is to both send a HTTP header and put it into the HTML. The header, if present, is used when using HTTP. The HTML is used when loading the HTML without HTTP, e.g. from disk as a file. And forget about the `accept-charset`, it will not help you, because the browser sends forms in the encoding of the page, which already was UTF-8 in your case. – Sven Apr 19 '13 at 22:58
  • @marinus I saw that the problem occurs when I use htmlentities. With this code: `echo htmlentities($_POST['text1']);` I get `&Atilde;&nbsp;&Atilde;&uml;&Atilde;&not;&Atilde;&sup2;&Atilde;&sup1;`, which actually displays as `à èìòù`, even if I set the charset in both my meta tags and the header. Do you know how I can solve it? – BackSlash Apr 20 '13 at 09:50
  • Which headers and meta tags have you set, **exactly**? Remember, they need to be on **all** the scripts, not just one or the other, to force UTF-8 on all steps of the way. – Sébastien Renauld Apr 20 '13 at 12:22
  • @SébastienRenauld i set `header('Content-Type: text/html; charset=utf-8');` before printing any output on my php page (the previous page is simple html), and, in addition, first html page and php page have `` in the head section – BackSlash Apr 20 '13 at 12:26
  • Quick question - are you on a windows server? If so, there's an additional setting in `php.ini` that might need looking after. – Sébastien Renauld Apr 20 '13 at 12:30
  • @SébastienRenauld No i'm on a linux server – BackSlash Apr 20 '13 at 12:31
  • Also, header on the html page as well, please. Right now, my form is being submitted as ISO-8859-1. – Sébastien Renauld Apr 20 '13 at 12:31
  • (Also, remove the `` - that's not valid XHTML, nor valid HTML full stop.) – Sébastien Renauld Apr 20 '13 at 12:33
  • @SébastienRenauld OK added header. new link: http://alessandroderoma.it/asd – BackSlash Apr 20 '13 at 12:35
  • @Harlandraka: your source-code is saved as utf-8, right? Bit of a silly question, but the conversion is not done automatically. If you're using Notepad++ for the source, it's in the Encoding menu - Encode in UTF-8 without BOM. – Sébastien Renauld Apr 20 '13 at 12:39
  • @SébastienRenauld yes it is encoded as utf-8 – BackSlash Apr 20 '13 at 12:40
  • Can you throw an accent/special character on the HTML page as confirmation? Can't tell otherwise as UTF-8 and ISO-8859 overlap for roman characters. – Sébastien Renauld Apr 20 '13 at 12:41
  • @SébastienRenauld if i write an accented letter from html it is printed well, but with post request it is not, check the page – BackSlash Apr 20 '13 at 12:47
  • I meant on the page of the form. It is there that the problem happens - or in your PHP configuration. The form is being transferred as ISO-8859-1. – Sébastien Renauld Apr 20 '13 at 12:50
  • @SébastienRenauld Ok, added accented `a` also on the main page, it displays correctly – BackSlash Apr 20 '13 at 13:02

2 Answers


OK, I finally solved it. Even though I was setting the charset, whether with the PHP header

header('Content-Type: text/html; charset=utf-8');

or with HTML meta tag

<meta http-equiv = "Content-Type" content = "text/html; charset=utf-8" />

and saving my file as UTF-8, it didn't work.

Entering àèìòù and then processing it with htmlentities was always resulting in

&Atilde;&nbsp;&Atilde;&uml;&Atilde;&not;&Atilde;&sup2;&Atilde;&sup1; 

That, in "readable" characters, is:

à èìòù

I just changed this:

echo htmlentities($_POST['text1']);

to this:

echo htmlentities($_POST['text1'], ENT_QUOTES, "UTF-8");

and everything worked: my real input is printed out.
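A likely explanation, as far as I can tell: before PHP 5.4 the third parameter of htmlentities defaulted to ISO-8859-1, so each byte of the UTF-8 input was treated as a separate character. The sketch below reproduces both results. It assumes the input bytes are UTF-8 (written here as escape sequences so the file's own encoding does not matter); the variable names are only illustrative.

```php
<?php
// "àèìòù" as raw UTF-8 bytes: each letter is two bytes (0xC3 followed by one more).
$input = "\xC3\xA0\xC3\xA8\xC3\xAC\xC3\xB2\xC3\xB9";

// Declaring ISO-8859-1 makes htmlentities treat every byte as one character,
// so 0xC3 becomes &Atilde; and the second byte becomes its own entity.
$wrong = htmlentities($input, ENT_QUOTES, "ISO-8859-1");

// Declaring UTF-8 lets htmlentities decode each two-byte sequence as one letter.
$right = htmlentities($input, ENT_QUOTES, "UTF-8");

echo $wrong, "\n"; // &Atilde;&nbsp;&Atilde;&uml;&Atilde;&not;&Atilde;&sup2;&Atilde;&sup1;
echo $right, "\n"; // &agrave;&egrave;&igrave;&ograve;&ugrave;
```

Passing the encoding explicitly keeps the result the same regardless of the PHP version or its configured default.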

Thank you all for your help!

BackSlash

I'm going to go the other way - add to your PHP code <?php header("Content-Type: text/html; charset=utf-8"); ?>. Saves having meta tags (that some browsers casually ignore...)

From the encoding representation that you provided, PHP is echoing a UTF-8 encoded string, while your browser is assuming that the output is ISO-8859-1. Setting that header will make all browsers understand that UTF-8 is expected, provided that they list it under Accept-Charset. If they don't, they'll barf, but I only know of one "modern" browser that doesn't, and it has about 0.2% of the market.
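To make the mismatch concrete, here is a small sketch of what the browser effectively does when it reads UTF-8 bytes as ISO-8859-1. It assumes the mbstring extension is available; the variable names are illustrative, and the input is written as byte escapes so the file's own encoding does not matter.

```php
<?php
// "àèìòù" as raw UTF-8 bytes: 10 bytes for 5 letters.
$utf8 = "\xC3\xA0\xC3\xA8\xC3\xAC\xC3\xB2\xC3\xB9";

// Simulate the wrong assumption: treat each byte as one ISO-8859-1 character
// and re-encode the result as UTF-8 so it can be displayed.
$misread = mb_convert_encoding($utf8, "UTF-8", "ISO-8859-1");

echo strlen($utf8), " bytes, ", mb_strlen($utf8, "UTF-8"), " letters\n";
// Ten characters instead of five: every letter turns into "Ã" plus a second symbol.
echo mb_strlen($misread, "UTF-8"), " characters after misreading: ", $misread, "\n";
```

The garbled pairs starting with "Ã" are the usual sign that UTF-8 output is being decoded as a single-byte Latin encoding somewhere along the way.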

Note that the `header()` call must come first, before any other output (or you can output-buffer the lot, which makes life easier but uses a bit more memory).
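A minimal sketch of the output-buffering variant, assuming nothing has been sent before the script starts; `$canSendHeader` is just an illustrative name:

```php
<?php
// Buffer all output so nothing reaches the client before the header is set.
ob_start();

echo "<p>Some early output</p>\n";

// The echo above is still sitting in the buffer, so headers can still be sent.
$canSendHeader = !headers_sent();
if ($canSendHeader) {
    header('Content-Type: text/html; charset=utf-8');
}

echo "<p>Rest of the page</p>\n";

// Send the buffered page; the header goes out ahead of the body.
ob_end_flush();
```

Without the `ob_start()` call, the first `echo` would normally commit the headers and the later `header()` call would trigger a "headers already sent" warning, unless output buffering is already enabled in `php.ini`.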

Sébastien Renauld
  • Which browser do you think of? – Sven Apr 19 '13 at 22:47
  • Epiphany. Only applies to linux and even then, I'm not sure if it has been fixed on that point. – Sébastien Renauld Apr 19 '13 at 22:57
  • You do know that the HTML meta tag only applies if the server does not send a HTTP header? Some servers don't by default, others do specify a default content type and encoding, which disables any `` in HTML. – Sven Apr 19 '13 at 23:00
  • @Sven: Why do you think I suggested throwing a header to not have to worry about meta tags and some browsers casually ignoring them? :-) – Sébastien Renauld Apr 19 '13 at 23:03