1

Is UTF-8/Unicode used by most clients? And if not, should I care, given that the user could change the charset in the browser settings or update their software?

I am working with MySQL and PHP (among others). My database and tables use Unicode because they will contain usernames and text in different languages.

In PHP I work with:

- multibyte string functions
- regular expressions (the u modifier and Unicode letter classes such as \p{L})
- an is_string_utf8 function, so that anything that is not valid UTF-8 is rejected

The u modifier requires valid UTF-8, so the input has to be UTF-8 (or doesn't it?).

I also use prepared statements; those and my is_string_utf8 function are supposed to prevent multi-byte attacks.

Does it work? Yes.

But if the user's browser uses a charset other than UTF-8, it won't work properly: is_string_utf8 will reject most of the submitted data.

So, my questions: Should I care about ISO and the other charsets? Isn't UTF-8 the standard by now? Could I use mb_convert_encoding, or is converting all charsets into UTF-8 more complicated than that? Is converting the charset still secure?

Thank you very much in advance.

PatrickG

3 Answers

0

The browser will use the character encoding that your website specifies in the pages it serves to clients. I don't know how the user can set a character set in the browser. According to http://w3techs.com/technologies/details/en-utf8/all/all around 85% of websites serve pages encoded in UTF-8. Since UTF-8 can encode any Unicode character, you'll be just fine having all data between your site and your users encoded in UTF-8.
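As a rough sketch of how a PHP page might tell the browser which encoding to use, the helper below builds a minimal HTML document with a UTF-8 meta tag. The function name `render_utf8_page` and the sample values are hypothetical and only serve the illustration; a real application would normally use its own templates.

```php
<?php
/**
 * Build a minimal HTML page that declares UTF-8.
 * Hypothetical helper, not part of PHP or any library.
 */
function render_utf8_page(string $title, string $bodyHtml): string
{
    // Escape the title as UTF-8 so it cannot break out of the <title> element.
    $safeTitle = htmlspecialchars($title, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    return "<!DOCTYPE html>\n"
        . "<html>\n<head>\n"
        . "<meta charset=\"UTF-8\">\n"
        . "<title>{$safeTitle}</title>\n"
        . "</head>\n<body>\n{$bodyHtml}\n</body>\n</html>\n";
}

// Usage in a web request: also declare the charset in the HTTP header,
// before any output is sent.
// header('Content-Type: text/html; charset=UTF-8');
// echo render_utf8_page('Welcome', '<p>Hello</p>');
```

Declaring the charset both in the HTTP header and in the meta tag is a common belt-and-braces choice, since the page then stays correctly labelled even if it is saved and opened from disk.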

JJF
  • Thank you for the answer :) – PatrickG Nov 07 '15 at 13:44
  • But can't the user change the HTML charset with Firebug, for example? It would then send non-UTF-8 values to my PHP script. – PatrickG Nov 07 '15 at 13:45
  • I'm not a Firebug user, but for the user to change the charset of the data being sent to your website they would need to change the code used to send that data. Code you wrote. Why do you feel you need to be able to handle that? If the data comes to you encoded in a format other than what you intended, give an error. – JJF Nov 07 '15 at 13:50
  • And are there no browsers without UTF-8 support, or without UTF-8 as the default? I just want to make sure that everyone is happy and my server is secure :D – PatrickG Nov 07 '15 at 13:57
  • @JJF if you don't know, it's probably best you don't answer. A user may maliciously change the encoding just by editing the HTML using Chrome Dev Tools. – Alastair McCormack Nov 11 '15 at 06:32
  • Of course I understand someone can use dev tools etc. to modify anything on a page. The OP said in the question 'charset in the browser settings'. How is that possible? What browser lets you set a charset in its settings? – JJF Nov 11 '15 at 11:40
0

You need to set the encoding of the data you're receiving from your client and not leave it to chance.

HTML forms should use the accept-charset attribute to specify the character encoding:

<form method="post" action="/your/url/" accept-charset="UTF-8">

See "UTF-8 all the way through" for further information about ensuring UTF-8 is saved and served correctly.
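Because a client can still ignore accept-charset, the server has to check what actually arrives. The sketch below assumes the mbstring extension is available (the question already relies on multibyte functions). The names `is_valid_utf8` and `invalid_utf8_fields` are hypothetical stand-ins, not the asker's actual is_string_utf8 implementation.

```php
<?php
/**
 * Return true only if $value is a string containing valid UTF-8.
 * Hypothetical stand-in for the asker's is_string_utf8.
 */
function is_valid_utf8($value): bool
{
    return is_string($value) && mb_check_encoding($value, 'UTF-8');
}

/**
 * Return the names of submitted fields whose values are not valid UTF-8.
 * Non-string values (e.g. arrays from "name[]" fields) are reported too.
 */
function invalid_utf8_fields(array $input): array
{
    $bad = [];
    foreach ($input as $name => $value) {
        if (!is_valid_utf8($value)) {
            $bad[] = $name;
        }
    }
    return $bad;
}

// Usage in a request handler: reject the whole request on bad input.
// if (invalid_utf8_fields($_POST) !== []) {
//     http_response_code(400);
//     exit('Invalid character encoding.');
// }
```

Rejecting invalid input outright, rather than guessing the source encoding and converting it, keeps the handling simple. Whether that is acceptable depends on the clients you expect.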

Alastair McCormack
0

Are you running some kind of service? Then simply mandate that everyone use UTF-8 (utf8mb4 in MySQL and UTF-8 outside MySQL).

Note that I said utf8mb4 -- this is because MySQL's utf8 stores at most 3 bytes per character, so it cannot hold 4-byte characters such as Emoji, some Chinese characters, and several other things.
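On the PHP side, one way to request utf8mb4 for the connection is the charset parameter of the PDO MySQL DSN. This is only a sketch: the helper name `mysql_utf8mb4_dsn`, the host, database, credentials, and the `users` table are all placeholders for illustration.

```php
<?php
/**
 * Build a PDO DSN that asks MySQL to use utf8mb4 on the connection.
 * Hypothetical helper; host and database name come from the caller.
 */
function mysql_utf8mb4_dsn(string $host, string $dbname): string
{
    return "mysql:host={$host};dbname={$dbname};charset=utf8mb4";
}

// Usage (all values below are placeholders):
// $pdo = new PDO(mysql_utf8mb4_dsn('localhost', 'app'), 'user', 'secret', [
//     PDO::ATTR_ERRMODE          => PDO::ERRMODE_EXCEPTION,
//     PDO::ATTR_EMULATE_PREPARES => false, // use real prepared statements
// ]);
// $stmt = $pdo->prepare('INSERT INTO users (name) VALUES (?)');
// $stmt->execute([$name]);
```

The tables and columns still need CHARACTER SET utf8mb4 themselves; the DSN only controls the encoding used on the connection.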

When generating HTML output, be sure to include the meta tag specifying UTF-8 (<meta charset="UTF-8">).

I have seen a lot of questions come through this and other forums; most use utf8; only a few deal with other character sets. I suggest that the other character sets could (and should) be relegated to the dust bin as antiquated and no longer of much use. (Remember EBCDIC?)

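It is good that you are validating the client's text. However, if non-UTF-8 bytes do get through, the text will be truncated at the first invalid byte when it is stored into a column with CHARACTER SET utf8 (or utf8mb4), or the statement will fail with an error if strict SQL mode is enabled.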

Rick James