2

I am doing a login script in php and html. The username and password dont need to be encrypted so I was trying to store them directly into the database (MySQL).

But some of my users have username and password containing special characters like "é" and must be case sensitive.

I have this table:

USER username: "ÉleCTeur", password: "ÉleCTeur", username: "Paul", password: "Paul"

I tried this query that work fine with "Paul" but return nothing with "ÉleCTeur".

SELECT 
    username,
    password
FROM 
    user
WHERE
    BINARY username = 'ÉleCTeur' 
    AND BINARY password = 'ÉleCTeur'

Did I just miss something or do BINARY does not compare special characters? What methode could I use to resolve this issue?

Sebastien
  • 1,308
  • 2
  • 15
  • 39
  • "é" is _not_ a "special" character. It's just a character like any other. We are in 21st century -- and still considering only plain 7-bit ASCII as being "normal"? – Sylvain Leroux Jul 09 '13 at 18:48
  • Je considerais seulement que comme le language anglais ne comporte pas d'accents Ceux-ci pour les besoins de ma question pourrais être inclus dans les charactères spéciaux. I considered that English dont have those character and that for my question they could be included in the special character. – Sebastien Jul 09 '13 at 18:55
  • But if you have a solution I would'nt mind if they are not special character – Sebastien Jul 09 '13 at 18:57
  • What is you encoding? Is it consistent all the way (from client program to MySQL tables)? – Sylvain Leroux Jul 09 '13 at 19:02
  • http://stackoverflow.com/questions/2708958/differences-between-utf8-and-latin1 – 000 Jul 09 '13 at 19:02
  • My encoding is UTF-8 for my DB and web application – Sebastien Jul 09 '13 at 19:05

1 Answers1

2

You probably have inconsistent encoding between your db and the client program?

I take some time testing from MySQL CLI:

mysql> create TABLE user (username CHAR(20), password CHAR(20)) DEFAULT CHARSET=latin1 DEFAULT COLLATE=latin1_general_ci;
Query OK, 0 rows affected (0.25 sec)

mysql> insert into user value("ÉleCTeur", "ÉleCTeur");
Query OK, 1 row affected (0.00 sec)

mysql> select * from user;
+----------+----------+
| username | password |
+----------+----------+
| ÉleCTeur | ÉleCTeur |
+----------+----------+
1 row in set (0.00 sec)

mysql> select count(*) from user where username = "EleCTeur" and password = "EleCteur";
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from user where username = "ÉleCTeur" and password = "ÉleCteur";
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

EDIT: OK, this was a (too) quick and (far too) dirty test. As Sebastien noticed it, I use case insensitive collation here. So:

mysql> select count() from user where username = "éleCTeur" and password = "éleCteur"; +----------+ | count() | +----------+ | 1 | +----------+ 1 row in set (0.01 sec)

The real answer here would be to use an case sensitive (_cs) collation. To know which one are available:

mysql> show collation where collation like '%_cs';
+--------------------+---------+----+---------+----------+---------+
| Collation          | Charset | Id | Default | Compiled | Sortlen |
+--------------------+---------+----+---------+----------+---------+
| latin1_general_cs  | latin1  | 49 |         | Yes      |       1 |
| latin2_czech_cs    | latin2  |  2 |         | Yes      |       4 |
| cp1250_czech_cs    | cp1250  | 34 |         | Yes      |       2 |
| latin7_estonian_cs | latin7  | 20 |         | Yes      |       1 |
| latin7_general_cs  | latin7  | 42 |         | Yes      |       1 |
| cp1251_general_cs  | cp1251  | 52 |         | Yes      |       1 |
+--------------------+---------+----+---------+----------+---------+
6 rows in set (0.01 sec)

Not much of them on my Debian MySQL default install.

And surprisingly enough, no 'utf8_cs' !

But MySQL has _bin collation too. Not exactly useful for ordering but quite sufficient for searching:

mysql> show collation where collation like '%_bin';

+--------------+----------+----+---------+----------+---------+
| Collation    | Charset  | Id | Default | Compiled | Sortlen |
+--------------+----------+----+---------+----------+---------+
| big5_bin     | big5     | 84 |         | Yes      |       1 |
| dec8_bin     | dec8     | 69 |         | Yes      |       1 |
| cp850_bin    | cp850    | 80 |         | Yes      |       1 |
| hp8_bin      | hp8      | 72 |         | Yes      |       1 |
| koi8r_bin    | koi8r    | 74 |         | Yes      |       1 |
| latin1_bin   | latin1   | 47 |         | Yes      |       1 |
| latin2_bin   | latin2   | 77 |         | Yes      |       1 |
| swe7_bin     | swe7     | 82 |         | Yes      |       1 |
| ascii_bin    | ascii    | 65 |         | Yes      |       1 |
| ujis_bin     | ujis     | 91 |         | Yes      |       1 |
| sjis_bin     | sjis     | 88 |         | Yes      |       1 |
| hebrew_bin   | hebrew   | 71 |         | Yes      |       1 |
| tis620_bin   | tis620   | 89 |         | Yes      |       1 |
| euckr_bin    | euckr    | 85 |         | Yes      |       1 |
| koi8u_bin    | koi8u    | 75 |         | Yes      |       1 |
| gb2312_bin   | gb2312   | 86 |         | Yes      |       1 |
| greek_bin    | greek    | 70 |         | Yes      |       1 |
| cp1250_bin   | cp1250   | 66 |         | Yes      |       1 |
| gbk_bin      | gbk      | 87 |         | Yes      |       1 |
| latin5_bin   | latin5   | 78 |         | Yes      |       1 |
| armscii8_bin | armscii8 | 64 |         | Yes      |       1 |
| utf8_bin     | utf8     | 83 |         | Yes      |       1 |
| ucs2_bin     | ucs2     | 90 |         | Yes      |       1 |
| cp866_bin    | cp866    | 68 |         | Yes      |       1 |
| keybcs2_bin  | keybcs2  | 73 |         | Yes      |       1 |
| macce_bin    | macce    | 43 |         | Yes      |       1 |
| macroman_bin | macroman | 53 |         | Yes      |       1 |
| cp852_bin    | cp852    | 81 |         | Yes      |       1 |
| latin7_bin   | latin7   | 79 |         | Yes      |       1 |
| cp1251_bin   | cp1251   | 50 |         | Yes      |       1 |
| cp1256_bin   | cp1256   | 67 |         | Yes      |       1 |
| cp1257_bin   | cp1257   | 58 |         | Yes      |       1 |
| geostd8_bin  | geostd8  | 93 |         | Yes      |       1 |
| cp932_bin    | cp932    | 96 |         | Yes      |       1 |
| eucjpms_bin  | eucjpms  | 98 |         | Yes      |       1 |
+--------------+----------+----+---------+----------+---------+
35 rows in set (0.00 sec)

In the following example, I create a table using utf8 encoding and utf8_bin collation. Please note you could change encoding/collation on a per-column basis instead.

mysql> create TABLE user (username CHAR(20), password CHAR(20)) DEFAULT CHARSET=utf8 DEFAULT COLLATE=utf8_bin;
Query OK, 0 rows affected (0.26 sec)

mysql> insert into user value("ÉleCTeur", "ÉleCTeur");
Query OK, 1 row affected (0.00 sec)


mysql> select count(*) from user where username = "ÉleCTeur" and password="ÉleCTeur";
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from user where username = "éleCTeur" and password="éleCTeur";
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.01 sec)
Sylvain Leroux
  • 50,096
  • 7
  • 103
  • 125
  • I understand your point but if you try "électeur" in your second query it will still return the `count(*)` to be = 1 – Sebastien Jul 09 '13 at 19:03
  • @JoeFrambach As a matter of fact, I tried with utf8 first, then latin1 (as I guess the OP has to deal with french) -- I copied here, somehow unfortunately, only the latter. More generally you are right, utf8 should be the encoding of choice today. – Sylvain Leroux Jul 09 '13 at 19:05
  • @Sebastien look at my 'collation': `latin1_general_ci`. `_ci` like _case insensitive_. To distinguish between _é_ and _É_, you have to use a `_cs` (_case sensitive_) collation. – Sylvain Leroux Jul 09 '13 at 19:07
  • Ok! now I see how, but I would have to create a new table that is case sensitive... is it possible without having to create new tables? – Sebastien Jul 09 '13 at 19:10
  • @Sebastien I've edited my answer to try to explain a little bit more those encoding/collation things. Concerning you latest comment, you could [`ALTER TABLE`](http://dev.mysql.com/doc/refman/5.1/en/alter-table.html) to change the table default encoding _or_ one column encoding. If you do that, I suggest you double-check your data after that to verify/convince yourself they are properly re-encoded. – Sylvain Leroux Jul 09 '13 at 19:22
  • OK... I'll give the answer to Sylvain because it is a fairly good answer and I dont want to be picky. But let says I have 20 000 users and passwords, creating a new table with utf-8_bin would'nt be as fast as inserting something into my first query. – Sebastien Jul 09 '13 at 19:27