Should a social network site allow UTF-8 in usernames and passwords? I'm really new to PHP and would like to know such things. I'm working on a small test project right now, and I have set the collation to utf8_general_ci and run SET NAMES utf8. Everything is inserted and displayed correctly, including characters like "ÅÄÖ". But if the username or the password contains those characters, you can't log in. Why?

<?php
if(isset($_POST['loginBtn'])){
    //variables
    $username = mb_strtolower(strip_tags(addslashes($_POST['username'])));
    $password = strip_tags(addslashes($_POST['password']));
    if(empty($username) || empty($password)){
        $statusM = "Var god och fyll i båda fälten!"; // "Please fill in both fields!"
    }else{
        //$password = hash("sha512", $password);
        include("db.php");
        //the db.php contains the character set and collation set

        $sql = 'SELECT username, password FROM users WHERE username=:username AND password=:password';
        $stmt = $db->prepare($sql);
        $stmt->bindParam(":username", $username);
        $stmt->bindParam(":password", $password);
        $stmt->execute();
        if(!$stmt->rowCount() > 0){
            $statusM = "Antingen fel lösenord eller användarnamn!"; // "Either the password or the username is wrong!"
        }else{

            $sql = 'SELECT * FROM users WHERE username=:username AND password=:password';
            $stmt = $db->prepare($sql);
            $stmt->bindParam(":username", $username);
            $stmt->bindParam(":password", $password);
            $stmt->execute();
            $stmt->setFetchMode(PDO::FETCH_ASSOC);
            $row = $stmt->fetch();

            session_start();
            $_SESSION['school'] = $row['school'];
            $_SESSION['firstname'] = $row['firstname'];
            $_SESSION['lastname'] = $row['lastname'];
            $_SESSION['username'] = $row['username'];
            $_SESSION['password'] = $row['password'];
            $_SESSION['email'] = $row['email'];
            header("Location: member_home.php");
            exit();
        }
    }
}
?>

Note: this is my current code. It is not finished, and the site is in Swedish.

  • Of course it should, but without seeing your code we can't say why it isn't working for you – Mark Baker Mar 08 '14 at 17:40
  • Should you? Sure, it allows your users to increase their password character range and thus security. – Madara's Ghost Mar 08 '14 at 17:44
  • 2
    1. **Never** use `addslashes` 2. you don't need `strip_tags` when using prepared statements 3. `$_POST['username']` and `$_POST['password']` can still be not set when `$_POST['loginBtn']` is set 4. Only hashing a password with SHA512 is not enough; always use a [strong password hashing algorithm](http://php.net/password) – Marcel Korpel Mar 08 '14 at 17:45
  • BTW, why are you fetching the same info twice? First, you only ask for a username and a password, you immediately throw that information away and then you are asking for all information again: that's quite redundant code. – Marcel Korpel Mar 08 '14 at 17:52
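
The points raised in the comments above can be sketched roughly as follows. This is only a minimal sketch, not the asker's final code: `find_user` is a hypothetical helper name, the `users` table and its columns are taken from the question, and it assumes the `password` column stores the output of `password_hash()`, which the question's code does not do yet.

```php
<?php
// Hypothetical helper: look the user up by name only, then check the
// password against the stored hash. Assumes users.password holds the
// output of password_hash() (an assumption, not the question's schema).
function find_user(PDO $db, string $username, string $password): ?array
{
    // One query is enough; the prepared statement handles escaping,
    // so addslashes() and strip_tags() are not needed here.
    $stmt = $db->prepare('SELECT * FROM users WHERE username = :username');
    $stmt->execute([':username' => $username]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    if ($row === false || !password_verify($password, $row['password'])) {
        return null; // unknown user or wrong password
    }
    unset($row['password']); // keep the hash out of the session
    return $row;
}

// Registration side: store this value instead of the plain password.
$hash = password_hash('lösenord-ÅÄÖ', PASSWORD_DEFAULT);

// Login side: read the fields defensively, since either one may be
// missing from $_POST even when the button field is present.
$username = $_POST['username'] ?? '';
$password = $_POST['password'] ?? '';
```

With a PDO connection in `$db`, a call like `find_user($db, $username, $password)` would replace both SELECT statements in the question and return either the user's row or `null`.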

1 Answer

Unicode UTF-8 has multiple ways to encode some characters (mainly accented characters). For example, Å can be U+00C5, but it can also be written as the letter A followed by U+030A (which means "put a ring on top of the previous character"). I have noticed that MySQL tends to store text using the latter encoding. But you might have the former Unicode UTF-8 encoding in the SQL string itself, and then the comparison will not match (because the same character has two different Unicode UTF-8 encodings). This is a very annoying problem when doing comparisons in MySQL.

So what you have to do is normalize your Unicode UTF-8, to make sure it always encodes characters in the same way. Have a look at the example code in the PHP documentation for Normalizer::normalize, and normalize your Unicode UTF-8 before comparing strings: http://www.php.net/manual/en/normalizer.normalize.php
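
A minimal sketch of that normalization step, assuming PHP 7 or later with the intl extension (which provides the `Normalizer` class). `normalize_login_input` is a hypothetical helper name, and the choice of form C (composed) is an assumption; any form works as long as the stored value and the submitted value use the same one. Whether this mismatch is really what breaks the login in the question is not confirmed here.

```php
<?php
// Two ways to write the same user-perceived character "Å".
$precomposed = "\u{00C5}";  // U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
$decomposed  = "A\u{030A}"; // "A" followed by U+030A COMBINING RING ABOVE

// Hypothetical helper: bring input into one normalization form (NFC)
// before it is stored or compared. Falls back to the raw input when
// the intl extension is missing or the input is not valid UTF-8.
function normalize_login_input(string $value): string
{
    if (!class_exists('Normalizer')) {
        return $value;
    }
    $normalized = Normalizer::normalize($value, Normalizer::FORM_C);
    return $normalized === false ? $value : $normalized;
}

// The raw strings differ byte for byte, so a plain comparison fails.
$rawMatch = ($precomposed === $decomposed);

// After normalization both sides use the same sequence (when intl is loaded).
$normalizedMatch =
    normalize_login_input($precomposed) === normalize_login_input($decomposed);
```

In the login code this would mean passing the username (and the password, before hashing) through the helper both when the account is created and when the user logs in.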

Alfred Godoy
  • 1
    You're slightly muddling things here: UTF-8 is a particular encoding into bytes of the more abstract Unicode. UTF-8 has exactly one way of encoding every Unicode codepoint. However, **Unicode itself** may have multiple ways of encoding a particular "user-perceived character", like an accented letter, and that's where normalisation comes in. – IMSoP Mar 08 '14 at 20:12
  • IMSoP: Ok thanks! So, my answer would be more correct if I search-replace "UTF-8" into "Unicode"? ;) – Alfred Godoy Mar 08 '14 at 20:15
  • 1
    Yes, and also change the word "encoding" where you're using it to mean different sets of Unicode code points, since it has a specific and different meaning in this context. I'm also not sure that MySQL would have any bias to decomposed variants, rather than storing whatever you give it, unless perhaps when converting from non-Unicode representations. – IMSoP Mar 08 '14 at 20:34