
On my PHP site, users currently log in with an email address and a password. I would like to add a username as well; the username they set will be unique and they cannot change it. I am wondering how I can make this name contain no spaces and work in a URL, so I can use their username to link to their profiles and other pages. If there is a space in their username, it should be replaced with an underscore, e.g. jason_davis. I am not sure of the best way to do this.

Alix Axel
JasonDavis
  • 2
    There are plenty of questions like this. Didn’t you get an answer by searching? – Gumbo Jan 20 '10 at 18:21
  • @Gumbo I searched SO, not Google. Possibly not the correct term, but I did search for "URL friendly username" without much luck. I didn't know it was called a slug before this. – JasonDavis Jan 20 '10 at 18:27
  • 1
    Maybe not everyone is trying to convert usernames. But searching for “URL friendly string” is returning usable results. – Gumbo Jan 20 '10 at 19:04
  • Similar: http://stackoverflow.com/questions/5305879 – GG. Jan 29 '12 at 01:12
  • Nowadays, you can use libraries like https://github.com/cocur/slugify or https://github.com/ausi/slug-generator to achieve that. – ausi Oct 30 '17 at 22:14

2 Answers

function Slug($string)
{
    // convert to entities
    $string = htmlentities( $string, ENT_QUOTES, 'UTF-8' );
    // regex to convert accented chars into their closest a-z ASCII equivalent
    $string = preg_replace( '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i', '$1', $string );
    // convert back from entities
    $string = html_entity_decode( $string, ENT_QUOTES, 'UTF-8' );
    // any straggling characters that are not strictly alphanumeric are replaced with a dash
    $string = preg_replace( '~[^0-9a-z]+~i', '-', $string );
    // trim / cleanup / all lowercase
    $string = trim( $string, '-' );
    $string = strtolower( $string );
    return $string;
}

$user = 'Alix Axel';
echo Slug($user); // alix-axel

$user = 'Álix Ãxel';
echo Slug($user); // alix-axel

$user = 'Álix----_Ãxel!?!?';
echo Slug($user); // alix-axel
squarecandy
Alix Axel
  • 9
    This is dangerous! Multiple unique user names can map to the same URL. That's not what you want, is it? Consider, e.g., `AB` and `ab`, which are unique strings but map to the same slug string. You should store the slug as the identifier. – John Feminella Jan 20 '10 at 18:18
  • 12
    @John Feminella: He would obviously have to check for duplicates at some point before storing the slug. – Pekka Jan 20 '10 at 18:19
  • perfect, thank you. BTW what does this part look for acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml ? – JasonDavis Jan 20 '10 at 18:19
  • @jasondavis: För stüff lıkë thîs! – Pekka Jan 20 '10 at 18:20
  • As a slight improvement, using `iconv()` to convert to `ASCII//TRANSLIT` would probably catch a lot more chars. – Frank Farmer Jan 20 '10 at 18:22
  • @Pekka: Thanks! =) @jasondavis: It removes accents. Instead of keeping a huge lookup table, we convert to HTML entities and extract the unaccented char. – Alix Axel Jan 20 '10 at 18:22
  • @Frank Farmer: `iconv('UTF-8', 'ASCII//TRANSLIT', 'Álix Ãxel')` returns `'Alix ~Axel'`. Using PHP 5.3.0 and a .php file encoded as UTF-8 with **no** BOM. – Alix Axel Jan 20 '10 at 18:25
  • 1
    Anyone know why Á and à would cause no output or error to occur when using this function? – John Conde Jan 20 '10 at 18:42
  • I'll answer my own question: remove the "'UTF-8'" parameter from htmlentities. That did the trick. – John Conde Jan 20 '10 at 19:36
  • @John Conde: I thought you were talking about the `iconv()` function! You shouldn't remove the `UTF-8` from `htmlentities`; instead you should **save all your `.php` files encoded as UTF-8 no BOM**. – Alix Axel Jan 20 '10 at 19:44
  • 1
    Alix, thanks for the info. I restored the "UTF-8" parameter and saved the file as UTF-8 and it worked like a charm. – John Conde Jan 20 '10 at 20:17
  • 1
    @John Conde: No problem! ;) You should always save your files UTF-8 encoded. – Alix Axel Jan 20 '10 at 21:07
  • 1
    This function does not convert Polish chars "ąćęłńóśźż" => "acelnoszz", because they have no named HTML entities (only numeric representations). You still need to replace those with a lookup table. Paste the code below before `return` in the function: `$string = strtr(mb_strtolower($string), array('ą'=>'a','ć'=>'c','ę'=>'e','ł'=>'l','ń'=>'n','ó'=>'o','ś'=>'s','ź'=>'z','ż'=>'z'));` – s3m3n Aug 05 '12 at 21:22
  • @s3men: Indeed, not only Polish characters but others as well (Chinese, Japanese, Turkish, Arabic, ...). – Alix Axel Aug 05 '12 at 21:36
  • 1
    After a few more tests, iconv did the trick, as someone wrote above. `setlocale(LC_ALL, "en_US.utf8"); $string = iconv("UTF-8", "ascii//TRANSLIT", '>ĄĘŁŹÓżół<');` gives me `>AELZOzol<` – s3m3n Aug 05 '12 at 23:31
  • 3
    codepad demo of the `Slug()` function, with a second identical but spaced out `nSlug()` function (for the eyeball impaired): http://codepad.org/rJNSQmGJ – Jared Farrish Dec 13 '12 at 08:02
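
The `iconv()` route raised in the comments above can be sketched roughly as follows. This is only a minimal sketch: `slugTranslit()` and its `$separator` parameter are hypothetical names, not something posted in the thread, and the transliteration of accented characters depends on the platform's iconv implementation and the active locale (as the differing results reported above suggest), so only the plain ASCII cases are shown with output.

```php
<?php
// Hypothetical helper: slugTranslit() and $separator are illustrative names,
// not part of the accepted answer.
function slugTranslit(string $string, string $separator = '-'): string
{
    // TRANSLIT results vary with the iconv implementation and the locale,
    // so accented input may be converted differently on different systems.
    $ascii = function_exists('iconv')
        ? @iconv('UTF-8', 'ASCII//TRANSLIT', $string)
        : false;

    // iconv() returns false on failure; fall back to the untouched input.
    if ($ascii === false) {
        $ascii = $string;
    }

    // Collapse every run of non-alphanumeric bytes into a single separator.
    $ascii = preg_replace('~[^0-9a-z]+~i', $separator, $ascii);

    // Trim separators from both ends and lowercase the result.
    return strtolower(trim($ascii, $separator));
}

echo slugTranslit('Jason Davis', '_'), "\n"; // jason_davis
echo slugTranslit('Alix Axel'), "\n";        // alix-axel
```

Passing `'_'` as the separator gives the underscore form asked for in the question; the default dash matches the accepted answer.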

In other words, you need to create a username slug. Doctrine (an ORM for PHP) has a nice function for this: Doctrine_Inflector::urlize()

EDIT: You should also keep the username slug in the database, in a unique key column. Every lookup should then be done against that column, not the original username.
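
A minimal sketch of that idea, assuming PDO with an in-memory SQLite database. The table layout, `toSlug()`, `registerUser()` and `findUserBySlug()` are all hypothetical names chosen for illustration; `toSlug()` is an ASCII-only stand-in for whichever slug function you actually use.

```php
<?php
// ASCII-only stand-in for a real slug function (assumption for this sketch).
function toSlug(string $username): string
{
    return strtolower(trim(preg_replace('~[^0-9a-z]+~i', '-', $username), '-'));
}

// Hypothetical schema: the UNIQUE constraint on slug is what rejects collisions.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username TEXT NOT NULL,
    slug TEXT NOT NULL UNIQUE
)');

// Returns true when the user was stored, false when the slug is already taken.
function registerUser(PDO $pdo, string $username): bool
{
    $stmt = $pdo->prepare('INSERT INTO users (username, slug) VALUES (?, ?)');
    try {
        $stmt->execute([$username, toSlug($username)]);
        return true;
    } catch (PDOException $e) {
        // SQLSTATE 23000 signals an integrity constraint violation.
        if ((string) $e->getCode() === '23000') {
            return false;
        }
        throw $e;
    }
}

// Looks a user up by slug, e.g. for a profile URL; null when nothing matches.
function findUserBySlug(PDO $pdo, string $slug): ?array
{
    $stmt = $pdo->prepare('SELECT id, username, slug FROM users WHERE slug = ?');
    $stmt->execute([$slug]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

var_dump(registerUser($pdo, 'AB')); // bool(true)
var_dump(registerUser($pdo, 'ab')); // bool(false): same slug as 'AB'
```

Letting the database enforce uniqueness avoids a separate "check, then insert" step, which could race when two users register at the same moment.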

Crozin
  • Broken link... can now be found here: https://github.com/doctrine/inflector/blob/2.0.x/lib/Doctrine/Inflector/Inflector.php – squarecandy Dec 21 '22 at 22:36
  • Though it's worth noting that this takes the "keep a comprehensive list of all characters to replace" approach, which is much harder to maintain than the approach of the accepted answer. – squarecandy Dec 21 '22 at 22:38