I have built a website with MODX Revolution (a PHP framework) and recently added friendly URL functionality in Apache.

As the site is in Greek, I would like to have Greek characters in the URL. Is this a valid approach for friendly URLs, or can it cause problems in the future?

Edit: example links from the Japanese Wikipedia: ディートリヒ・ブクステフーデ

Spyros

2 Answers

Section 2.2 of RFC 1738 states that:

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.

You can percent-encode other characters in your URL, and some browsers (e.g. Chrome) will often show them decoded in the address bar.

However, to ensure the URL is readable in all browsers, you should avoid Greek characters in your URL altogether and use a US-ASCII equivalent where possible.
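
To make the encoding point concrete, here is a minimal sketch of what percent-encoding does to a Greek path segment. The `$slug` value is a made-up example, and the snippet assumes the source file is saved as UTF-8; `rawurlencode()` and `rawurldecode()` are standard PHP functions.

```php
<?php
// Hypothetical path segment; assumes this file is saved as UTF-8.
$slug = 'αβ';

// rawurlencode() percent-encodes every byte outside the unreserved set,
// so each two-byte Greek letter becomes two %XX escapes.
$encoded = rawurlencode($slug);

echo $encoded, "\n";                // %CE%B1%CE%B2
echo rawurldecode($encoded), "\n";  // αβ
```

This encoded form is what actually travels over the wire; whether the address bar shows it decoded depends on the browser.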

seanhodges

It's not a very good idea. Either urlencode the non-ASCII characters (but that won't be very URL-friendly) or convert them to their ASCII counterparts.

$url = $server . $path . '/' . urlencode($greektext);

-

$arr1 = array('Α', 'α', 'Β', 'β', ...);
$arr2 = array('A', 'a', 'B', 'b', ...);
$url = $server . $path . '/' . str_replace($arr1, $arr2, $greektext);
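
The conversion approach could be fleshed out roughly as follows. This is only a sketch: `greek_slug()` is a hypothetical helper name, and the map is deliberately partial (the pairs above plus an assumed gamma entry), so a real site would need the full alphabet, accented vowels, and a transliteration scheme of its choosing.

```php
<?php
// Hypothetical helper: transliterate Greek letters to ASCII and build a slug.
function greek_slug(string $text): string
{
    // Partial, illustrative map only; extend it to cover the whole alphabet.
    $map = [
        'Α' => 'A', 'α' => 'a',
        'Β' => 'B', 'β' => 'b',
        'Γ' => 'G', 'γ' => 'g',
    ];

    // strtr() with an array replaces each key wherever it occurs.
    $ascii = strtolower(strtr($text, $map));

    // Collapse anything that is not a-z or 0-9 (including unmapped
    // characters and spaces) into a single hyphen.
    $ascii = preg_replace('/[^a-z0-9]+/', '-', $ascii);

    return trim($ascii, '-');
}

echo greek_slug('Αβγ α'), "\n"; // abg-a
```

Unmapped characters are dropped to hyphens here rather than kept, which keeps the result pure ASCII at the cost of losing information when the map is incomplete.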
Czechnology
  • Because as far as I know, an HTTP URL should consist only of ASCII characters. "_Octets must be encoded if they have no corresponding graphic character within the US-ASCII coded character set, if the use of the corresponding character is unsafe, or if the corresponding character is reserved for some other interpretation within the particular URL scheme._" [[1](http://www.ietf.org/rfc/rfc1738.txt)] – Czechnology Mar 18 '11 at 12:19
  • @Czechnology accented domains exist, and modern browsers handle accented characters brilliantly. – fabrik Mar 18 '11 at 12:25
  • @fabrik, good point. I know it's been allowed not so long ago.. but I don't think it's a very good idea because the users might not have the right keyboard (eg. when abroad). But you're right, that depends on the webmaster. I choose to convert my urls to ASCII so that everyone is able to type them in. – Czechnology Mar 18 '11 at 12:28
  • not so long ago? "IDN was originally proposed in December 1996 by Martin Dürst and implemented in 1998" - http://en.wikipedia.org/wiki/Internationalized_domain_name – fabrik Mar 18 '11 at 12:30
  • What do you mean by “URL-friendly”? – Gumbo Mar 18 '11 at 12:32
  • @Dave Everitt, gee, that's a new Babylon! Also, a dangerous and insecure one, as mentioned in your link (when used on domains). – Czechnology Mar 18 '11 at 12:32
  • @fabrik, yes, but it came into use only in the last two years (just read further in that wiki entry). – Czechnology Mar 18 '11 at 12:35
  • these domain names are available since 2004 in Hungary (but this is irrelevant now) – fabrik Mar 18 '11 at 12:37
  • @Gumbo, with "URL-friendly" I mean "easy to remember and type in". When a lot of `%NN` come in, I wouldn't call it URL-friendly anymore. The same goes for non-ASCII characters when I can't type them in (e.g. I don't have the correct keyboard layout available). – Czechnology Mar 18 '11 at 12:38
  • @Czechnology: That’s rather user-friendliness. – Gumbo Mar 18 '11 at 12:48
  • @Gumbo, I won't argue with that. So is there an official or a widely acknowledged definition then? – Czechnology Mar 18 '11 at 12:58
  • @Czechnology: *User-friendly* means *friendly to users* while *URL-friendly* means *friendly to URLs*. And URLs do not care whether they are easy to write or to remember. But people do care. – Gumbo Mar 18 '11 at 13:06
  • @Gumbo, I don't think I understand you correctly now. So why do we do friendly urls when they don't care? Because people like them like that. So url-friendly implies user-friendly to me. But that's just a play with words. – Czechnology Mar 18 '11 at 13:14
  • @Czechnology: No, just the term URL-friendliness makes no sense as URLs are either valid or invalid and URLs don’t care whether they are easy to read/write/remember by users. It’s only the user who cares. So you can only talk about user-friendliness. – Gumbo Mar 18 '11 at 13:31
  • @gumbo, so you just mean not to mix up "url-friendly" and "friendly url"? Acknowledged ;) – Czechnology Mar 18 '11 at 13:35