I have a URL like this pattern:

www.example.com/ClassName/MethodName/Arg1/Arg2

Also here is my .htaccess file:

RewriteEngine on

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

RewriteRule ^(.*)$ index.php?rt=$1 [L,QSA]

ErrorDocument 404 /error404.html

And this is my routing system:

if (empty($_GET['rt'])) {
    require_once('application/home.php');
} else {
    require_once('application/search.php');
    $url  =  rtrim ($_GET['rt'], '/');
    $url  =  explode('/', $url);

    $ClassName  =   array_shift($url);
    $MethodName =   array_shift($url);
    $Arg1       =   array_shift($url);
    $Arg2       =   array_shift($url);
}

Now, what is the problem? Everything works: routing is fine for every URL, except when I use م in the URL. (م is a Persian character.)

For example:

www.example.com/ClassName/Methodname/124/روز خوب      // it is fine
www.example.com/ClassName/Methodname/254/سلام بر       // it isn't fine
//                       because there is م ^ in the URL

So when I use م in the URL, I get a 404 Not Found page.


I don't know where the problem comes from. Do you? And how can I fix it? Is it an encoding issue, or something else?

Note: I use Xampp v3.2.1 (apache).


EDIT: As requested in the comments, I have added these two examples. One (this works fine):

<?php

$str = "www.example.com/ClassName/Methodname/124/روز خوب";
$url = explode('/', $str);
echo "<pre>";
print_r($url);

/*
Array
(
    [0] => www.example.com
    [1] => ClassName
    [2] => Methodname
    [3] => 124
    [4] => روز خوب
)
*/

Two: (this directs to 404 Not Found)

<?php

$str = "www.example.com/ClassName/Methodname/254/سلام بر";
$url = explode('/', $str);
echo "<pre>";
print_r($url);

/*    
Array
(
    [0] => www.example.com
    [1] => ClassName
    [2] => Methodname
    [3] => 254
    [4] => سلام بر
)
*/

EDIT2: After a few tests, I figured out that the script that should get called by my rewrite rules (index.php) doesn't even get called.


EDIT3: I enabled rewrite logging in Apache, and when I checked the result, there was an interesting thing:

(These two samples aren't related to the above examples)

Working routing sample:

[Sat Jan 02 22:17:00.276918 2016] [rewrite:trace3] [pid 3188:tid 1728] mod_rewrite.c(475): [client ::1:49413] ::1 - - [localhost/sid#c397b0][rid#28661a0/initial] [perdir C:/xampp/htdocs/myweb/] add path info postfix: C:/xampp/htdocs/myweb/islamic_sources -> C:/xampp/htdocs/myweb/islamic_sources/sahifeh_sajadiyeh/1580/\xd9\x86\xd8\xa8\xd9\x88\xd8\xaa, referer: http://localhost/myweb/search?s=islamic_sources&q=%D8%B3%D9%84%D8%A7%D9%85
[Sat Jan 02 22:17:00.276918 2016] [rewrite:trace3] [pid 3188:tid 1728] mod_rewrite.c(475): [client ::1:49413] ::1 - - [localhost/sid#c397b0][rid#28661a0/initial] [perdir C:/xampp/htdocs/myweb/] strip per-dir prefix: C:/xampp/htdocs/myweb/islamic_sources/sahifeh_sajadiyeh/1580/\xd9\x86\xd8\xa8\xd9\x88\xd8\xaa -> islamic_sources/sahifeh_sajadiyeh/1580/\xd9\x86\xd8\xa8\xd9\x88\xd8\xaa, referer: http://localhost/myweb/search?s=islamic_sources&q=%D8%B3%D9%84%D8%A7%D9%85
[Sat Jan 02 22:17:00.276918 2016] [rewrite:trace3] [pid 3188:tid 1728] mod_rewrite.c(475): [client ::1:49413] ::1 - - [localhost/sid#c397b0][rid#28661a0/initial] [perdir C:/xampp/htdocs/myweb/] applying pattern '^(.*)$' to uri 'islamic_sources/sahifeh_sajadiyeh/1580/\xd9\x86\xd8\xa8\xd9\x88\xd8\xaa', referer: http://localhost/myweb/search?s=islamic_sources&q=%D8%B3%D9%84%D8%A7%D9%85
[Sat Jan 02 22:17:00.276918 2016] [rewrite:trace4] [pid 3188:tid 1728] mod_rewrite.c(475): [client ::1:49413] ::1 - - [localhost/sid#c397b0][rid#28661a0/initial] [perdir C:/xampp/htdocs/myweb/] RewriteCond: input='C:/xampp/htdocs/myweb/islamic_sources' pattern='!-f' => matched, referer: http://localhost/myweb/search?s=islamic_sources&q=%D8%B3%D9%84%D8%A7%D9%85
[Sat Jan 02 22:17:00.276918 2016] [rewrite:trace4] [pid 3188:tid 1728] mod_rewrite.c(475): [client ::1:49413] ::1 - - [localhost/sid#c397b0][rid#28661a0/initial] [perdir C:/xampp/htdocs/myweb/] RewriteCond: input='C:/xampp/htdocs/myweb/islamic_sources' pattern='!-d' => matched, referer: http://localhost/myweb/search?s=islamic_sources&q=%D8%B3%D9%84%D8%A7%D9%85
[Sat Jan 02 22:17:00.276918 2016] [rewrite:trace2] [pid 3188:tid 1728] mod_rewrite.c(475): [client ::1:49413] ::1 - - [localhost/sid#c397b0][rid#28661a0/initial] [perdir C:/xampp/htdocs/myweb/] rewrite 'islamic_sources/sahifeh_sajadiyeh/1580/\xd9\x86\xd8\xa8\xd9\x88\xd8\xaa' -> 'index.php?rt=islamic_sources/sahifeh_sajadiyeh/1580/\xd9\x86\xd8\xa8\xd9\x88\xd8\xaa', referer: http://localhost/myweb/search?s=islamic_sources&q=%D8%B3%D9%84%D8%A7%D9%85
[Sat Jan 02 22:17:00.276918 2016] [rewrite:trace3] [pid 3188:tid 1728] mod_rewrite.c(475): [client ::1:49413] ::1 - - [localhost/sid#c397b0][rid#28661a0/initial] split uri=index.php?rt=islamic_sources/sahifeh_sajadiyeh/1580/\xd9\x86\xd8\xa8\xd9\x88\xd8\xaa -> uri=index.php, args=rt=islamic_sources/sahifeh_sajadiyeh/1580/\xd9\x86\xd8\xa8\xd9\x88\xd8\xaa, referer: http://localhost/myweb/search?s=islamic_sources&q=%D8%B3%D9%84%D8%A7%D9%85
[Sat Jan 02 22:17:00.277919 2016] [rewrite:trace3] [pid 3188:tid 1728] mod_rewrite.c(475): [client ::1:49413] ::1 - - [localhost/sid#c397b0][rid#28661a0/initial] [perdir C:/xampp/htdocs/myweb/] add per-dir prefix: index.php -> C:/xampp/htdocs/myweb/index.php, referer: http://localhost/myweb/search?s=islamic_sources&q=%D8%B3%D9%84%D8%A7%D9%85
[Sat Jan 02 22:17:00.277919 2016] [rewrite:trace2] [pid 3188:tid 1728] mod_rewrite.c(475): [client ::1:49413] ::1 - - [localhost/sid#c397b0][rid#28661a0/initial] [perdir C:/xampp/htdocs/myweb/] strip document_root prefix: C:/xampp/htdocs/myweb/index.php -> /myweb/index.php, referer: http://localhost/myweb/search?s=islamic_sources&q=%D8%B3%D9%84%D8%A7%D9%85
[Sat Jan 02 22:17:00.277919 2016] [rewrite:trace1] [pid 3188:tid 1728] mod_rewrite.c(475): [client ::1:49413] ::1 - - [localhost/sid#c397b0][rid#28661a0/initial] [perdir C:/xampp/htdocs/myweb/] internal redirect with /myweb/index.php [INTERNAL REDIRECT], referer: http://localhost/myweb/search?s=islamic_sources&q=%D8%B3%D9%84%D8%A7%D9%85
[Sat Jan 02 22:17:00.277919 2016] [rewrite:trace3] [pid 3188:tid 1728] mod_rewrite.c(475): [client ::1:49413] ::1 - - [localhost/sid#c397b0][rid#286c8b0/initial/redir#1] [perdir C:/xampp/htdocs/myweb/] strip per-dir prefix: C:/xampp/htdocs/myweb/index.php -> index.php, referer: http://localhost/myweb/search?s=islamic_sources&q=%D8%B3%D9%84%D8%A7%D9%85
[Sat Jan 02 22:17:00.277919 2016] [rewrite:trace3] [pid 3188:tid 1728] mod_rewrite.c(475): [client ::1:49413] ::1 - - [localhost/sid#c397b0][rid#286c8b0/initial/redir#1] [perdir C:/xampp/htdocs/myweb/] applying pattern '^(.*)$' to uri 'index.php', referer: http://localhost/myweb/search?s=islamic_sources&q=%D8%B3%D9%84%D8%A7%D9%85
[Sat Jan 02 22:17:00.277919 2016] [rewrite:trace4] [pid 3188:tid 1728] mod_rewrite.c(475): [client ::1:49413] ::1 - - [localhost/sid#c397b0][rid#286c8b0/initial/redir#1] [perdir C:/xampp/htdocs/myweb/] RewriteCond: input='C:/xampp/htdocs/myweb/index.php' pattern='!-f' => not-matched, referer: http://localhost/myweb/search?s=islamic_sources&q=%D8%B3%D9%84%D8%A7%D9%85
[Sat Jan 02 22:17:00.277919 2016] [rewrite:trace1] [pid 3188:tid 1728] mod_rewrite.c(475): [client ::1:49413] ::1 - - [localhost/sid#c397b0][rid#286c8b0/initial/redir#1] [perdir C:/xampp/htdocs/myweb/] pass through C:/xampp/htdocs/myweb/index.php, referer: http://localhost/myweb/search?s=islamic_sources&q=%D8%B3%D9%84%D8%A7%D9%85
[Sat Jan 02 22:17:00.470250 2016] [rewrite:trace3] [pid 3188:tid 1728] mod_rewrite.c(475): [client ::1:49413] ::1 - - [localhost/sid#c397b0][rid#286c1b8/initial] [perdir C:/xampp/htdocs/myweb/] strip per-dir prefix: C:/xampp/htdocs/myweb/fonts/taha/QuranTaha.woff -> fonts/taha/QuranTaha.woff, referer: http://localhost/myweb/search?s=islamic_sources&q=%D8%B3%D9%84%D8%A7%D9%85
[Sat Jan 02 22:17:00.470250 2016] [rewrite:trace3] [pid 3188:tid 1728] mod_rewrite.c(475): [client ::1:49413] ::1 - - [localhost/sid#c397b0][rid#286c1b8/initial] [perdir C:/xampp/htdocs/myweb/] applying pattern '^(.*)$' to uri 'fonts/taha/QuranTaha.woff', referer: http://localhost/myweb/search?s=islamic_sources&q=%D8%B3%D9%84%D8%A7%D9%85
[Sat Jan 02 22:17:00.470250 2016] [rewrite:trace4] [pid 3188:tid 1728] mod_rewrite.c(475): [client ::1:49413] ::1 - - [localhost/sid#c397b0][rid#286c1b8/initial] [perdir C:/xampp/htdocs/myweb/] RewriteCond: input='C:/xampp/htdocs/myweb/fonts/taha/QuranTaha.woff' pattern='!-f' => not-matched, referer: http://localhost/myweb/search?s=islamic_sources&q=%D8%B3%D9%84%D8%A7%D9%85
[Sat Jan 02 22:17:00.470250 2016] [rewrite:trace1] [pid 3188:tid 1728] mod_rewrite.c(475): [client ::1:49413] ::1 - - [localhost/sid#c397b0][rid#286c1b8/initial] [perdir C:/xampp/htdocs/myweb/] pass through C:/xampp/htdocs/myweb/fonts/taha/QuranTaha.woff, referer: http://localhost/myweb/search?s=islamic_sources&q=%D8%B3%D9%84%D8%A7%D9%85

Not working (redirects to 404 not found) sample:

[Sat Jan 02 22:07:09.734092 2016] [rewrite:trace3] [pid 3188:tid 1712] mod_rewrite.c(475): [client ::1:64955] ::1 - - [localhost/sid#c397b0][rid#83ec138/initial] [perdir C:/xampp/htdocs/myweb/] add path info postfix: C:/xampp/htdocs/myweb/islamic_sources -> C:/xampp/htdocs/myweb/islamic_sources/sahifeh_sajadiyeh/306/\xd9\x85\xd8\xa8
[Sat Jan 02 22:07:09.734092 2016] [rewrite:trace3] [pid 3188:tid 1712] mod_rewrite.c(475): [client ::1:64955] ::1 - - [localhost/sid#c397b0][rid#83ec138/initial] [perdir C:/xampp/htdocs/myweb/] strip per-dir prefix: C:/xampp/htdocs/myweb/islamic_sources/sahifeh_sajadiyeh/306/\xd9\x85\xd8\xa8 -> islamic_sources/sahifeh_sajadiyeh/306/\xd9\x85\xd8\xa8
[Sat Jan 02 22:07:09.734092 2016] [rewrite:trace3] [pid 3188:tid 1712] mod_rewrite.c(475): [client ::1:64955] ::1 - - [localhost/sid#c397b0][rid#83ec138/initial] [perdir C:/xampp/htdocs/myweb/] applying pattern '^(.*)$' to uri 'islamic_sources/sahifeh_sajadiyeh/306/\xd9\x85\xd8\xa8'
[Sat Jan 02 22:07:09.734092 2016] [rewrite:trace1] [pid 3188:tid 1712] mod_rewrite.c(475): [client ::1:64955] ::1 - - [localhost/sid#c397b0][rid#83ec138/initial] [perdir C:/xampp/htdocs/myweb/] pass through C:/xampp/htdocs/myweb/islamic_sources

Interesting point: when routing is fine, the Persian string looks like this (just decode it):

%D8%B3%D9%84%D8%A7%D9%85

But when routing gives 404 Not Found, the Persian string looks like this:

\xd9\x86\xd8\xa8\xd9\x88\xd8\xaa

It seems there are two different kinds of encoding.
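
One way to check this observation is to decode both strings. The sketch below is in Python rather than PHP, purely for illustration, and is not part of the original setup. It treats the `%XX` form (taken from the referer in the log) and the `\xNN` form (taken from the logged path) as raw bytes and decodes them as UTF-8.

```python
from urllib.parse import unquote_to_bytes

MEEM = "\u0645"  # the character that triggers the 404

samples = {
    # Percent-encoded form, as it appears in the referer of the log lines.
    "referer": unquote_to_bytes("%D8%B3%D9%84%D8%A7%D9%85"),
    # Escaped-byte form, as mod_rewrite prints the last path segment.
    "working": b"\xd9\x86\xd8\xa8\xd9\x88\xd8\xaa",
    "failing": b"\xd9\x85\xd8\xa8",
}

# Both notations decode cleanly as UTF-8.
decoded = {label: raw.decode("utf-8") for label, raw in samples.items()}

for label, text in decoded.items():
    print(label, "contains meem:", MEEM in text)
```

Under these assumptions both notations are the same UTF-8 bytes written two ways, and of the two logged paths only the failing one contains the bytes `D9 85` for م.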

Shafizadeh
  • can you update your question to include a `print_r($url);` **after** the `explode`. Include two versions, one with the offending character and one with a url that works fine. – Alex Andrei Jan 02 '16 at 09:35
  • UTF-8 characters usually don't work well in any version. I suggest a shortcut: search for the Persian word in Google, and it will show you its equivalent in encoded Latin characters. Use that for your URLs. – Arash Hatami Jan 02 '16 at 09:37
  • @AlexAndrei I did it ...! – Shafizadeh Jan 02 '16 at 09:41
  • @ArashHatami Actually I don't want that ..! I like to have a clean URL. Otherwise I could encode the URL, but as I said, I don't want to. I want to see the exact Persian words in the URL. – Shafizadeh Jan 02 '16 at 09:42
  • Is the 404 coming from your application logic or the rewrite? I mean after you break down the url into components, you check your database and return a response, maybe that's where the underlying issue is. Can you show how the matching or check is done? – Alex Andrei Jan 02 '16 at 09:48
  • http://stackoverflow.com/questions/2742852/unicode-characters-in-urls – Arash Hatami Jan 02 '16 at 09:49
  • The 404 comes from the `.htaccess` file, because even if I change ClassName or MethodName, I still get 404 Not Found. – Shafizadeh Jan 02 '16 at 09:50
  • The output you added looks to me as if the arguments were parsed all right. So the question is why you get a 404 although your script apparently _is_ executed... I assume the issue is a step _after_ that parsing code you showed us, so when the parsed arguments get used. That would be the actual routing step then which you did not show to us. – arkascha Jan 02 '16 at 09:50
  • @arkascha But I think the problem is *before* that parsing ...! It comes from that `.htaccess` file methinks – Shafizadeh Jan 02 '16 at 09:51
  • @ArashHatami Thanks, I will take a look at it. – Shafizadeh Jan 02 '16 at 09:52
  • Why that? Your script clearly creates an output (so it is executed) and the output looks valid to me... – arkascha Jan 02 '16 at 09:52
  • I think you have to take a look at your actual routing step, maybe you are using an autoloader using those parsed arguments. If that autoloader cannot find a class, then you would also get a 404... What does your error log file say? – arkascha Jan 02 '16 at 09:53
  • @arkascha Those two examples are outside my website ...! I wrote the URL as a string and then parsed it. That script is separate from my website folder. – Shafizadeh Jan 02 '16 at 09:53
  • also try to add `AddDefaultCharset utf-8` to `httpd.conf` file – Arash Hatami Jan 02 '16 at 09:55
  • Please try to find out first what step actually throws the 404 error. Make a dump of the request data right at the beginning of your routing script and stop execution afterwards. _Does your router script get called or not?_ – arkascha Jan 02 '16 at 09:56
  • I just made a short test with your rewrite rules on my system, I cannot see any problems with that. My script gets called, receives the argument as expected and is able to parse it to the original requested route specified. – arkascha Jan 02 '16 at 09:58
  • @arkascha Look, when I use `م` in the URL, it redirects to 404. Even if I write a wrong classname, a wrong methodname, or anything else, it is still 404. That shows the problem is in the first layer, the `.htaccess` – Shafizadeh Jan 02 '16 at 09:58
  • @arkascha Did you use Xampp/apache? – Shafizadeh Jan 02 '16 at 09:59
  • @ArashHatami I tested it already (several times), But the problem is still there. – Shafizadeh Jan 02 '16 at 10:01
  • Sorry, I cannot follow your reasoning in your last comment. _Why_ does this say the issue is within the `.htaccess` style file? On the contrary: you yourself say that you _also_ get a 404 if you specify a non-existent class in your request! So I would say your router _does_ receive the route, but is unable to find the class. – arkascha Jan 02 '16 at 10:01
  • @arkascha emm, I don't know, maybe you are right. – Shafizadeh Jan 02 '16 at 10:03
  • Once more: find out if your router script gets called! I asked you to do that above. – arkascha Jan 02 '16 at 10:03
  • @arkascha Look, Yes you asked once, But what do you mean *"router"* exactly? Router of my website? Router of apache? or what is router? – Shafizadeh Jan 02 '16 at 10:07
  • Your `index.php` script, so the script that should get called by your rewriting rules. _Does_ it get called? – arkascha Jan 02 '16 at 10:08
  • @arkascha I wrote `exit()` in the first line of `index.php`, but there is still 404 Not Found. I think it doesn't get called. – Shafizadeh Jan 02 '16 at 10:10
  • OK, then I suggest you enable rewrite logging in your http server to be able to see what is going on. You have to find out what your request gets rewritten to and why. Here is the documentation: http://httpd.apache.org/docs/current/mod/mod_rewrite.html#logging – arkascha Jan 02 '16 at 10:20
  • @arkascha I enabled it using [this pattern](http://www.leonardaustin.com/blog/technical/enable-mod_rewrite-in-xampp/), but nothing changed – Shafizadeh Jan 02 '16 at 14:07
  • `$ClassName = array_shift($url);` Nice security hole you got there. – Quolonel Questions Jan 02 '16 at 14:36
  • Rewrite logging obviously does not change the behavior. It is _logging_. It helps to understand what is going on inside your rewrite engine. If you enabled it you will get additional log entries in your http servers error log file about the detailed steps in the rewrite process. Very interesting and helpful. Do not forget to deactivate it again once you solved your issue. – arkascha Jan 02 '16 at 14:41
  • Oh, and that "pattern" you linked has nothing to do with rewrite logging. It explains how to enable the rewrite module, but you already had that enabled. Take a look into the documentation instead: http://httpd.apache.org/docs/current/mod/mod_rewrite.html#logging – arkascha Jan 02 '16 at 14:43
  • I cannot reproduce this issue on my local Apache 2.4.39. I copy/pasted the 404-causing URL from the question and it works as desired, opening `index.php?rt=سلام بر` – anubhava Oct 05 '19 at 17:41

2 Answers

11

You could try this variant that may work better:

RewriteRule ^([\s\S]*)$ index.php?rt=$1 [L,B,QSA]

The changes that this makes are:

1: using [\s\S] to match absolutely any character, instead of . which matches anything but a newline.

Though you wouldn't normally expect newline (%0A) to be in your URLs, my suspicion is that Apache's regexp matcher is treating your input path as being in the ISO-8859-1 encoding.

The IRI character U+0645 Arabic Letter Meem م UTF-8-URL-encodes to URI sequence %D9%85, and whilst byte 0xD9 is okay in ISO-8859-1, 0x85 decodes to U+0085 Next Line (NEL), an undesirable legacy control character that often counts as a newline. So if that happened, the expression .* wouldn't match it.

Having said all that, this is quite theoretical as your example works as-is for me, on an old XAMPP 1.8.2 I had lying about on WinXP.
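
The byte-level reasoning in point 1 can be checked outside Apache. The following is only a Python sketch of the suspected mechanism, not Apache's actual matcher. Python's own `.` does match U+0085, so a dot that refuses NEL is emulated here with an explicit character class; the names `DOT_EXCLUDING_NEL` and `MATCH_ANYTHING` are made up for the example.

```python
import re
from urllib.parse import quote

meem = "\u0645"                      # Arabic Letter Meem
utf8_bytes = meem.encode("utf-8")    # the two bytes D9 85
percent_encoded = quote(meem)        # URL form of those bytes

# Misreading the UTF-8 bytes as ISO-8859-1 turns 0x85 into U+0085 (NEL).
misread = utf8_bytes.decode("iso-8859-1")

# str.splitlines() is one example of code that treats NEL as a line break.
nel_is_line_break = len(("a" + misread + "b").splitlines()) == 2

# Hypothetical emulation of a dot that refuses LF and NEL, versus [\s\S].
DOT_EXCLUDING_NEL = re.compile(r"^([^\n\x85]*)$")
MATCH_ANYTHING = re.compile(r"^([\s\S]*)$")

path = "ClassName/Methodname/254/" + misread
print(percent_encoded)                              # %D9%85
print(DOT_EXCLUDING_NEL.match(path) is None)        # True: the pattern fails
print(MATCH_ANYTHING.match(path) is not None)       # True: [\s\S] still matches
```

If the matcher really behaves like the emulated dot, the rule never fires for paths containing م, which would be consistent with the failing log going straight to "pass through".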

2: using the [B] rewrite flag, to ensure all bytes are passed in correctly-URL-encoded form in the parameter.

Otherwise, non-ASCII characters would break for situations where Apache sends the query string to PHP through Windows environment variables. The Windows environment is Unicode, so Apache has to decode the bytes on writing and PHP has to encode them again on reading, and unfortunately those encodings don't match.

Apache uses ISO-8859-1 and PHP (via C stdlib) uses the ANSI code page, which depends on the locale of the Windows installation. On a Western install, you get code page 1252, which is close to ISO-8859-1 so only some of the bytes will be wrong (again, this includes the 0x85 in م); on other locales with other ANSI code pages all the non-ASCII characters will be wildly wrong.
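
To see why only some bytes go wrong on a Western install, the two byte-to-character mappings can be compared directly. This Python sketch uses Python's own `iso-8859-1` and `cp1252` codecs as stand-ins; it does not reproduce the actual Apache or PHP code paths, and `compare_mappings` is a made-up helper.

```python
def compare_mappings(data: bytes) -> dict:
    """Map each byte to the code points it yields under both encodings."""
    result = {}
    for byte in data:
        single = bytes([byte])
        latin1_cp = ord(single.decode("iso-8859-1"))
        cp1252_cp = ord(single.decode("cp1252"))
        result[byte] = (latin1_cp, cp1252_cp)
    return result

meem_bytes = "\u0645".encode("utf-8")   # b"\xd9\x85"
mapping = compare_mappings(meem_bytes)

for byte, (latin1_cp, cp1252_cp) in mapping.items():
    verdict = "same" if latin1_cp == cp1252_cp else "different"
    print(f"0x{byte:02X}: ISO-8859-1 U+{latin1_cp:04X}, cp1252 U+{cp1252_cp:04X} ({verdict})")
```

Byte 0xD9 maps to U+00D9 in both encodings, while 0x85 is U+0085 in ISO-8859-1 but U+2026 in code page 1252, so a round trip through mismatched encodings can corrupt exactly that byte.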

This doesn't necessarily apply to you as XAMPP is using mod_php, which doesn't need to use the environment to pass strings. But it would make a difference in other hosting environments. In any case, without [B] you'll find URL-special characters in the string (ampersand, plus, percent) break the query parser.
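
The last point, that unescaped URL-special characters break the query parser, can be illustrated with any query-string parser. The sketch below uses Python's `urllib.parse` as a stand-in for PHP's parser, so details may differ, and the sample path `Tom&Jerry/1+1` is invented for the example.

```python
from urllib.parse import parse_qs, quote

captured_path = "Tom&Jerry/1+1"   # hypothetical $1 captured by the RewriteRule

# Substituted as-is, roughly what happens without [B]:
unescaped = parse_qs("rt=" + captured_path)

# Percent-escaped before substitution, roughly what [B] arranges:
escaped = parse_qs("rt=" + quote(captured_path, safe=""))

print(unescaped)   # the value is cut off at the ampersand
print(escaped)     # the full path survives
```

Without escaping, the ampersand ends the `rt` parameter early; with escaping, the whole captured path arrives as one value.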

bobince
  • You know, you are brilliant ..! Every night, before sleep, I open this question and take a look at your answer and enjoy `:-)` ..! Very tricky it is `:-)` .. Thank you Bro ..! – Shafizadeh Dec 04 '16 at 17:03
  • This saved my project! you just gained me 3000 USD ! Thank you my friend. – Xees Oct 06 '18 at 13:59
0

Use URL encoder:

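// URLEncoder emits form encoding ("+" for spaces), so convert "+" to "%20" for use in a path segment.
String encodedURL = "www.example.com/ClassName/Methodname/254/" + URLEncoder.encode("Any text in any language", "UTF-8").replace("+", "%20");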

UTF-8 of course is the encoding to use.

You can read more here: https://docs.oracle.com/javase/7/docs/api/java/net/URLEncoder.html
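
For comparison, here is a sketch of the same idea in Python, chosen only for illustration since the snippet in this answer is Java. It percent-encodes a single path segment as UTF-8 and decodes it back; the base URL is the placeholder from the question.

```python
from urllib.parse import quote, unquote

base = "www.example.com/ClassName/Methodname/254/"
segment = "سلام بر"   # the failing last segment from the question

# safe="" makes quote() escape every reserved character in the segment.
encoded_url = base + quote(segment, safe="")
round_trip = unquote(encoded_url)

print(encoded_url)
print(round_trip == base + segment)
```

Note that `quote` writes the space as `%20`, which is the form a path segment needs, whereas form-style encoders write it as `+`.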