
On a Zend Framework 2 based website (test environment on nginx and live environment on Apache) there is a category "courses" and its pages have URIs like this:

domain.tld/courses/123-Name of course that can contain ®, €, ä, (, ), and other special chars

The course names come from the database and are URL-encoded for the internal links:

domain.tld/courses/123-Name%20of%20course%20that%20can%20contain%20%C2%AE%2C%20%E2%82%AC%2C%20%C3%A4%2C%20(%2C%20)%2C%20and%20other%20special%20chars

This works fine, but when I try to access a page using a special character without encoding it, a 404 error occurs.

An example of a website that handles special characters is Wikipedia. You can use

http://en.wikipedia.org/wiki/Signal_(electrical_engineering)

or

http://en.wikipedia.org/wiki/Signal_%28electrical_engineering%29

and you always get the page you want.

Does anyone know how to achieve this behavior ("à la Wikipedia")? (Maybe with an HTTP redirect via a .htaccess rule?)


UPDATE:

/etc/nginx/ax-common-vhost

server {
    listen   80;
    server_name
        foo.loc
        bar.loc
        baz.loc
    ;

    if ($host ~ ^(?<project>.+)\.(?<area>.+)\.loc$) {
        set $folder "$area/$project";
    }

    access_log /var/log/nginx/$area/$project.access.log;
    error_log /var/log/nginx/error.log;

    gzip on;
    gzip_min_length 1000;
    gzip_types text/plain text/xml application/xml;

    client_max_body_size 25m;

    root /var/www/$folder/public/;

    try_files $uri $uri/ /index.php?$args;
    index index.html index.php;

    location / {
        index index.html index.php;
        sendfile off;
    }

    location ~ (\.inc\.php|\.tpl|\.sql|\.tpl\.php|\.db)$ {
        deny all;
    }

    location ~ \.htaccess {
        deny all;
    }

    if (!-e $request_filename) {
        rewrite ^.*$ /index.php last;
    }

    location ~ \.php$ {
        fastcgi_cache        off;
        #fastcgi_pass        127.0.0.1:9001;
        fastcgi_pass         unix:/var/run/php5-fpm.sock;
        fastcgi_read_timeout 6000;
        fastcgi_index        index.php;
        include              fastcgi_params;
        fastcgi_param        SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_param        APPLICATION_ENV development;
        fastcgi_param        HTTPS $https;
    }
}
automatix

2 Answers


You can achieve the intended URL rewrite behavior by having the correct rewrite rules inside of your .htaccess file.

I suggest you have a look at the RewriteRule flags, particularly the B flag.

ManuelH
  • Sorry, I forgot to provide an important piece of information: there are two environments, a test env on nginx and a live env on Apache. I've just edited the question. – automatix Jun 29 '13 at 00:17
  • Thank you for your answer! Flags? `B (escape backreferences)`? Could you please explain how it would solve the problem? – automatix Jun 29 '13 at 00:24
  • Taking the Wikipedia URL above, consider the rule `RewriteRule ^wiki/(.*)$ /script.php?wiki=$1 [B]`. This would rewrite http://en.wikipedia.org/wiki/Signal_%28electrical_engineering%29 into http://en.wikipedia.org/wiki/Signal_(electrical_engineering) – ManuelH Jun 29 '13 at 00:41
  • I've just tried it out: `RewriteRule course/^([0-9]+)-([.()-_a-zA-Z0-9%]+)$ /course/$1-$2 [B]`, but it's not working. What is wrong here? – automatix Jun 29 '13 at 01:27

You should show us your nginx FastCGI configuration.

There are several ways to set PATH_INFO for PHP, and this is the string containing the path that ZF will have to handle.

One way is:

fastcgi_split_path_info ^(.+\.php)(/.+)$;
fastcgi_param PATH_INFO $fastcgi_path_info;

From this post it seems you could also use this way (named captures) to avoid all urlencoding of the PATH_INFO content:

location ~ ^(?<SCRIPT_FILENAME>.+\.php)(?<PATH_INFO>.+)$ {
(...)
fastcgi_param PATH_INFO $PATH_INFO;

That way you could at least detect whether the problem comes from too much or too little URL encoding.

By preventing the web server from touching the URL encoding (and doing the same with Apache), you could handle the decoding of the path on the PHP side. You would then know that the path is never decoded for you, so you would have to decode it in PHP (or perhaps encode it), and in any case handle the fact that the path may arrive in both versions.

This might be a nice job for a Zend Framework router. One of the jobs of the router is to avoid things like .htaccess rewrite rules in Apache and to manage URLs in the application in a stable, web-server-independent way.

The first step is to test the path string and detect whether URL encoding still needs to be done. Of course, if a URL arrives with a mix of URL-encoded and unencoded characters in the same string, things get a lot harder, because you cannot decide reliably (but the same would be true for a web server). Also, in your example the parentheses are not encoded in your generated URL but are encoded in the Wikipedia example, so your application will have to choose a policy for the RFC reserved characters.
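As a rough illustration of that first step, here is a minimal sketch in Python (chosen only to show the logic; in a ZF2 application the same idea would be written in PHP, for example with `rawurldecode()` and `rawurlencode()`). The names `canonical_path`, `needs_redirect` and `SAFE_CHARS` are hypothetical, and leaving parentheses unencoded is an assumed policy that mirrors the internal links in the question. Decoding first and re-encoding afterwards sidesteps most of the detection problem, although a literal `%` in a course name stays ambiguous.

```python
from urllib.parse import quote, unquote

# Characters left unescaped in the canonical form. Keeping "(" and ")" literal
# mirrors the internal links in the question; this is an assumed policy choice.
SAFE_CHARS = "/()"


def canonical_path(raw_path: str) -> str:
    """Return one canonical percent-encoded form of a request path.

    Decoding first and re-encoding afterwards maps fully encoded, unencoded
    and mixed inputs to the same string. A literal "%" in a name remains
    ambiguous unless it always arrives as "%25".
    """
    decoded = unquote(raw_path)  # characters that are not encoded pass through unchanged
    return quote(decoded, safe=SAFE_CHARS)


def needs_redirect(raw_path: str) -> bool:
    """True when the request did not use the canonical form."""
    return raw_path != canonical_path(raw_path)
```

With such a helper, the application could either route on the canonical form directly or answer non-canonical requests with a redirect to it, so both spellings of a URL end up on the same page.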

regilero
  • Thank you for your answer! I'm pretty sure that it's not a ZF2 problem, since I also had routing problems and resolved them (see [here](http://stackoverflow.com/questions/15634913/uris-with-german-special-characters-dont-work-error-404-in-zend-framework-2) and [here](http://stackoverflow.com/questions/15658354/how-to-set-a-utf8-modifier-for-regex-of-a-regex-route-in-zend-framework-2)). Requests with (unescaped) special chars in the URI don't reach the application. I'm currently having some trouble with my nginx VM. I'll provide my FastCGI configuration once I've resolved the VM problems. – automatix Jul 05 '13 at 08:04
  • Three weeks later... :) I've finally resolved the issues with the VM and just updated my question with the nginx vhost settings. I've tried both solutions out: 1. `fastcgi_split_path_info ^(.+\.php)(/.+)$; fastcgi_param PATH_INFO $fastcgi_path_info;` -- no changes; 2. `location ~ ^(?<SCRIPT_FILENAME>.+\.php)(?<PATH_INFO>.+)$ {` (instead of `location ~ \.php$ {`) and `fastcgi_param PATH_INFO $PATH_INFO;` -- PHP is not rendered anymore and I can download PHP files. – automatix Jul 23 '13 at 23:51