I want to get a list of cities, where each city name is a link to the page for that city:

The links (created in the view script) look like this:

http://project.loc/catalog/Berlin (in the HTML source code url-encoded: Berlin)
http://project.loc/catalog/Erlangen (in the HTML source code url-encoded: Erlangen)
http://project.loc/catalog/Nürnberg (in the HTML source code url-encoded: N%C3%BCrnberg)

"Berlin", "Erlangen" etc. work, but if the city name contains a German special character (ä, ö, ü, Ä, Ö, Ü, or ß), as in "Nürnberg", a 404 error occurs:

A 404 error occurred Page not found. The requested URL could not be matched by routing. No Exception available

Why? And how to get this working?

Thanks in advance!

EDIT:

My router settings:

'router' => array(
    'routes' => array(
        'catalog' => array(
            'type'  => 'literal',
            'options' => array(
                'route' => '/catalog',
                'defaults' => array(
                    'controller' => 'Catalog\Controller\Catalog',
                    'action'     => 'list-cities',
                ),
            ),
            'may_terminate' => true,
            'child_routes' => array(
                'city' => array(
                    'type'  => 'segment',
                    'options' => array(
                        'route' => '/:city',
                        'constraints' => array(
                            'city'  => '[a-zA-ZäöüÄÖÜß0-9_-]*',
                        ),
                        'defaults' => array(
                            'controller' => 'Catalog\Controller\Catalog',
                            'action'     => 'list-sports',
                        ),
                    ),
                    'may_terminate' => true,
                    'child_routes' => array(
                    // ...
                    ),
                ),
            ),
        ),
    ),
),
automatix
  • Isn’t it obvious, that this does not match the `constraints` you’ve defined? – CBroe Mar 26 '13 at 11:01
  • Yes, you are right. I've just edited the router settings. It's still not working. – automatix Mar 26 '13 at 11:04
  • You showed what the URLs “look” like – but did you use proper URL encoding for these special characters? – CBroe Mar 26 '13 at 11:06
  • No, I didn't. I put the city names into the URI just as they come from the database (`$city->name` and not `urlencode($city->name)`). – automatix Mar 26 '13 at 11:13
  • If you don’t urlencode special characters in URLs yourself, then you leave that up to the client – and with that you can easily run into problems with the character encoding used. – CBroe Mar 26 '13 at 11:15
  • If I would use `urlencode($city->name)`, I would get URIs like `/catalog/N%25C3%25BCrnberg` instead of `/catalog/Nürnberg`. – automatix Mar 26 '13 at 15:37
  • No, `%25C3%25BC` would be `ü` url-encoded _twice_ – `%C3%BC` would be correct for an `ü` in UTF-8. And modern browsers will still _display_ this as `ü` in the status bar/address bar. If you _don’t_ url-encode special chars properly, you might run into problems with a browser assuming a different character encoding than UTF-8 and encode it as f.e. an ISO-8859-1 `ü`, which would be just `%FC` … and then you will _really_ run into problems with your routing. – CBroe Mar 26 '13 at 15:48
  • I've taken a look at the HTML code. Actually, the URIs are url-encoded (`/catalog/N%C3%BCrnberg`); ZF does it by default. I didn't know that and gave you wrong information, sorry. So the URI is url-encoded. – automatix Mar 26 '13 at 16:09
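
The encoding points raised in the comments above can be checked with a few lines of plain PHP. This is only a sketch: the variable names are made up for illustration, and the city name is written as explicit bytes so the result does not depend on the encoding of the source file.

```php
<?php
// "Nürnberg" as explicit UTF-8 bytes (ü = 0xC3 0xBC).
$utf8Name = "N\xC3\xBCrnberg";

// Encoding once gives the form that appears in the HTML source.
$encodedOnce = rawurlencode($utf8Name);

// Encoding an already encoded string escapes the "%" signs again.
$encodedTwice = rawurlencode($encodedOnce);

// The same name as ISO-8859-1 bytes (ü = 0xFC) encodes differently.
$latin1Encoded = rawurlencode("N\xFCrnberg");

echo $encodedOnce, "\n";   // N%C3%BCrnberg
echo $encodedTwice, "\n";  // N%25C3%25BCrnberg
echo $latin1Encoded, "\n"; // N%FCrnberg
```

So `%25C3%25BC` is a sign of double encoding, and `%FC` is what a client assuming ISO-8859-1 would send.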

1 Answer

You need to change your constraints. You can use a regular expression that matches UTF-8 characters, something like this:

'/[\p{L}]+/u'

Notice the /u modifier (Unicode).

EDIT:

The problem is resolved.

Explanation:

The RegEx Route matches the URIs with preg_match(...) (line 116 or 118 of Zend\Mvc\Router\Http\Regex). To match a string containing "special chars" (non-ASCII characters, 128+), one must pass the pattern modifier u to preg_match(...), like this:

$thisRegex = '/catalog/(?<city>[\p{L}]*)';
$regexStr = '(^' . $thisRegex . '$)u'; // <-- here
$path = '/catalog/Nürnberg';
$matches = array();
preg_match($regexStr, $path, $matches);

And since the RegEx Route passes a url-encoded string to preg_match(...), the string must furthermore be decoded first:

$thisRegex = '/catalog/(?<city>[\p{L}]*)';
$regexStr = '(^' . $thisRegex . '$)u';
$path = rawurldecode('/catalog/N%C3%BCrnberg');
$matches = array();
preg_match($regexStr, $path, $matches);

These two steps are not performed by the RegEx Route, so preg_match(...) gets a string like '/catalog/N%C3%BCrnberg' and tries to match it against a regex like '/catalog/(?<city>[\\p{L}]*)/u'.
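
The two steps described above can be put together in a small standalone function. This is only a sketch of the idea, not the code of the custom route: `matchCityPath` is a hypothetical helper and does not exist in Zend Framework.

```php
<?php
// Hypothetical helper (not part of Zend Framework): decodes the path first,
// then matches it with the "u" modifier, as described above.
function matchCityPath(string $path): ?string
{
    $thisRegex = '/catalog/(?<city>[\p{L}]*)';
    // "(" and ")" serve as pattern delimiters; "u" switches on UTF-8 mode.
    $regexStr = '(^' . $thisRegex . '$)u';

    // Step 1: turn %C3%BC back into the UTF-8 bytes of "ü".
    $decoded = rawurldecode($path);

    // Step 2: match the decoded path; preg_match() returns 1 only on a match
    // (0 on no match, false on an error such as invalid UTF-8 input).
    $matches = array();
    if (preg_match($regexStr, $decoded, $matches) !== 1) {
        return null;
    }

    return $matches['city'];
}

var_dump(matchCityPath('/catalog/Berlin'));
var_dump(matchCityPath('/catalog/N%C3%BCrnberg'));
var_dump(matchCityPath('/catalog/N%FCrnberg')); // ISO-8859-1 "ü", not valid UTF-8
```

The first two calls return the city name. The third returns null, because the decoded path is not valid UTF-8 and preg_match(...) matches nothing in that case; this is the ISO-8859-1 problem mentioned in the comments on the question.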

The solution is to use a custom RegEx Route. Here is an example.

Community
Andrew
  • Thank you! That doesn't work for me, but maybe I'm using your regex wrongly. I've replaced `'city' => '[a-zA-ZäöüÄÖÜß0-9_-]*',` with `'city' => '/^\p{L}[\p{L} _.-]+$/u',` – automatix Mar 26 '13 at 15:35
  • Maybe that was a poor example; it was just to illustrate the /u flag. Try '/[\p{L}]+/u' – Andrew Mar 26 '13 at 15:39
  • It's finally working! Thank you for the important hint about the modifier! – automatix Mar 27 '13 at 17:36
  • I've already posted the explanation and a link to the solution as completion to your answer. What do you mean? – automatix Apr 02 '13 at 09:31
  • I added this to my modifier: /[\p{L}]+/u but it's still not working. Automatix, can you please share what you have used? –  Dec 07 '15 at 09:49