
I noticed that the routers of most (perhaps all) PHP frameworks do not use dots in rewritten URLs, and the php-standalone-server also does not use dots (note that the php-standalone-server does not need mod_rewrite; it usually just works). Is this a pattern? Should I avoid dots in rewritten URLs?

Consider the following points first:

  1. If I run the php-standalone-server in this folder structure:

    project
    │   index.php
    │   test.php
    │
    └── blog1
        └── index.html
    

    If I access http://localhost:8000/test.php, it shows the response from the ./test.php file. If I access http://localhost:8000/blog1, it shows the contents of blog1/index.html. This is how the php-standalone-server natively supports rewritten URLs, but this is not the issue.

    But if I access http://localhost:8000/blog, it shows the response from ./index.php.

  2. Stack Overflow itself replaces dots with hyphens in its URLs. This question is an example:

    • Title: Why not use the dot (“.”) in url rewrite?
    • Url: http://stackoverflow.com/questions/29977809/why-not-use-the-dot-in-url-rewrite
  3. CodeIgniter 3 is an example of a PHP framework that does not allow dots.

The question:

Given all this, should I allow dots (".") in rewritten URLs or not? Is avoiding dots a "standard"?

Protomen
  • `.` in regex represents *any character* so it has to be escaped to get literal `.` – developerwjk Apr 30 '15 at 21:12
  • There are books upon books written about seo (or search-engine optimized) url's. The most likely conclusion I can come up with is that dots in url's make it look like that everything after the dot is an extension. We don't want that. An answer on this question would likely be largely opinion-based, so I am voting to close. – Sumurai8 Apr 30 '15 at 21:53

1 Answer

I finally found the reason for this "controversy" when I searched (googled) for the term rfc dot path.

The problem with the dot . in URLs

It is fine to use dots in a URL (even a rewritten one), such as:

http://example/project/hello-new-world

or, if we want to create a fake file-like URL, such as:

http://example/project/index.php/hello-new-world.html

The problem occurs when the dot is used like this:

http://example/project/test./

To the server, /project/test./ and /project/test/ are the same thing, even though they are visibly different URLs.

Note that the problem does NOT occur with /project/.test/, because there are real files whose names start with a dot, such as .htaccess.

The reason rewritten URLs avoid dots is to prevent this problem and to facilitate the canonicalization of URLs (URL normalization).

For a clearer example of the problem, create a file in a physical folder on localhost:

/var/www/images/test.jpg

Go to http://localhost/images/test.jpg and then try to access all of these:

  • http://localhost/images/test.jpg.
  • http://localhost/images/test.jpg...
  • http://localhost/images/test.jpg....
  • http://localhost/images/test.jpg.....
  • http://localhost/images/test.jpg......
  • http://localhost/images/test.jpg.......

All of these URLs are delivered to the client (a web browser, for example) as the image test.jpg.

URL normalization (or URL canonicalization)

URL normalization (or URL canonicalization) is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to turn a URL into a normalized (canonical) URL, so that it is possible to determine whether two syntactically different URLs may be equivalent.

Search engines use URL normalization to assign importance to web pages and to reduce the indexing of duplicate pages. Web crawlers perform URL normalization to avoid crawling the same resource more than once.

Types of normalization (some of these, such as dot-segment removal, are described by RFC 3986; others are common practice and may change what a URL refers to):

  • Removal of the directory index. Default directory indexes are generally not required in URLs:

    http://www.example.com/a/index.html → http://www.example.com/a/

  • Replacing an IP address with a domain name. Check whether the IP address maps to a canonical domain name:

    http://208.77.188.166/ → http://www.example.com/ (the Host: header helps with this)

  • Removing duplicate slashes. Paths that include two adjacent slashes can be converted to a single one:

    http://www.example.com/foo//bar.html → http://www.example.com/foo/bar.html

  • Removing or adding www as the first domain label. Both URLs often point to the same page:

    http://www.example.com/ → http://example.com/

  • Removing the ? when the query is empty. When the query is empty, there may be no need for ?:

    http://www.example.com/display? → http://www.example.com/display

  • Adding a trailing / to directories:

    http://www.example.com/alice → http://www.example.com/alice/ (servers such as Apache and Nginx usually perform this redirect already when the path is a real folder).

    However, there is no way to know from the URL alone whether a path component is a directory or not. RFC 3986 notes that if the first URL redirects to the second, this is an indication that they are equivalent.

  • Removing dot-segments. The segments .. and . can be removed from a URL according to the algorithm described in RFC 3986:

    http://www.example.com/../a/b/../c/./d.html → http://www.example.com/a/c/d.html

    However, if a removed .. component, e.g. b/.., is a symlink to a directory with a different parent, eliding b/.. will result in a different path and URL. In rare cases, depending on the web server, this may even be true for the root directory (e.g. //www.example.com/.. may not be equivalent to //www.example.com/). This is the likely reason to avoid dots.
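
The dot-segment removal from the last item can be sketched in PHP roughly as follows. This is only a minimal illustration under some assumptions: removeDotSegments() is a hypothetical helper name (not a PHP built-in), it handles absolute paths only (starting with /), and it does not know anything about symlinks on the server.

```php
<?php
// Minimal sketch of dot-segment removal (RFC 3986, section 5.2.4).
// removeDotSegments() is a hypothetical helper, not a PHP built-in.
// Assumption: $path is an absolute path that starts with "/".
function removeDotSegments(string $path): string
{
    $segments = explode('/', $path);
    $last = $segments[count($segments) - 1];
    $output = [];

    foreach ($segments as $segment) {
        if ($segment === '.') {
            // "." refers to the current level, so it is dropped.
            continue;
        }
        if ($segment === '..') {
            // ".." goes up one level, but never above the root
            // (the root is the leading empty segment).
            if (count($output) > 1) {
                array_pop($output);
            }
            continue;
        }
        $output[] = $segment;
    }

    // A path that ends in "." or ".." refers to a directory,
    // so the result keeps a trailing slash.
    if ($last === '.' || $last === '..') {
        $output[] = '';
    }

    return implode('/', $output);
}

echo removeDotSegments('/../a/b/../c/./d.html'), "\n"; // prints /a/c/d.html
```

Note that this only removes whole . and .. segments. A trailing dot inside a segment, such as test. or test.jpg..., is left untouched, so it has to be handled separately.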

Then you may ask: must I avoid dots in my rewritten URLs?

I would say it is one solution, but not the only one. If you are using mod_rewrite, you are probably also using a language such as PHP, and through that language you can detect whether the URL has dots at the end, e.g.:

<IfModule mod_rewrite.c>
    RewriteEngine On

    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d

    RewriteRule ^([a-zA-Z0-9\-\/.]+)$ index.php/$1 [QSA,L]
</IfModule>

This RewriteRule populates the variable $_SERVER['PATH_INFO'], and you can compare this variable with $_SERVER['REQUEST_URI']; the two will be different. Or you can just use REQUEST_URI combined with rtrim to check for trailing dots and issue a permanent redirect, e.g.:

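<?php
$req = rtrim($_SERVER['REQUEST_URI'], '/'); // Remove the trailing / from the URL.
$clean = rtrim($req, '.');                  // Remove the trailing dots.

if ($req !== $clean) {
    // Permanent redirect to the same URL without the trailing dots.
    header('Location: ' . $clean, true, 301);
    exit;
}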
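
The same check can also be wrapped in a small function that leaves the query string alone, which makes it easier to test outside a web request. This is only a sketch: trailingDotRedirectTarget() is a hypothetical name, and it assumes that . and .. segments have already been resolved before the request reaches the script.

```php
<?php
// Hypothetical helper, not part of PHP or of any framework.
// Returns the URI to redirect to, or null when the path has no trailing dots.
// Assumption: dot-segments ("." and "..") were already resolved earlier.
function trailingDotRedirectTarget(string $requestUri): ?string
{
    // Split off the query string so that dots inside it are not touched.
    $parts = explode('?', $requestUri, 2);
    $path = $parts[0];
    $query = isset($parts[1]) ? '?' . $parts[1] : '';

    $hadSlash = substr($path, -1) === '/';
    $trimmed = rtrim($path, '/');   // path without trailing slashes
    $clean = rtrim($trimmed, '.');  // path without trailing dots

    if ($clean === $trimmed) {
        return null; // nothing to fix
    }

    // Restore the trailing slash if the original path had one.
    if ($hadSlash && substr($clean, -1) !== '/') {
        $clean .= '/';
    }

    return $clean . $query;
}

// Usage inside a web request (skipped when run from the command line).
if (PHP_SAPI !== 'cli') {
    $target = trailingDotRedirectTarget($_SERVER['REQUEST_URI'] ?? '/');
    if ($target !== null) {
        header('Location: ' . $target, true, 301);
        exit;
    }
}
```

With this, /project/test./ would redirect to /project/test/, and /images/test.jpg... would redirect to /images/test.jpg, while /images/test.jpg is served without a redirect.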


Protomen