53
http://example.com/something/somewhere//somehow/script.js

Does the double slash break anything on the server side? I have a script that parses URLs, and I was wondering whether it would break anything (or change the path) if I replaced multiple slashes with a single slash. On the server side in particular, some frameworks like CodeIgniter and Joomla use segmented URL schemes and routing. I just want to know if it breaks anything.

Joseph

8 Answers

57

RFC 2396, the generic URI syntax specification, defines the path separator to be a single slash.

However, unless you're using some kind of URL rewriting (in which case the rewriting rules may be affected by the number of slashes), the URI maps to a path on disk. In most modern operating systems (Linux/Unix, Windows), multiple path separators in a row do not have any special meaning, so /path/to/foo and /path//to////foo eventually map to the same file.
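
A minimal sketch of the filesystem half of that claim, in Python. It creates a throwaway directory (the names to and foo are only placeholders borrowed from the example paths above) and asks the operating system whether the single-slash and multi-slash spellings name the same file. It says nothing about how a particular web server maps URIs to paths.

```python
import os
import tempfile

# Build <tmp>/to/foo in a temporary directory, then refer to it with
# runs of extra separators, as in /path//to////foo.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "to"))
    plain = os.path.join(root, "to", "foo")
    with open(plain, "w"):
        pass  # create an empty file

    doubled = root + "//to////foo"

    # samefile() compares the files the OS actually resolves both paths to.
    same = os.path.samefile(plain, doubled)

print(same)
```

On Linux and macOS this is expected to print True.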

An additional thing that might be affected is caching. Since both your browser and the server cache individual pages (according to their caching settings), requesting the same file multiple times via slightly different URIs might affect caching (depending on server and client implementation).

poncha
  • 1
    You have to look at section 3.3 of the document you quoted (or RFC3986 which obsoletes it, but agrees on discussed behavior here), which specified through ABNF how `path_segments` consists of at least one `segment` token, which itself may be of empty length. This means that sequences of characters like `//` are perfectly valid in URIs. – Armen Michaeli Jul 27 '16 at 21:42
  • 2
    @amn It is valid, no problem here. But the question was whether or not it can break anything. And it might - if You use URL rewriting (for instance) – poncha Jul 28 '16 at 06:13
  • 27
    This is a great answer! Shame it's a duplicate of https:///stackoverflow.com////////a/////10161264/////6618577 though... – Aric Aug 02 '17 at 15:26
  • 3
    Re "*unless you're using some kind of URL rewriting*", It also matters for relative URLs. `http://host/a/b/c/d + ../../e = http://host/a/e`, while `http://host/a/b/c//d + ../../e = http://host/a/b/e` – ikegami Oct 08 '18 at 06:15
  • @ikegami true ;) nice catch – poncha Nov 01 '18 at 10:11
18

The correct answer to this question is it depends upon the implementation of the server!

Preface: Double-slash is syntactically valid according to RFC 2396, which defines URL path syntax. As amn explains, it therefore implies an empty URI segment. Note however that RFC 2396 only defines the syntax, not semantics of paths, including empty path segments, so it is up to your server to decide the semantics of the empty path.

You didn't mention the server software stack you're using, perhaps you're even rolling your own? So please use your imagination as to what the semantics could be!

Practically, I would like to point out some everyday semantic-related reasons which mean you should avoid double slashes even though they are syntactically valid:

  1. Not everyone assumes that an empty segment is valid, and that can cause bugs! Even though your server technology of today might be compatible with it, your server technology of tomorrow, or the next version of your server technology of today, might not be. (Example: ASP.NET MVC Web API library throws an error when you try to specify a route template with a double slash.)

  2. Some servers might interpret // as indicating the root. This can become a directory traversal bug - and then usually it is a security bug (look up 'directory traversal vulnerability').

  3. Because it is sometimes a bug, and a security bug, defensively designed server stacks and firewalls will assume that the substring '//' in any incoming request is a possible attempt to exploit a bug, and will therefore block it by returning 403 Forbidden, 404 Not Found, or 400 Bad Request, without processing the URI or request any further.
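
Point 2 can be sketched with a hypothetical static-file handler in Python. The route prefix /static/, the document root /var/www/static and the function resolve_naively are all made up for illustration. The handler strips the route prefix and joins the remainder onto the document root; with a doubled slash the remainder is itself an absolute path, and posixpath.join then discards the root. This is one possible way such a bug can arise, not a description of any particular server.

```python
import posixpath

DOC_ROOT = "/var/www/static"   # hypothetical document root
PREFIX = "/static/"            # hypothetical route prefix

def resolve_naively(url_path: str) -> str:
    """Map a URL path to a file path without validating the remainder.

    Assumes url_path starts with PREFIX.
    """
    remainder = url_path[len(PREFIX):]
    # posixpath.join() drops everything before an absolute component.
    return posixpath.join(DOC_ROOT, remainder)

print(resolve_naively("/static/css/site.css"))  # /var/www/static/css/site.css
print(resolve_naively("/static//etc/passwd"))   # /etc/passwd
```

A handler that rejects or normalizes an empty segment before joining would not escape the document root this way.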

Tim Lovell-Smith
15

URLs don't have to map to filesystem paths. So even if // in a filesystem path is equivalent to /, you can't guarantee the same is true for all URLs.

RedGrittyBrick
2

Consider the declaration of the relevant path-absolute non-terminal in "RFC3986: Uniform Resource Identifier (URI): Generic Syntax" (specified, as is typical, in ABNF syntax):

path-absolute = "/" [ segment-nz *( "/" segment ) ]

Then consider the segment declaration a few lines further down in the same document:

segment       = *pchar

If you can read ABNF, the asterisk (*) specifies that the following element pchar may be repeated any number of times to make up a segment, including zero times. Re-reading the path-absolute declaration above with this in mind, you can see that a potentially empty segment implies that the second "/" may repeat indefinitely, hence allowing valid combinations like ////// (arbitrary length of at least one /) as part of path-absolute (which itself is used in specifying the rule describing a URI).

As all URLs are URIs, we can conclude that yes, URLs may contain multiple consecutive forward slashes, per the quoted RFC.
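
The two rules can be transcribed into regular expressions to try the grammar out. This is a sketch in Python, assuming the pchar definition from the same RFC (unreserved, pct-encoded, sub-delims, ":" and "@"). The variable names are mine, and path-abempty is included because it is the rule RFC 3986 uses for a path that follows an authority such as example.com.

```python
import re

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
SEGMENT = PCHAR + "*"      # segment    = *pchar  (may be empty)
SEGMENT_NZ = PCHAR + "+"   # segment-nz = 1*pchar (never empty)

# path-absolute = "/" [ segment-nz *( "/" segment ) ]
PATH_ABSOLUTE = re.compile(f"/(?:{SEGMENT_NZ}(?:/{SEGMENT})*)?")
# path-abempty  = *( "/" segment )
PATH_ABEMPTY = re.compile(f"(?:/{SEGMENT})*")

path = "/something/somewhere//somehow/script.js"
print(bool(PATH_ABSOLUTE.fullmatch(path)))     # True: empty segment in the middle
print(bool(PATH_ABEMPTY.fullmatch(path)))      # True
print(bool(PATH_ABSOLUTE.fullmatch("//a")))    # False: first segment must be non-empty
print(bool(PATH_ABEMPTY.fullmatch("//////")))  # True
```

Note that the first segment of path-absolute is segment-nz, so a path made only of slashes is matched by path-abempty rather than by path-absolute.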

But it's not like everyone follows or implements URI parsers per specification, so I am fairly sure there are non-compliant URI/URL parsers and all kinds of software that stacks on top of these where such corner cases break larger systems.

Armen Michaeli
  • 3
    All your answer says is that `http://host/a////b` is a valid URI, but that's not what the OP asked. The fact that `http://host/a////b` is valid doesn't make it equivalent to `http://host/a/b`. In fact, the very RFC you quote says they *aren't* equivalent. – ikegami Oct 08 '18 at 06:30
  • The question is not about whether the two URLs you quoted are equivalent. The question asks whether URLs with multiple forward slashes break anything, which I answered with basically "in practice, they might, but in theory they shouldn't as multiple forward slashes are valid with respect to the canonical URL specification". – Armen Michaeli Oct 10 '18 at 11:49
  • 3
    Again, the fact that it's a valid uri is irrelevant. http://foo/ is also a valid uri, but it will sure break things if you use it instead of http://stackoverflow.com. Since all your answer does is show that the uri is valid, it doesn't answer the question – ikegami Oct 11 '18 at 00:27
1

One thing you may want to consider is that it might affect your page indexing in a search engine. According to this web page,

A URL with the same path repeated 3 times will not be indexed in Google

The example they use is:

example.com/path/path/path/

I haven't confirmed this would also be true if you used example.com///, but I would certainly want to find out if SEO were critical for my website.

They mention that "This is because Google thinks it has hit a URL trap." If anyone else knows the answer for sure, please add a comment to this answer; otherwise, I thought it relevant to include this case for consideration.

Sablefoste
1

Yes, it can most definitely break things.

The spec considers http://host/pages/foo.html and http://host/pages//foo.html to be different URIs, and servers are free to assign different meanings to them. However, most servers will treat the paths /pages/foo.html and /pages//foo.html identically (because the underlying file system does too). But even when dealing with such servers, it's easily possible for an extra slash to break things. Consider the situation where a relative URI is returned by the server.

http://host/pages/foo.html  + ../images/foo.png = http://host/images/foo.png
http://host/pages//foo.html + ../images/foo.png = http://host/pages/images/foo.png

Let me explain what that means. Say your server returns an HTML document that contains the following:

<img src="../images/foo.png">

If your browser obtained that page using

http://host/pages/foo.html          # Path has 2 segments: "pages" and "foo.html"

your browser will attempt to load

http://host/images/foo.png          # ok

However, if your browser obtained that page using

http://host/pages//foo.html         # Path has 3 segments: "pages", "" and "foo.html"

you'll probably get the same page (because the server probably doesn't distinguish /pages//foo.html from /pages/foo.html), but your browser will erroneously try to load

http://host/pages/images/foo.png    # XXX
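
The arithmetic above follows from the reference-resolution algorithm in RFC 3986 section 5.2. Below is a simplified Python sketch of it, written out by hand so the treatment of the empty segment is visible. It only handles references that consist of a path (no scheme, authority, query or fragment), and the function names are my own.

```python
from urllib.parse import urlsplit, urlunsplit

def remove_dot_segments(path: str) -> str:
    """Literal transcription of RFC 3986 section 5.2.4."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]                 # "/./x" becomes "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]                 # "/../x" becomes "/x"
            if output:
                output.pop()                # drop the previous segment
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to output.
            # An empty segment is moved as a bare "/".
            start = 1 if path.startswith("/") else 0
            end = path.find("/", start)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)

def resolve(base: str, ref: str) -> str:
    """Resolve a path-only relative reference against an absolute base URI."""
    b = urlsplit(base)
    if ref.startswith("/"):
        merged = ref
    else:
        # Keep the base path up to and including its last "/".
        merged = b.path[: b.path.rfind("/") + 1] + ref
    return urlunsplit((b.scheme, b.netloc, remove_dot_segments(merged), "", ""))

print(resolve("http://host/pages/foo.html", "../images/foo.png"))
# http://host/images/foo.png
print(resolve("http://host/pages//foo.html", "../images/foo.png"))
# http://host/pages/images/foo.png
```

In the second call the ".." removes the empty segment rather than "pages", which is why the result differs.
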
ikegami
0

You may be surprised, for example, when building links to resources in your app.

<script src="mysite.com/resources/jquery//../angular/script.js"></script>

will not resolve to mysite.com/resources/angular/script.js but to mysite.com/resources/jquery/angular/script.js, which is probably not what you wanted.

Double slashes are evil, try to avoid them.

lukyer
-2

Your question is "does it break anything". In terms of the URL specification, extra slashes are allowed. Rather than reading the RFC, here is a quick experiment you can try to see whether your browser silently mangles the URL:

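# Write a one-line PHP script that echoes the request URI exactly as received
echo '<?= $_SERVER["REQUEST_URI"];' > tmp.php
# Serve it with PHP's built-in web server, using tmp.php as the router script
php -S localhost:4000 tmp.php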

I tested on macOS 10.14 (18A391) with Safari 12.0 (14606.1.36.1.9) and Chrome 69.0.3497.100, requesting http://localhost:4000/hello//world, and both give the result:

/hello//world

This indicates that the extra slash is visible to the web application.

Certain use cases will be broken when using a double slash. This includes URL redirects/routing that are expecting a single-slashed URL or other CGI applications that are analyzing the URI directly.

For normal cases of serving static content, such as your example, this will still get the correct content, but the client will get a cache miss when the same content is accessed with different slashes.
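
The cache miss can be illustrated with a toy cache keyed on the exact URL string. This is a Python sketch with made-up names (fetch, origin_requests) and a placeholder response body. Real browser and proxy caches are far more elaborate; the assumption here is only that they likewise do not treat the two spellings as the same key.

```python
cache = {}
origin_requests = []   # records every URL that had to go to the "server"

def fetch(url: str) -> str:
    """Return the cached body for url, contacting the 'server' on a miss."""
    if url not in cache:
        origin_requests.append(url)
        cache[url] = "/* placeholder body of script.js */"
    return cache[url]

fetch("http://example.com/something/somewhere/somehow/script.js")
fetch("http://example.com/something/somewhere//somehow/script.js")
fetch("http://example.com/something/somewhere//somehow/script.js")

print(len(origin_requests))  # 2: one miss per distinct spelling
```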

William Entriken
  • Clarified answer to specify what is and is not broken. – William Entriken Oct 08 '18 at 13:45
  • Re "*this will still get the correct content*", [No, it won't](https://stackoverflow.com/a/52696484/589924) if the served page contains relative urls to scripts, images, etc – ikegami Oct 08 '18 at 20:26
  • Qualifier "normal cases of serving static content, such as your example" excludes the special case of a double-slash with a `..` in your example. – William Entriken Oct 10 '18 at 15:28
  • There's nothing special about static pages with relative references within; they are quite common. You could be reading one right now for all you know – ikegami Oct 11 '18 at 00:22
  • 1
    Ok. Who said that referencing `../xyz` against `http://url/a//b` to get `http://url/a/xyz` was not the intended behavior? – William Entriken Oct 12 '18 at 04:15
  • That doesn't change anything. If `http://url/a//b` is the expected URL, then adding slashes (e.g. `http://url/a///b`) or removing them (e.g. `http://url/a/b`) could break things – ikegami Oct 12 '18 at 08:08
  • Summary: a URI with runs of slashes is different than a URL with those slash runs collapsed (`s|/+|/|g`). It is likely your server will serve the same file. You are responsible for the effect on relative URLs. – William Entriken Oct 15 '18 at 01:16
  • That's a horrible summary. The summary, like your answer, doesn't answer the question! Summary: a URI with runs of slashes is different than a URL with those slash runs collapsed (`s|/+|/|g`). It is likely your server will serve the different files as part of loading the referenced document (because it will likely load the right page but break relative urls within), thus breaking things. – ikegami Oct 15 '18 at 01:34