I'm writing my own little PHP framework. I want to write everything as semantically as possible, and I'm stuck.
I've got a URL parsing class. It parses the whole URL (scheme, subdomain, domain, resource and query). Next, the router class decides what to do with this URL. If there is a resource corresponding to the URL, it renders it; if not, it renders a 404; if the resource is forbidden, it renders a 403, etc. Here is the problem:
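A minimal sketch of that routing decision, written in Python only for illustration. The resource table `RESOURCES` and the function `route_status` are hypothetical and stand in for whatever lookup the framework actually performs:

```python
# Hypothetical resource table: path -> True if the resource is forbidden.
RESOURCES = {"/": False, "/about": False, "/admin": True}


def route_status(path: str) -> int:
    """Return the HTTP status the router would send for a parsed path."""
    if path not in RESOURCES:
        return 404  # no resource corresponds to the URL
    if RESOURCES[path]:
        return 403  # the resource exists but access is forbidden
    return 200  # the resource exists and is rendered
```

With this table, `route_status("/asd")` gives 404 and `route_status("/admin")` gives 403.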
Let's say my site is at http://en.mysite.com. Let's also say that the pages asd and &*% do not exist. So I've got two URLs:
http://en.mysite.com/asd
http://en.mysite.com/&*%($^&#
Of course, neither page exists. But what should the headers look like? I would expect:
http://en.mysite.com/asd // header 404 Page not found
http://en.mysite.com/&*% // header 400 Bad request
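The predicted mapping could be sketched like this (Python, for illustration). It assumes a whitelist based on the character set [a-z0-9-_.#!], checked per slash-separated path segment; `SEGMENT_RE`, `KNOWN_PAGES` and `classify` are hypothetical names. `#` is left out of the whitelist because everything after it is a fragment, which browsers do not send to the server:

```python
import re

# Assumed whitelist for a single path segment ('#' omitted, see above).
SEGMENT_RE = re.compile(r"[a-z0-9\-_.!]+")

# Hypothetical set of pages that actually exist.
KNOWN_PAGES = {"/", "/about"}


def classify(path: str) -> int:
    """400 if any segment has a character outside the whitelist,
    404 if the path is well-formed but no page matches, else 200."""
    segments = [s for s in path.split("/") if s]
    if any(SEGMENT_RE.fullmatch(s) is None for s in segments):
        return 400
    return 200 if path in KNOWN_PAGES else 404
```

Here `classify("/asd")` gives 404 and `classify("/&*%")` gives 400.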
However, here is what our guru site does:
http://stackoverflow.com/<< // header 404
http://stackoverflow.com/&;: // header 404
http://stackoverflow.com/&*%($%5E&# // header 400 (which btw is not styled...)
https://www.google.com/%&*(#$*%&@^ // header 404...
What is the rule? Should every system decide for itself which symbols are allowed in a URL? As far as I'm concerned, a URL should contain only [a-z0-9-_.#!]+. I'm using slashes as parameters, so I don't need ?, = or &. But what is the general rule? Is there a URL regex in the specification?
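For reference, RFC 3986 does define the URI grammar. Its Appendix B gives a regular expression that splits any URI into scheme, authority, path, query and fragment, and section 3.3 defines which characters a path segment may contain: unreserved characters, percent-encodings such as `%5E`, sub-delimiters like `&`, `*` and `;`, plus `:` and `@`. A sketch in Python combining the two; the helper names `split_uri` and `path_is_valid` are hypothetical, and the patterns use only constructs that PCRE (PHP's `preg_match`) should also accept:

```python
import re

# Generic URI-splitting regex from RFC 3986, Appendix B.
# Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# RFC 3986 section 3.3: pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
# path-abempty = *( "/" segment ), segment = *pchar
PATH_ABEMPTY_RE = re.compile(rf"(?:/{PCHAR}*)*")


def split_uri(uri: str) -> dict:
    """Split a URI into its components (every part of the regex is optional,
    so it always matches)."""
    m = URI_RE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


def path_is_valid(uri: str) -> bool:
    """True if the path component matches the RFC 3986 path-abempty rule."""
    return PATH_ABEMPTY_RE.fullmatch(split_uri(uri)["path"]) is not None
```

Under this grammar `/&;:` is a syntactically valid path, while `/&*%($^&` is not, because `%(` is not a valid percent-encoding and `^` is not allowed unencoded.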
BTW: for those who will say "just return 404 and go drink a beer": I probably will :).
But this problem is kind of serious for SEO, since a 400 is not the same as a 404 as far as ranking goes. And it is nice to style the 400 page your own way and tell the visitor not "page not found" but "are you trying to inject something into my beautiful URL? It is a BAD REQUEST!"