
Our site's links are like this:

https://www.example.com/video.html#11

For some reason, some sites occasionally convert the # in our links to %, like this:

https://www.example.com/video.html%11

So I tried the following line. It works, but our links are dynamic: the video ID can change, so a single hardcoded target is not enough.

ErrorDocument 404 https://www.example.com/video.html#11

And also, I have the following line in the .htaccess file but it doesn't work:

RewriteRule ^([^\.]+)$ $1.html [NC,L]
MrWhite
folec94
  • You can only specify one ErrorDocument, so this won't work for multiple different IDs. You should implement a RewriteRule instead, that externally redirects these to the correct form. `%11` would be a control character (DC1, 0x11), but RewriteRule matches against the already URL-decoded path - so I don't know if you can match that directly (spaces can be escaped with a backslash, but I don't know if that will work for a control character as well), you might have to use a RewriteCond that checks the original request instead (similar to https://serverfault.com/q/122879) – CBroe Jan 12 '22 at 14:25
  • What range of IDs are you expecting? – MrWhite Jan 15 '22 at 01:45
  • "I have the following line in the `.htaccess` file but it doesn't work" - This is unrelated to the current issue and should be asked as a different question. There's nothing particularly wrong with that rule so you may have a conflict with other directives or it's simply not doing what you think it's doing. (What are _you_ expecting that rule to be doing?) – MrWhite Jan 15 '22 at 01:48

1 Answer

https://www.example.com/video.html%11

You can redirect this URL (and similar ones), replacing % with #, using something like the following mod_rewrite rule at the top of your .htaccess file:

RewriteCond %{THE_REQUEST} ^GET\s(/video\.html)%(\d+)\s
RewriteRule . %1#%2 [NE,R=302,L]

THE_REQUEST contains the first line of the request (the request line, e.g. GET /video.html%11 HTTP/1.1). It is not %-decoded and is not affected by other rewrites.

The %1 and %2 backreferences in the substitution string contain /video.html and the number after the % sign respectively.

The NE (noescape) flag is required to prevent the # sign being URL-encoded (as %23) in the response and being interpreted as part of the URL-path.
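
If the same problem affects pages other than video.html, the pattern can be loosened. The following is a minimal sketch rather than a tested drop-in. It assumes mod_rewrite is enabled, that the affected pages all end in .html, and that the ID is always numeric. The broader path pattern is an assumption of this example, not something stated in the question.

```apache
RewriteEngine On

# Match the raw (not %-decoded) request line, e.g. "GET /video.html%11 HTTP/1.1".
# %1 = any path ending in ".html" (assumed), %2 = the digits after the "%".
RewriteCond %{THE_REQUEST} ^GET\s(/[^\s?%]+\.html)%(\d+)\s
# Redirect to "<path>#<digits>"; NE keeps the "#" from being sent as "%23".
RewriteRule ^ %1#%2 [NE,R=302,L]
```

As with the original rule, this only sees requests where the % sequence actually reaches the server undecoded.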

HOWEVER, there are further complications, as the result depends on which ID numbers you expect after the %. The first two digits (%NN) are treated as a %-encoded character in the URL. The browser will convert some of these %-encoded characters back into the literal character before making the request, so %NN may not reach your server. Notably, this affects the standard Latin letters %61 (a) to %7A (z), numbers, uppercase letters, etc.

For example, given a request for /video.html%61, the browser will likely convert this to /video.htmla before making the request. In .htaccess these will need to be checked for and converted manually, unless you have access to the server config to create a RewriteMap for looking up the equivalent hex-codes. (The built-in escape() function only %-encodes special characters - like the browser - so it will have no effect here.)
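
For the digit case, the manual conversion is mechanical, because %30 to %39 decode to 0 to 9: a decoded digit N must have come from %3N. A rough sketch of that one case follows. It assumes the .htaccess file is in the document root, that the browser decoded only the leading %3N sequence, and that no real page is named video.html followed by digits. Letters and other characters would each need their own rule.

```apache
# Example: "/video.html%31" arrives as "/video.html1"; "/video.html%311" as "/video.html11".
# $1 = the decoded digit, $2 = any remaining digits of the ID.
# Rebuild the ID as "3" + digit + rest, then redirect to the fragment form.
RewriteRule ^video\.html(\d)(\d*)$ /video.html#3$1$2 [NE,R=302,L]
```

This rule complements the THE_REQUEST rule rather than replacing it, since each handles a different form of the incoming URL.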

Aside: An additional complication on Windows servers is that any %-encoded characters that are not permitted in Windows filenames (e.g. codes 0-31, which notably includes %11, the DC1 control character, from your example) will result in a 403 Forbidden response before .htaccess is able to process the request. To resolve this you would need to move this directive to the main server config (or VirtualHost) in order to process the request before it is mapped to the filesystem.
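
Moving the rule out of .htaccess might look something like the sketch below. The VirtualHost address, port, and ServerName are placeholders assumed for illustration. The directives would go inside whatever VirtualHost block already serves the site.

```apache
<VirtualHost *:443>
    ServerName www.example.com
    # ... existing SSL and site configuration stays here ...

    RewriteEngine On
    # Same condition as in .htaccess: inspect the raw request line.
    RewriteCond %{THE_REQUEST} ^GET\s(/video\.html)%(\d+)\s
    # In server context the rule runs before the URL is mapped to the filesystem.
    RewriteRule ^ %1#%2 [NE,R=302,L]
</VirtualHost>
```

Changes to the server config only take effect after Apache is reloaded or restarted, unlike .htaccess edits.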

MrWhite
  • Thank you for your clear answer, I appreciate it. But now I have another problem: it works when I type #21, #11, or similar, but if I type #31, #59, or #3, it returns a 400 Bad Request and does not redirect. I tried that add %3%4... to the related line but nothing has changed. – folec94 Jan 18 '22 at 19:31
  • @folec94 "when I type #21, #11" - Presumably you mean when you type `%21`, `%11`? If you are getting a "400 Bad Request" response then this is likely triggered by other "security" rules defined in your server config - which occurs _before_ `.htaccess` is able to process the request. You can sometimes work around this by moving the directives from `.htaccess` to the main server config, which is processed much earlier. – MrWhite Jan 18 '22 at 19:48
  • @folec94 Another complication (which I mentioned above) is that `%31` and `%59` map to literal characters `1` and `Y` respectively. Some browsers (notably Google Chrome) will decode these %-encoded characters before making the request. So your server actually receives `/video.html1` and `/video.htmlY` instead of `/video.html%31` and `/video.html%59` respectively. – MrWhite Jan 18 '22 at 19:51
  • @folec94 "I tried that add %3%4... to the related line" - I'm not sure what you mean by this? – MrWhite Jan 18 '22 at 19:55