
The setup

I have some zip archives on my server. The links to those files are peppered about various blogs, YouTube channels, and so on.

I'm moving the files to Google Drive, and since I want to avoid changing the links which are all over the internet, I was thinking of doing automatic redirection to the new Google Drive links.

I've set up my .htaccess like this:

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*/)?([^/]+\.zip)$ /mydir/myscript.php?dir=$1&file=$2 [L,QSA]

where myscript.php is parsing the directory name and filename, and passing them on to Google Drive API. The API then returns the download link for the requested file, and redirects the user to it, thereby initiating the download.

The problem

This is working fine, as long as the file actually exists on my server. (After looking into this further, it seems that this is incorrect, as CBroe pointed out in the comments.)

Once I delete the file, I'm getting 403 Forbidden.

How can I get my .htaccess to work as I want it to? I want it to redirect the requests to myscript.php, which contains the logic.

EDIT As requested in the comments, I'm providing additional details.

The directory setup is like this:

site root
   dir1
   dir2
   files1  <=== this one has the archives
   files2  <=== this one has the archives
   files3  <=== this one has the archives
   mydir   <=== this one handles Google Drive logic

Contents of files1-3:

files1
   .htaccess
   archive_1.zip
   archive_2.zip
   ...
   archive_n.zip
   index.php

EDIT 2 Further experiments have shown me that the problem shows up if the file name has spaces.

For example:

https://example.com/files1/archive_1.zip <=== works
https://example.com/files1/archive 1.zip <=== doesn't work
FiddlingAway
  • "*Once I delete it*" - to be clear, you mean the zip file? Is it possible the rule is working, and the 403 response is generated by your `myscript.php`? – Don't Panic May 12 '23 at 07:47
  • _"This is working fine, as long as the file actually exists on my server."_ - that makes little sense, with the code you have shown us - because those RewriteConds would make the rule apply only if the request URI could _not_ be matched to any existing file or folder in the first place. You'll need to give us some more details here - where is that .htaccess file located, what other rewriting might be going on in addition to this, etc. – CBroe May 12 '23 at 07:52
  • @Don'tPanic I think that's not the case, if the address bar is anything to go by - I'm still on the originally requested link, and not at the expected `mydir/myscript.php` – FiddlingAway May 12 '23 at 09:26
  • @CBroe After looking at the rules again (the file and directory flags), and rechecking the actual download link of the downloaded file, I see that you are correct. I've edited my question with that note, and additional details (directory structure). – FiddlingAway May 12 '23 at 09:55
  • _"if the address bar is anything to go by"_ - it isn't, when you are only doing an internal _rewrite_. The browser address bar would only change, if you made a _redirect_, to actually make the browser request a different URL. – CBroe May 12 '23 at 09:57
  • _"Once I delete the file, I'm getting 403 Forbidden."_ - it sounds like your rewriting attempt did not affect these requests in the first place. But then you should rather be getting a 404 than a 403, if you simply request a file that is not there. Do you have any other rewriting configured on the site root level already perhaps? – CBroe May 12 '23 at 10:02
  • What URL(s) are you requesting? What are you expecting this to be rewritten to? Why is your `.htaccess` file seemingly inside the `/files1` subdirectory? (That certainly looks like an error or typo since the `dir` param will always be _empty_ - if that is the `.htaccess` you are editing?) Do you have multiple `.htaccess` files? What other directives do you have? – MrWhite May 12 '23 at 11:11
  • @CBroe At root level, other than the caching rules, there's this: `RewriteCond %{SERVER_PORT} 80 RewriteRule ^(.*)$ https://www.example.com/$1 [R,L]` (instead of `example.com`, it's the URL of my website). – FiddlingAway May 13 '23 at 06:17
  • @MrWhite The requests which I'm redirecting are in this form: `https://www.example.com/files1/archive_1.zip` or `https://www.example.com/files2/archive_xy.zip`. I have an `.htaccess` in each of the three subdirectories mentioned in the question, since I didn't want to change the root-level `.htaccess`. I'm picking up the directory and the file from the request, and sending them to `mydir/myscript.php`. The script picks them up, validates them, and then requests a Google Drive download link, based on what it received from the rewrite earlier on. – FiddlingAway May 13 '23 at 06:21
  • @CBroe I have resolved the issue - spaces in the filename. I've added the B flag, with space specified as something to be escaped. Thank you for all the comments, they helped quite a bit. – FiddlingAway May 13 '23 at 10:27
  • @MrWhite I've solved the problem - the spaces in the filename were not being encoded, and redirection was failing. – FiddlingAway May 13 '23 at 10:27

2 Answers


After checking the rest of the server (Apache logs, specifically), I have come across this:

[Sat May 13 11:57:17.679750 2023] [rewrite:error] [pid 484358] [client <my_ip>]
AH10411: Rewritten query string contains control characters or spaces

Sure enough, some of the files - the ones I kept testing for redirection - had spaces (archive_1 .zip, or archive 1.zip for example), and I missed that upon my initial checkup.

After checking the official documentation for RewriteRule flags, I made some changes in my .htaccess:

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule "^([^/]+\.\w+)$" "/mydir/myscript.php?dir=files1&file=$1" "[L,QSA,B= ]"

To quote the documentation on why this was needed:

Given a search term of 'x & y/z', a browser will encode it as 'x%20%26%20y%2Fz', making the request 'search/x%20%26%20y%2Fz'. Without the B flag, this rewrite rule will map to 'search.php?term=x & y/z', which isn't a valid URL, and so would be encoded as search.php?term=x%20&y%2Fz=, which is not what was intended.

Since I only want to escape spaces (I've checked for other special characters in the filenames - grep to file, parse the file programmatically - and there are none), I've adapted the B flag to look like this:

B= 

And since space is the character I'm escaping, I needed to enclose each argument of the rule in double quotes, the flags argument in particular (as stated in the documentation):

In 2.4.26 and later, you can limit the escaping to specific characters in backreferences by listing them: [B=#?;]. Note: The space character can be used in the list of characters to escape, but you must quote the entire third argument of RewriteRule and the space must not be the last character in the list.
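The effect of escaping only the space can be sketched outside Apache. The snippet below is a rough Python model, not Apache's actual code: `escape_chars`, `has_space_or_control` and `build_query` are hypothetical helper names, and the `%20` encoding is an assumption for illustration (Apache may encode a space differently, which is what the separate BNP flag is about).

```python
def escape_chars(value: str, chars: str) -> str:
    """Percent-encode only the characters listed in `chars` (models B=<chars>)."""
    return "".join(f"%{ord(c):02X}" if c in chars else c for c in value)


def has_space_or_control(query: str) -> bool:
    """Rough stand-in for the condition named in the AH10411 log message."""
    return any(c == " " or ord(c) < 32 or ord(c) == 127 for c in query)


def build_query(filename: str, escape: str = "") -> str:
    """Build the rewritten query string, escaping `escape` chars in the backreference."""
    return "dir=files1&file=" + escape_chars(filename, escape)


plain = build_query("archive 1.zip")                # no B flag: raw space survives
escaped = build_query("archive 1.zip", escape=" ")  # models "B= "

print(plain, has_space_or_control(plain))
print(escaped, has_space_or_control(escaped))
```

In this model the unescaped query string trips the check, while the escaped one does not, which matches the behaviour I saw before and after adding the flag.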

FiddlingAway
  • "why this was needed" - as to why this results in a 403 is not covered in the docs. The 403 response is a very recent change in Apache 2.4.56 (as mentioned in [my answer](https://stackoverflow.com/a/76242269/369434)). With your updated rule, the `file` URL parameter is always going to be empty (the `dir` param will contain the filename). Whereas with the rule in your question it's the other way round, `dir` is always empty (as mentioned in my answer). – MrWhite May 13 '23 at 11:19
  • Regarding the `B` flag, the docs state that the _space_ cannot be the last character, which would imply you cannot encode _only_ a space, so I would think minimum would be something like `B= ?` (_space_ and `?`). Although I'm assuming `B= ` only would seem to be working OK without error? – MrWhite May 13 '23 at 11:22
  • You could also avoid these Apache issues by simply rewriting the request to your PHP script, without any URL params and parsing the request in PHP (which you appear to be doing already to a certain extent). I've updated [my answer](https://stackoverflow.com/a/76242269/369434). – MrWhite May 13 '23 at 11:50
  • @MrWhite Thank you for all the comments. The `B` flag indeed works with space alone - the docs say that it would have to be quoted, which is what I ended up doing. And I see that I've pasted the wrong (commented) line in my answer - it should be as you say `file=$1`, since I've replaced the `dir=$1` with `dir=files1` (I already know the directory, I only need to grab the file). – FiddlingAway May 16 '23 at 21:36

Once I delete the file, I'm getting 403 Forbidden.

From what you've posted, it looks like your /mydir/myscript.php should at least be getting called (so the rewrite is "working" to some extent), and the 403 Forbidden response would seem to be occurring later.

You've not stated what you are expecting the URL to be rewritten to, but if the .htaccess file in question is repeated in each subdirectory (not ideal) then the dir URL parameter is always going to be empty, which I'm sure is not the intention. This is because the URL-path matched by the RewriteRule pattern is relative to the directory that contains the .htaccess file, not the root. To match against the full URL-path (when in a subdirectory) you need the REQUEST_URI server variable. But note that REQUEST_URI also contains a slash prefix.
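To see why `dir` comes out empty, here is a small Python sketch that applies the regex from the question to the path as seen from the site root and as seen from inside `/files1`. The `captures` helper is hypothetical, and stripping the directory prefix with plain string handling is only an approximation of what Apache does in a per-directory context.

```python
import re

# The pattern from the question's RewriteRule, unchanged.
PATTERN = re.compile(r"^(.*/)?([^/]+\.zip)$")


def captures(url_path: str, htaccess_dir: str):
    """Return (dir, file) as the rule would capture them, or None if no match.

    Models per-directory matching: the directory that holds the .htaccess
    file is removed from the front of the URL-path before the regex runs.
    """
    stripped = htaccess_dir.strip("/")
    prefix = stripped + "/" if stripped else ""
    relative = url_path.lstrip("/")
    if not relative.startswith(prefix):
        return None
    match = PATTERN.match(relative[len(prefix):])
    if match is None:
        return None
    return (match.group(1) or "", match.group(2))


in_subdir = captures("/files1/archive_1.zip", "/files1")  # .htaccess in /files1
in_root = captures("/files1/archive_1.zip", "/")          # .htaccess in site root

print(in_subdir)
print(in_root)
```

With the rule in `/files1/.htaccess` the first group has nothing to match, so only the filename is captured; the same rule in the root `.htaccess` would see the directory as well.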

EDIT 2 Further experiments have shown me that the problem shows up if the file name has spaces.

Ah, this is a different issue, related to a "bug" (or supposed "security fix") introduced in Apache 2.4.56, whereby any unencoded "special" characters (e.g. spaces) passed in the rewritten query string are immediately rejected (with a 403 Forbidden response). To resolve this you need to use the B flag to re-encode the captured backreference.

HOWEVER, if you captured the value from the REQUEST_URI server variable then you perhaps don't need to, since this may already be URL-encoded. (I can't quite remember whether the REQUEST_URI Apache server variable is %-encoded or not.) Failing that, match against THE_REQUEST instead (which contains the first line of the request headers), which is certainly URL-encoded already (as sent from the client).

Try the following instead:

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} ^/([^/]+/)([^/]+)$
RewriteRule \.zip$ /mydir/myscript.php?dir=%1&file=%2 [B,L,QSA]

As mentioned above, the B flag may or may not be required here.

The %1 backreference captures the directory component of the requested URL-path from the preceding condition. This includes the trailing slash (as it would have done with your previous regex). And the %2 backreference contains the filename only.

The RewriteRule pattern simply validates that the request ends with .zip.

No need to check that the request does not map to a directory, unless you also have directories that are named with a .zip extension - unlikely.
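As a quick check of what `%1` and `%2` would hold, the two patterns above can be run in Python. This is only a sketch: `backrefs` is a hypothetical helper, and the `\.zip$` test is applied to the whole path here, which gives the same result as testing the relative path because the pattern is only anchored at the end.

```python
import re

COND = re.compile(r"^/([^/]+/)([^/]+)$")  # pattern from the RewriteCond
RULE = re.compile(r"\.zip$")              # pattern from the RewriteRule


def backrefs(request_uri: str):
    """Return (%1, %2) if both the rule and the condition match, else None."""
    if RULE.search(request_uri) is None:
        return None
    match = COND.match(request_uri)
    if match is None:
        return None
    return (match.group(1), match.group(2))


one_level = backrefs("/files1/archive 1.zip")
two_levels = backrefs("/files1/sub/archive.zip")
not_zip = backrefs("/files1/readme.txt")

print(one_level)
print(two_levels)
print(not_zip)
```

Note that the condition only matches a single directory level, so a request for a `.zip` file nested two directories deep would not be rewritten by this rule.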


UPDATE:

As an aside, you could avoid these "Apache" issues and simply pass the request to myscript.php with no URL params, and parse the requested URL from the PHP superglobal $_SERVER['REQUEST_URI'] instead.

For example:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule \.zip$ /mydir/myscript.php [L]

The PHP superglobal $_SERVER['REQUEST_URI'] then contains a string of the form /files/archive%201.zip that you would then parse (in PHP) to extract the necessary parts.

No need for the QSA flag here, since any additional query string is passed through by default.
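The parsing step itself is short. The sketch below shows the logic in Python rather than PHP, purely as an illustration; `split_request_uri` is a hypothetical name, and the one-directory-level layout is an assumption taken from the question. In PHP the same steps would map onto `parse_url()` and `rawurldecode()`.

```python
from urllib.parse import urlsplit, unquote


def split_request_uri(request_uri: str):
    """Split a raw request URI into (directory, filename), or None if invalid.

    Assumes the layout from the question: exactly one directory level
    followed by a .zip filename.
    """
    path = urlsplit(request_uri).path  # drop any query string
    # Split first, then decode, so an encoded slash cannot add a segment.
    segments = [unquote(part) for part in path.split("/") if part]
    if len(segments) != 2:
        return None
    directory, filename = segments
    if "/" in directory or "/" in filename:
        return None  # a decoded %2F would otherwise smuggle in a path separator
    if not filename.endswith(".zip"):
        return None
    return (directory, filename)


print(split_request_uri("/files/archive%201.zip"))
print(split_request_uri("/files/archive%201.zip?v=2"))
print(split_request_uri("/files/nested/archive.zip"))
```

Decoding happens after the split on purpose: the script receives the still-encoded string, so it gets to decide what counts as a separator before any `%xx` sequences are expanded.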

MrWhite
  • `You've not stated what you are expecting the URL to be rewritten to` - Ah, sorry, I thought I'd explained that. If the request is made to `https://www.example.com/files1/archive_01.zip` it should be rewritten to `https://www.example.com/mydir/myscript.php?dir=files1&file=archive_01.zip`. The script `myscript.php` then parses the `$_GET` parameters, checks the validity, Google Drive availability, etc, and force downloads the files. – FiddlingAway May 16 '23 at 21:45