There are a lot of similar questions, but none seems to be an exact fit for me.

I am moving away from a WordPress site to a simple static site. However, I am currently prohibited from removing the WordPress site hosted in the public_html folder completely until everything is proven to be working with the static site.

I have the static site deployed to a sub-subfolder of my public_html folder, e.g. /subfolderA/newSiteFolder.

I have updated the .htaccess to rewrite requests to the sub-subfolder using the following:

RewriteEngine on
RewriteCond %{REQUEST_URI} !newSiteFolder/ 
RewriteCond %{REQUEST_URI} !subfolderA/newSiteFolder/ 
RewriteRule (.*)$ /subfolderA/newSiteFolder/$1 [L] 

This works fine, and the address bar shows the correct URL when I navigate the site by clicking links within it. However, when I navigate to the site from an external link, the subfolders are shown in the address bar.

For example, if the about page is clicked from an external link, it shows as https://example.com/subfolderA/newSiteFolder/about, instead of https://example.com/about.

How can I mask the subfolder names in the address bar when the site is reached from an external link? Or how should I best change my rewrite rules to accomplish this?

MrWhite
Keisha W
  • There shouldn't be any difference between the external link and the internal link. Is the external link actually linking directly to `/subfolderA/newSiteFolder/about` for some reason? Or is the external link linking to HTTP, or www vs non-www, and a canonical redirect is (erroneously) redirecting the request? Please include the complete contents of your `.htaccess` file. – MrWhite Feb 04 '22 at 17:20
  • _Aside:_ Why do you have two _conditions_ that basically do the same thing at different path depths? Also, these are not anchored so are matching _anywhere_ in the URL-path. – MrWhite Feb 04 '22 at 17:28
  • Hi, that is the full contents of the htaccess file. I removed everything else while trying to get this to work. The external link was non-www; it links to `https://example.com/about`. Does that make a difference? – Keisha W Feb 04 '22 at 21:57
  • And sorry, I am new to using the htaccess file in this way and the conditions added were based on answers I've seen and trying to get it to work for me. Which two conditions do the same thing and how can they be improved? – Keisha W Feb 04 '22 at 21:58
  • The external link also doesn't have a trailing slash. I'm not sure if that affects anything. Linking to `https://example.com/about` shows the subfolders in the URL while linking to `https://example.com/about/` does not. Is there a change I can make to handle this? – Keisha W Feb 04 '22 at 22:29
  • If that's the "full contents of the htaccess file", where are the WordPress directives? Or have they been removed for now? So, there are no other `.htaccess` files in subdirectories? How is a request for `/about` ultimately handled/routed? It's internally rewritten to `/subfolderA/newSiteFolder/about` and then what? So, your canonical URLs (your internal links) include a trailing slash - and work OK? But the external link you mention does not? What is the exact redirect you are seeing? Is this also appending a trailing slash (as well as exposing the subdirectories)? – MrWhite Feb 04 '22 at 22:37
  • Yes, I removed all the WordPress directives and there are no .htaccess files in the subdirectories. I don't follow all the questions, but to clarify: a link in an external file like this `Click here to see about` takes me to this URL with the subfolders shown, `https://example.com/subfolderA/newSiteFolder/about/`, which has a trailing slash, while a link in an external file like this `Click here to see about` works fine. – Keisha W Feb 04 '22 at 23:01

1 Answer

I'm assuming that about is actually a physical subdirectory at /subfolderA/newSiteFolder/about and you are intending to serve the DirectoryIndex document (eg. index.html) from that directory.

The "problem" is that when you request a directory without a trailing slash, mod_dir attempts to "fix" this by appending a trailing slash via a 301 (permanent) redirect, and this exposes the file-path that the request has been internally rewritten to.

In other words, when you request /about (no trailing slash), your mod_rewrite directives internally rewrite the request to /subfolderA/newSiteFolder/about, but then mod_dir kicks in and externally redirects the request to /subfolderA/newSiteFolder/about/ to append the trailing slash (which is required).

The canonical URL contains the trailing slash and this is what you are linking to internally. So we need to make sure there is always a trailing slash on the rewritten URL when this maps to a directory. We can do this with a canonical redirect before we rewrite the URL.

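Aside: regarding the conditions in your existing rule:

RewriteCond %{REQUEST_URI} !newSiteFolder/
RewriteCond %{REQUEST_URI} !subfolderA/newSiteFolder/
RewriteRule (.*)$ /subfolderA/newSiteFolder/$1 [L]

These two conditions overlap: any URL-path that contains subfolderA/newSiteFolder/ also contains newSiteFolder/, so only one of them is needed, and the more specific second condition is the one to keep. Also, the regexes used here are not anchored, so they match the stated path anywhere in the requested URL-path.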
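As a sketch only, your original rule reduced to a single condition that is anchored to the start of the URL-path might look like the following. Note that this still has the missing-trailing-slash problem described above, so it is not a substitute for the rules that follow; it just shows the tidied-up condition.

```apache
RewriteEngine On

# Only rewrite requests that are not already for the new site's directory.
# The ^ anchors the match to the start of the URL-path
# (REQUEST_URI always begins with a slash).
RewriteCond %{REQUEST_URI} !^/subfolderA/newSiteFolder/
# (.*) already matches to the end of the URL-path, so no trailing $ is needed
RewriteRule (.*) /subfolderA/newSiteFolder/$1 [L]
```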

However, we can't just append the trailing slash to all URLs, since you likely have static resources like CSS, JS and images etc. For any static files we must not force a trailing slash, so we need to handle this with an additional rule. Try the following instead:

# Store the base directory in an environment variable
RewriteRule ^ - [E=BASEDIR:/subfolderA/newSiteFolder/]

# Rewrite the root (homepage) only
RewriteRule ^$ %{ENV:BASEDIR} [L]

# Finish early if we are already in the required base directory
RewriteCond %{ENV:BASEDIR}@%{REQUEST_URI} ^([^@]+)@\1
RewriteRule ^ - [L]

# If the request would map to a directory
#     and it is missing a trailing slash
#     then redirect to append the trailing slash
RewriteCond %{REQUEST_URI} !\.\w{2,4}$
RewriteCond %{DOCUMENT_ROOT}%{ENV:BASEDIR}$1 -d
RewriteRule ^(.+[^/])$ /$1/ [R=301,L]

# Rewrite everything to the base directory
RewriteRule (.+) %{ENV:BASEDIR}$1 [L]

Explanation of the above directives

I have chosen to store the "base directory" (ie. /subfolderA/newSiteFolder/) in an environment variable BASEDIR using the first rule to save repetition of the base file-path throughout the file.

RewriteCond %{ENV:BASEDIR}@%{REQUEST_URI} ^([^@]+)@\1

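This condition checks whether the requested URL (including the rewritten URL) is already inside the base directory being rewritten to. The @ character is just an arbitrary character that does not otherwise appear in the URL-path; it carries no special meaning in the regex other than delimiting the base directory (BASEDIR) from the requested URL (REQUEST_URI). \1 is an internal backreference to the first captured group (ie. BASEDIR), so the pattern only matches when the requested URL starts with the base directory. For example, the rewritten request /subfolderA/newSiteFolder/about/ produces the TestString /subfolderA/newSiteFolder/@/subfolderA/newSiteFolder/about/, which matches, whereas a request for /about produces /subfolderA/newSiteFolder/@/about, which does not.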

RewriteCond %{REQUEST_URI} !\.\w{2,4}$
RewriteCond %{DOCUMENT_ROOT}%{ENV:BASEDIR}$1 -d
RewriteRule ^(.+[^/])$ /$1/ [R=301,L]

The first condition excludes any request that ends in what looks like a file extension (ie. a dot followed by between 2 and 4 word characters), so we can avoid the more expensive directory check that follows. This does assume that you don't have physical directories whose names end with what looks like a "file extension".

The second condition tests whether the requested URL (eg. /about) exists as a directory inside the directory being rewritten to.

The regex ^(.+[^/])$ matches (and captures) any URL-path that does not already end in a slash.

NB: You need to make sure you have cleared your browser cache before testing since the earlier erroneous redirect to append the trailing slash (that also exposed the file-path) was a 301 permanent redirect and will likely have been cached persistently by the browser.


Prevent direct access to the "hidden" subdirectory

Is there a way to also fix the URL for a user who was previously navigated to mydomain/subfolderA/newSiteFolder/about from the external link and saved the link with the subfolders, and is now using that link directly?

You can prevent direct access to this "hidden" subdirectory and redirect the user back to the "canonical" URL with something like the following. This should go as the 3rd rule in the above block, after the "Rewrite the root ..." rule.

# Redirect direct requests to the subdirectory back to root
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{ENV:BASEDIR}@%{REQUEST_URI} ^([^@]+)@\1(.*)
RewriteRule ^ /%2 [R=301,L]

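Importantly, the first condition, which checks that the REDIRECT_STATUS environment variable is empty, excludes requests that have already been internally rewritten by the later rule (after an internal rewrite, REDIRECT_STATUS is set to 200). So this rule only affects direct requests from the client.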

%2 is a backreference to the 2nd captured group in the preceding CondPattern, ie. everything in the URL-path after the BASEDIR.

HOWEVER, if the user has previously been erroneously redirected to the subdirectory, then this redirect will likely have been cached by the browser, so the above redirect to remove (undo) the subdirectory may unfortunately result in a redirect loop for these users until they clear their browser cache. (This redirect loop might prompt them to try and clear their browser cache to resolve the issue, although maybe not.)

You could perhaps redirect back to a URL that contains an innocuous query string. This might be enough to prevent a redirect loop for those users that have the erroneous redirect cached (since it's not a URL in their cache), but it does leave a superfluous query string hanging on the URL. For example, change the above RewriteRule directive to:

:
RewriteRule ^ /%2?noredirect [R=301,L]

noredirect is just any query string to differentiate from the cached URL/redirect.

NB: Test first with a 302 (temporary) redirect to avoid further/potential caching issues.
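For example, a testing version of this rule that combines the optional noredirect query string with a temporary redirect might look like the following sketch. It still relies on BASEDIR being set by the first rule, so it goes in the same position (the 3rd rule). Once it behaves as expected, change R=302 to R=301.

```apache
# Redirect direct requests to the subdirectory back to root (testing version)
# Only applies to direct client requests, not internally rewritten ones
RewriteCond %{ENV:REDIRECT_STATUS} ^$
# %2 captures everything in the URL-path after BASEDIR
RewriteCond %{ENV:BASEDIR}@%{REQUEST_URI} ^([^@]+)@\1(.*)
# Temporary (302) redirect while testing, so the browser does not cache it
RewriteRule ^ /%2?noredirect [R=302,L]
```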

Summary

RewriteEngine On

# Store the base directory in an environment variable
RewriteRule ^ - [E=BASEDIR:/subfolderA/newSiteFolder/]

# Rewrite the root (homepage) only
RewriteRule ^$ %{ENV:BASEDIR} [L]

# Redirect direct requests to the subdirectory back to root
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{ENV:BASEDIR}@%{REQUEST_URI} ^([^@]+)@\1(.*)
RewriteRule ^ /%2 [R=301,L]

# Finish early if we are already in the required base directory
RewriteCond %{ENV:BASEDIR}@%{REQUEST_URI} ^([^@]+)@\1
RewriteRule ^ - [L]

# If the request would map to a directory
#     and it is missing a trailing slash
#     then redirect to append the trailing slash
RewriteCond %{REQUEST_URI} !\.\w{2,4}$
RewriteCond %{DOCUMENT_ROOT}%{ENV:BASEDIR}$1 -d
RewriteRule ^(.+[^/])$ /$1/ [R=301,L]

# Rewrite everything to the base directory
RewriteRule (.+) %{ENV:BASEDIR}$1 [L]
MrWhite
  • Great! This seems to be working. Thanks so much. I did not have a clear understanding of exactly what the conditions were doing, so your explanation cleared that up. Is there a way to also fix the URL for a user who was previously navigated to https://mydomain/subfolderA/newSiteFolder/about from the external link and saved the link with the subfolders, and is now using that link directly? – Keisha W Feb 05 '22 at 01:37
  • @KeishaW Yes, you can redirect direct requests to the subdirectory back to the canonical URL - to prevent direct access to the subdirectory. However, for those users that have cached the redirect to the subdirectory, this may create a redirect loop, until they clear their browser cache. I've updated my answer. – MrWhite Feb 05 '22 at 02:31
  • Thanks again. I've added the update as the third rule and it is working. – Keisha W Feb 05 '22 at 03:04