
I read through a lot of similar topics, but didn't find the right answer, so please help me.

Let's say the user types in the URL of a non-existing subdirectory of my website:

www.example.com/subpage-1

What I want to achieve:

I want my mainpage (www.example.com - actually with hidden index.html) to open, but keep the URL unchanged with the non-existing subpage (www.example.com/subpage-1).

The reason why I need it: I have the website only with the main site (index.html), and everything is controlled via JavaScript dynamically.

I want to introduce sub pages - but I want to use only my main index.html site with JS to control it. (Just like a single page application.)

So when the user enters the URL www.example.com/subpage-1, my main site opens, but since the URL is kept unchanged, the JS script can check the URL, see that subpage-1 is requested, and generate the right content for it (if subpage-1 is supported, of course).

That could also be an SEO-friendly solution, since I could provide Google with a sitemap listing the available subpages as well, while everything would still be controlled via the same index.html and JS.

How can I achieve it?

What I found so far:

RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.+)$ index.html?url=$1 [QSA,L]

My problem with this is that it opens the main page every time, I can't find the query (?url=) anywhere, so I can't use it.


Another problem that I don't know how to handle: let's say the user enters www.example.com/subpage-1 and it works fine, since my JS script handles "subpage-1".

But what if

www.example.com/non-existing-subpage 

is entered? With the solution above it would open the main page again, but JS can't load any content for it. I still want a 404 for all of the non-existing subpages. How can I achieve it?

MrWhite
lpasztor
  • Do you have any other rules in your htaccess? Try adding this at the top below `RewriteEngine on` line – Amit Verma Mar 10 '22 at 18:42
  • @AmitVerma Thank you, but I have already tried this at the beginning. However with the suggestions from MrWhite I can go on. Thank you. – lpasztor Mar 13 '22 at 19:16

1 Answer

RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.+)$ index.html?url=$1 [QSA,L]

My problem with this is that it opens the main page every time, I can't find the query (?url=) anywhere, so I can't use it.

This is an internal rewrite on the server. Consequently, the `url` URL parameter (present only on the internal rewrite) is only available to a server-side script, not client-side JavaScript. The browser/client only sees the response from the server - it is not aware of what file produced that response. However, client-side JavaScript can see what is present in the browser's address bar, which is accessible via the `window.location` object.

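So, you can instead simplify the RewriteRule directive (keeping the two preceding RewriteCond directives, so that requests for existing files and directories are still served as normal):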

RewriteRule . index.html [L]

And in your JS you can read the requested URL-path from the window.location.pathname property. For example, if you request example.com/foo then the pathname property contains /foo (with a slash prefix) for you to act on accordingly in your script.
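As a minimal sketch of that idea: the script below maps the URL-path to a route and falls back to a "not-found" route for anything it does not recognise. The `KNOWN_SUBPAGES` list and the `resolveRoute` function are hypothetical names used only for illustration; your own script would use whatever list of subpages it actually supports.

```javascript
// Hypothetical list of subpages this script knows how to render.
const KNOWN_SUBPAGES = new Set(["subpage-1", "subpage-2"]);

// Map a URL-path such as "/subpage-1" to a simple route description.
function resolveRoute(pathname) {
  // Drop leading and trailing slashes: "/subpage-1/" becomes "subpage-1".
  const slug = pathname.replace(/^\/+|\/+$/g, "");
  if (slug === "") {
    return { type: "home", slug: "" };
  }
  if (KNOWN_SUBPAGES.has(slug)) {
    return { type: "subpage", slug };
  }
  // Anything else is unknown to the script.
  return { type: "not-found", slug };
}

// In the browser, route on the path shown in the address bar.
if (typeof window !== "undefined") {
  const route = resolveRoute(window.location.pathname);
  console.log(route.type, route.slug);
}
```

Keeping the lookup in a plain function (rather than reading `window.location` directly inside it) makes it easy to try out with different paths.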

I still want 404 for all of the non existing subpages. How can I achieve it?

You can't if you are only using client-side JavaScript. A "404 Not Found" status is an HTTP response sent from the server.

The best you can do in client-side JS is to display what looks like a "404 Not Found" message to the user and make sure you have a robots meta tag that prevents indexing. But this is still served with a 200 OK HTTP status. Search engines (e.g. Google) will likely see this as a soft-404 (i.e. a page that looks like a 404, but is served with a 200 OK status).
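Since there is only one HTML file, the robots meta tag would have to be injected by the script itself on unknown URLs. A rough sketch, assuming a hypothetical helper named `addNoindexMeta` that is called only when the requested path is not a supported subpage (whether a given search engine honours a tag added this way is not guaranteed):

```javascript
// Add <meta name="robots" content="noindex"> to the page head,
// reusing an existing robots meta tag if one is already present.
function addNoindexMeta(doc) {
  let meta = doc.head.querySelector('meta[name="robots"]');
  if (!meta) {
    meta = doc.createElement("meta");
    meta.setAttribute("name", "robots");
    doc.head.appendChild(meta);
  }
  meta.setAttribute("content", "noindex");
  return meta;
}

// Example use in the browser, for an unsupported path only:
//   addNoindexMeta(document);
//   document.title = "404 Not Found";
```

The response is still sent with a 200 OK status; this only changes what the page looks like to users and crawlers that execute JavaScript.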

If you want to serve a 404 HTTP response status then the server would need to be aware of which are valid/invalid URLs.

MrWhite
  • First of all thank you for your quick and detailed answer. `and make sure you have a robots meta tag that prevents indexing.` I have only 1 html (the index.html). So I can't set the noindex robots meta tag there. And of course I want Google to index my "fake" subpages as well. What's your suggestion? – lpasztor Mar 11 '22 at 06:49
  • `If you want to serve a 404 HTTP response status then the server would need to be aware of which are valid/invalid URLs.` If I "have to" do it this way, what is the best way to solve this? I have about 30 * 50 combinations for the possible subpages, like: `www.example.com/site-1-category-1` `www.example.com/site-1-category-2` `www.example.com/site-1-category-3` `www.example.com/site-2-category-1` `www.example.com/site-2-category-2` etc. Should I put all possible variants in the .htaccess file? If yes, what is the best way? And in which part, and how? Can you please help with it? – lpasztor Mar 11 '22 at 06:54
  • @lpasztor "I have only 1 html (the index.html). So I can't set the noindex robots meta tag there." - True, however, you can dynamically inject a robots meta tag using JS, just as you are (presumably) injecting "content". However, whether search engines see this is another matter - although Google should. And you are presumably expecting search engines to index your other JS generated content. – MrWhite Mar 11 '22 at 09:32
  • @lpasztor "I have about 30 * 50 combinations" - By that do you mean 30 x "sites" and 50 x "categories"? Each "category" applies to every "site", so 1500 URLs? I would refrain from literally listing that many URLs (although you could). You could potentially use a _large_ regex, or you could potentially split this up and check the "site" and "category" separately. – MrWhite Mar 11 '22 at 09:42
  • `If you want to serve a 404 HTTP response status then the server would need to be aware of which are valid/invalid URLs.` @MrWhite The only way I know to give this information to the server is the .htaccess file. And yes, there could be 1500 URLs, which is way too many; I don't want to put that many in the .htaccess, and the list may also change in the future. Is there any way to handle that dynamically? Or should I use something other than .htaccess for that, to separate valid from invalid links on the server side and send the invalid ones to a 404? – lpasztor Mar 11 '22 at 11:11
  • continued here... https://stackoverflow.com/questions/71635604/htaccess-use-variables-to-check-if-subpage-exists – MrWhite Mar 27 '22 at 11:29