
I already have a working 404 handler in the SPA. The problem is that Google, for example, links to old pages that no longer exist. While the user sees a custom 404 component, Google gets, I assume, a 200 OK and continues to treat the page as valid.

{
  path: '*',
  name: 'not-found',
  component: NotFound // 404
}

I have the server reroute every request to / and let Vue Router handle the routing in History mode:

<IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteCond %{HTTPS} off
  RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
  RewriteBase /
  RewriteRule ^index\.html$ - [L]
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteRule . /index.html [L]
</IfModule>

It's a standard Vue CLI install with a PHP backend. PHP is currently only used for API calls.

Is there a way to have the server return a 404 status code in this scenario?

Suggested solution? The server knows nothing about the routing happening in the frontend, but I could have webpack output a sitemap (or something similar) that the server checks each request against. On a miss it would set 404 in the header and still load the SPA, which then shows the 404 component. Would this be OK, or is there a better solution?

Note: I ended up automatically creating a sitemap and then checking the routes against it. If a route didn't match, it was rerouted to a custom 404. This worked reasonably well, but Google was still a bit confused.
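The sitemap check described above could look roughly like the sketch below. It is written in JavaScript purely for illustration (the backend in question is PHP), and the route list, `patternToRegExp` and `statusForPath` are hypothetical names, not part of the original setup. It assumes the path has already had its query string stripped and that dynamic segments use the `:name` form.

```javascript
// Hypothetical route list, e.g. generated at build time from the router config.
const knownRoutes = ['/', '/about', '/products/:id'];

// Convert a route pattern such as '/products/:id' into an anchored RegExp.
function patternToRegExp(pattern) {
  const source = pattern
    .replace(/\/+$/, '') // ignore trailing slashes in the pattern
    .split('/')
    .map((segment) =>
      segment.startsWith(':')
        ? '[^/]+' // dynamic segment: any non-empty path part
        : segment.replace(/[.*+?^${}()|[\]\\]/g, '\\$&') // escape literal text
    )
    .join('/');
  return new RegExp(`^${source}/?$`);
}

// Decide which HTTP status the server should send before serving index.html.
function statusForPath(path, routes = knownRoutes) {
  const matched = routes.some((pattern) => patternToRegExp(pattern).test(path));
  return matched ? 200 : 404;
}
```

With this list, `statusForPath('/products/42')` returns 200 and `statusForPath('/old-page')` returns 404. The server would still send the SPA's index.html as the response body in both cases, so the user sees the custom 404 component while crawlers see the status code.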

Eirinn
  • I don’t think there is a solution. Those HTTP codes are used on another layer; Vue.js is just too late. The only way would be to triage it by sending an HTTP GET request using Ajax and then checking the status code. If it's a 404, either redirect or make use of the response. – msphn Jan 18 '19 at 18:38
  • @msphn Suggested solution wouldn't be a viable fallback? Let PHP read the output sitemap and validate the route - then decide to set 404 in the header or not? Or redirect to a 404 I guess. It's just a bit clunky. – Eirinn Jan 18 '19 at 18:49
  • No it would. But you need a 404 code? – msphn Jan 18 '19 at 19:55
  • It’s just an issue of outdated concepts about SEO and search engines in general. Also, customers don’t understand those concepts and keep crying for SEO and stuff. I’ve built APIs to generate Vue routes just to be able to actually show Open Graph data to certain user agents. It’s a pain in the ass. – msphn Jan 18 '19 at 19:57
  • @msphn I don't need the 404, search engines do. Another site used to be on the same domain, so they keep pinging routes that no longer exist, which is annoying. – Eirinn Jan 22 '19 at 20:50

1 Answer


I did some research on how an SPA can mimic or respond to search bot requests, so here we go: three working solutions.

Supporting links:

  1. Updating Page Title & Metadata with Vue.js & vue-router

Meta tag #1

Description:

HTTP status 404 means that the resource does not exist or has been removed. Reporting a removed resource is how we tell Googlebot to drop the "dead" link from the search index. That gives us a different question, and one we can answer: how to get the page de-indexed without the status code. The answer is <meta name="robots" content="noindex">

As Google docs state:

You can prevent a page from appearing in Google Search by including a noindex meta tag in the page's HTML code, or by returning a 'noindex' header in the HTTP request. When Googlebot next crawls that page and sees the tag or header, Googlebot will drop that page entirely from Google Search results, regardless of whether other sites link to it.

Supporting links:

  1. https://searchengineland.com/meta-robots-tag-101-blocking-spiders-cached-pages-more-10665
  2. https://support.google.com/webmasters/answer/79812?hl=en
  3. https://support.google.com/webmasters/answer/93710?visit_id=636835318879056986-3786307088&rd=1
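In an SPA the noindex tag has to be added at runtime, once the router has landed on the 404 route. A minimal sketch of that, assuming a helper named `setNoindex` (hypothetical) and the `not-found` route name from the question; the `doc` argument would normally be `window.document`:

```javascript
// Add or remove <meta name="robots" content="noindex"> on a document-like object.
// `doc` is passed in (normally window.document) so the helper uses no globals.
function setNoindex(doc, enabled) {
  let tag = doc.head.querySelector('meta[name="robots"]');
  if (enabled) {
    if (!tag) {
      tag = doc.createElement('meta');
      tag.setAttribute('name', 'robots');
      doc.head.appendChild(tag);
    }
    tag.setAttribute('content', 'noindex');
  } else if (tag) {
    // Assumption: index.html ships no static robots tag that must be preserved.
    doc.head.removeChild(tag);
  }
}

// Assumed wiring with vue-router (route name 'not-found' is from the question):
// router.afterEach((to) => setNoindex(document, to.name === 'not-found'));
```

Removing the tag again on every other route matters because the page is never reloaded: without it, a visitor who hits the 404 route once would carry the noindex tag to every later page. This only helps with crawlers that execute JavaScript, and the response status stays 200.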

Meta tag #2

Description:

If we cannot (or do not want to) use our server to respond with 404 or any other code, we can try to perform some sort of redirect: an SEO-safe redirect that also works when JS is disabled.

This redirect uses an HTML meta tag. An example that redirects to example.com immediately:

<meta http-equiv="refresh" content="0; url=http://example.com/">

Quote from StackOverflow answer:

As a reminder, and although it is not the preferred way to perform a redirect, Google accepts and follows pages having a Refresh tag with its delay set to 0, because, in some tricky cases, there is simply no other way to perform a redirect. This is the recommended method for Blogger pages (owned by Google).

A 301 redirect will eventually be treated as a 404 if it permanently redirects to a file that does not exist. From the Google docs (Prepare for 301 redirects):

While Googlebot and browsers can follow a "chain" of multiple redirects (e.g., Page 1 > Page 2 > Page 3), we advise redirecting to the final destination. If this is not possible, keep the number of redirects in the chain low, ideally no more than 3 and fewer than 5. Chaining redirects adds latency for users, and not all browsers support long redirect chains.

Supporting links:

  1. https://en.wikipedia.org/wiki/Meta_refresh
  2. SEO consequences of redirecting with META REFRESH
  3. http://sebastians-pamphlets.com/google-and-yahoo-treat-undelayed-meta-refresh-as-301-redirect/
  4. https://developer.mozilla.org/en-US/docs/Web/HTTP/Redirections#Permanent_redirections

JavaScript Redirect

Description:

Perform an onload redirect with window.location = '/404.html' to an invalid location (a file that does not exist), and integrate the Google Not Found widget.

Supporting links:

  1. https://googleblog.blogspot.com/2008/10/helping-website-oweners-fix-broken.html
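A rough sketch of the JavaScript redirect, with hypothetical names (`NOT_FOUND_URL`, `redirectToNotFound`) and a Location-like object passed in instead of reading `window.location` directly. One assumption needs stating loudly: with the rewrite rules from the question, a missing /404.html would itself be rewritten to /index.html and answered with 200, so the server would need an exception that makes this path return a real 404.

```javascript
// Hypothetical path that the server is assumed to answer with a real 404 status.
const NOT_FOUND_URL = '/404.html';

// `loc` is a Location-like object (normally window.location).
// `routeName` is the name of the matched route ('not-found' in the question).
// Returns true when a redirect was triggered.
function redirectToNotFound(loc, routeName) {
  if (routeName !== 'not-found') return false; // a real route matched
  if (loc.pathname === NOT_FOUND_URL) return false; // avoid a redirect loop
  loc.replace(NOT_FOUND_URL); // replace() keeps the dead URL out of history
  return true;
}

// Assumed usage once the router has resolved the initial navigation:
// redirectToNotFound(window.location, router.currentRoute.name);
```

Using `replace()` rather than assigning to `window.location` means the back button does not return the visitor to the dead URL and trigger the redirect again.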
AndrewShmig
  • Wow, a lot of effort in this one - it's appreciated! I'm not sure it fixes the problem however since they all rely on modifying the contents of the page once it has loaded and the header has a 200OK in it. I'll try adding the noindex meta on 404 page routed and see what happens. I'll get back to you :) – Eirinn Jan 22 '19 at 21:02
  • @Eirinn, Google is smarter than that: a 200 OK does not mean it treats the page as OK and skips parsing the rest of the content (headers etc.). – AndrewShmig Jan 22 '19 at 21:17
  • Alright, I'm testing right now, may be a little while before I get results. If I'm lucky that will take care of google. All the bots probing the site will probably still keep probing the useless addresses. – Eirinn Jan 22 '19 at 21:24
  • @Eirinn, great! Waiting for final results :) It was an interesting task and it kind of correlates with what I am going to do in the near future for my own SPA. – AndrewShmig Jan 22 '19 at 21:27
  • Solution nr.1 doesn't seem to work. Tested with https://seositecheckup.com/tools/noindex-tag-test – Eirinn Jan 23 '19 at 11:34
  • It seems Google gets a 200 OK but registers the noindex. Hmm, not optimal, but may be enough. – Eirinn Jan 26 '19 at 22:31
  • Marked as answer. Even if the first solution doesn't fully fix the problem, I'm sure with some fiddling I'll be able to fix it. – Eirinn Jan 31 '19 at 09:33
  • @Eirinn, can you elaborate on what kind of fiddling you mean and which method worked partially? The second one? – AndrewShmig Jan 31 '19 at 10:51
  • Oh you're right, nothing worse than stumbling upon an old thread with "thx it worked" and no description. I used the first one and achieved a noindex for my 404 page. This causes a 200 OK and a "leave me alone" for search engines. #2 may be a better choice and I will experiment with that at some point. Maybe use the router to check if it was an internal or external request (inside the SPA or not) and then do #1 or #2. Dunno yet :) – Eirinn Jan 31 '19 at 12:45
  • That doesn't answer the question regarding Vue Router at all. In fact, most of the answer seems to disregard Vue Router, since even the redirect is not using the correct method. I would think an SSR module would do the trick, but not any of those suggestions. – Tofandel Nov 13 '20 at 18:28