Keep old website (HTML files) on webserver but disallow search agents to index them

Question

I’ve just finished a website for a client who is going to replace their old (very old, hard-coded HTML) website. The problem is that, for now, they want to keep the old website and all its files on the web server in their original location. This doesn’t cause any issues for the new website, which is built with PHP and WordPress, but it becomes a problem when Google and other search engines drop by with their robots and index the site.

A Google search still finds the old HTML files. Is there any way that I could “keep” the old HTML files on the web server but make sure that, first, no robots index them and, second, anyone who navigates to an HTML page, e.g. http://www.clientdomain.com/old_index_file.html, gets redirected? I think the second part can probably be done in .htaccess, but I haven’t found anything useful when searching for it.

As for the first part, preventing robots and agents from indexing HTML files, I’ve tried putting these two lines in my robots.txt file:

Disallow: /*.html$  
Disallow: /*.htm$

But I’m unsure whether it will work.
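
Whether these two patterns catch the right URLs can be checked offline. The following is a rough Python sketch, assuming the `*` and `$` special characters as described in RFC 9309 (and honored by Google). The function names and sample paths are invented for illustration, and Allow rules and rule precedence are deliberately ignored.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern ("*" wildcard, trailing "$" anchor) to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with ".*" where "*" stood.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path, disallow_patterns):
    """Return True if any Disallow pattern matches the URL path."""
    return any(robots_pattern_to_regex(p).search(path) for p in disallow_patterns)


if __name__ == "__main__":
    rules = ["/*.html$", "/*.htm$"]  # the two patterns from the question
    for path in ("/old_index_file.html", "/about.htm", "/index.php", "/new-page/"):
        print(path, is_disallowed(path, rules))
```

Keep in mind that Disallow lines only take effect inside a group that starts with a User-agent line, and that blocking crawling does not necessarily remove URLs that are already in a search index.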

I might be going about this completely the wrong way, but I’ve never before had a client ask to keep the old website on the same server and in its original location.

Thanks,
- Mestika

How exactly are you "keeping" a website up if robots can't index it, and nobody can access it? It sounds like you want to preserve the files somewhere, but probably not on your web server. — Jonathan Wren, Jan 22 '13 at 19:54

score 2 · Answer 1 · answered Jan 22 '13 at 19:57

<?php
$redirectlink = 'http://www.puttheredirectedwebpageurlhere.com';
// do not edit below here
header('HTTP/1.1 301 Moved Permanently');
header('Location: ' . $redirectlink);
exit;
?>

This code issues a 301 redirect from the page to the URL you specify. The filename of this .php file should be the URL slug of the page you want to redirect.

301 Redirect

A 301 redirect, also known as a permanent redirect, should be put in place to permanently redirect a page. The word ‘permanent’ implies that ALL qualities of the redirected page will be passed on to the detour page.

That includes:

PageRank

MozRank

Page Authority

Traffic Value

A 301 redirect is implemented if the change you want to make is, well… permanent. The detour page now embodies the redirected page as if it was the former. A complete takeover. The old page will be removed from Google’s index and the new one will replace it.

Alternatively, you can do it in your .htaccess file, as shown in the other answers.

score 1 · Answer 2 · edited May 23 '17 at 11:49

There are probably a lot of ways to handle this. Assuming you have a clear mapping of pages from the old template to the new one, you could detect the Google bot in your old template (see [1]) and do a 301 redirect (see [2] for an example) to the new template.

[1] how to detect search engine bots with php?
[2] How to implement 303 redirect?

answered Jan 22 '13 at 19:57 by George P

score 1 · Answer 3 · answered Jan 22 '13 at 20:00

Will take some work, but sounds like you'll need to crack open your htaccess file and start adding 301 redirects from the old content to the new.

RewriteEngine On
# Escape the dot and anchor the end so that only /oldpage.html matches
RewriteCond %{REQUEST_URI} ^/oldpage\.html$
RewriteRule . http://www.domainname.com/pathto/newcontentinwp/ [R=301,L]

Rinse and repeat
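
For a site with many pages, the "rinse and repeat" step could be scripted instead of typed by hand. Here is a minimal Python sketch, assuming you keep a mapping of old paths to new URLs. The dictionary contents and the function name are placeholders for illustration, not taken from the real site.

```python
import re

# Hypothetical mapping from old HTML paths to new WordPress URLs (placeholders).
REDIRECTS = {
    "/oldpage.html": "http://www.domainname.com/pathto/newcontentinwp/",
    "/about.htm": "http://www.domainname.com/about/",
}


def htaccess_rules(redirects):
    """Return .htaccess text with one RewriteCond/RewriteRule pair per old path."""
    lines = ["RewriteEngine On"]
    for old_path, new_url in redirects.items():
        # re.escape makes characters such as "." literal; "^" and "$" keep
        # "/oldpage.html" from also matching longer paths that contain it.
        pattern = "^" + re.escape(old_path) + "$"
        lines.append(f"RewriteCond %{{REQUEST_URI}} {pattern}")
        lines.append(f"RewriteRule . {new_url} [R=301,L]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(htaccess_rules(REDIRECTS), end="")
```

The printed output can be pasted into the .htaccess file. Each condition applies only to the rule directly after it, which is why the lines are emitted in pairs.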

score 0 · Answer 4 · answered Jan 22 '13 at 19:56

This is definitely something mod_rewrite can help with. Converting your posted robots.txt to a simple rewrite:

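RewriteEngine On
# In .htaccess the path is matched without its leading slash
RewriteRule \.html?$ /index.php [R=301,L]

The [R] flag issues an external redirect; on its own it sends a 302, so R=301 is given here to mark the move as permanent. I would recommend seeing http://httpd.apache.org/docs/2.4/rewrite/remapping.html for more information. You can also forbid direct access with the [F] flag.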
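
Before deploying a rule like this, it can help to try the pattern against a few sample paths. The sketch below uses Python's re module as a stand-in for mod_rewrite's matching (an approximation, since Apache uses PCRE) and assumes the rule lives in a .htaccess file in the document root, where the path is matched without its leading slash. The helper name and the sample paths are made up for illustration.

```python
import re


def rule_matches(pattern, url_path, per_directory=True):
    """Approximate whether a RewriteRule pattern matches a request path.

    In .htaccess (per-directory) context mod_rewrite strips the leading
    slash before matching; in server config context the slash is kept.
    """
    subject = url_path.lstrip("/") if per_directory else url_path
    return re.search(pattern, subject) is not None


if __name__ == "__main__":
    for pattern in (r"/.*\.html", r"\.html?$"):
        for path in ("/old_index_file.html", "/old/contact.htm", "/index.php"):
            print(pattern, path, rule_matches(pattern, path))
```

In this approximation, a pattern that starts with a slash does not match a top-level file such as old_index_file.html in .htaccess context, while a pattern anchored on the extension matches both .html and .htm files and leaves index.php alone.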
