
I'm migrating a complex old website to a new one built with CodeIgniter, and I'm facing a lot of URL rewriting problems that lead to duplicate content because of the way CodeIgniter's routes config works.

I have old URLs like these:

  • /detail.php?id=ABCDE&lang=en&page=2
  • /detail/ABCDE/en/2

The new site instead has SEO-friendly URLs like:

  • /en/products/hard-disks-2.html

In my routes config I have:

  • $route['(:any)/(:any)/(:any)'] = 'controller/$1/$2/$3';
  • $url_suffix is '.html'

This leads to duplicate content because:

  • /en/products/hard-disks-2
  • /en/products/hard-disks-2.html
  • /en/products/hard-disks-2.html?p=2
  • /en/products/hard-disks-2?p=2
  • /en/products/hard-disks-2.html/
  • /en/products/hard-disks-2.html/.html

all of the above are valid routes for CodeIgniter, so the same page is reachable under several URLs.

Is there a way to avoid this? Maybe using regular expressions?

I cannot solve this problem with .htaccess alone because the website has too many possible URL combinations, and I also have some controllers where I still need to use GET parameters.

jondavidjohn
Antonio
  • If you never link the duplicate urls, google will never find it... – Aren May 19 '11 at 22:02
  • @Aren unfortunately, mistakes by other people or by the code itself will happen, and they already have; also, many of these duplicates are generated because the old website is fully old-style (/index.php?a=3&b=4), so .htaccess cannot cover all of the parameter combinations needed to migrate the old URLs with a 301 – Antonio May 20 '11 at 12:15

1 Answer


I finally figured out how to avoid having duplicate URLs parsed.

First of all, remove the suffix in config.php (better never to use it): $config['url_suffix'] = '';

Then, in routes.php, never use wildcards and always use regular expressions.

For example, if I use $route['(:any)/(:num)'] = 'homepage/parser/$1/$2'; it will match all of the following URLs, because (:any) can span several segments:

/a/10
/a/10/11
/a/10/11/12

and so on!

Instead:

$route['([\w_-]+)/(\d+)'] = 'homepage/parser/$1/$2';

this only works for

/a/10

and:

$route['([\w_-]+)\.html'] = 'homepage/parser/$1';

will only work if your URLs really end in .html (note the escaped dot: an unescaped . would match any character).
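
To make the difference concrete, here is a minimal sketch that can be run outside the framework. It assumes the router of that CodeIgniter generation rewrites (:any) to .+ and (:num) to [0-9]+, anchors the result with ^ and $, and trims leading and trailing slashes from the URI first. The helper route_matches() is hypothetical and only imitates that behaviour; it is not part of CodeIgniter.

```php
<?php
// Hypothetical helper, NOT a CodeIgniter function: imitates how the router
// is assumed to turn a route key into an anchored regular expression.
function route_matches(string $routeKey, string $uri): bool
{
    // Assumption: ":any" becomes ".+" (which also matches "/"),
    // and ":num" becomes "[0-9]+".
    $pattern = str_replace([':any', ':num'], ['.+', '[0-9]+'], $routeKey);

    // Assumption: slashes are trimmed from both ends of the URI before matching.
    return preg_match('#^' . $pattern . '$#', trim($uri, '/')) === 1;
}

$loose  = '(:any)/(:num)';     // wildcard route from the answer
$strict = '([\w_-]+)/(\d+)';   // regular-expression route from the answer
$html   = '([\w_-]+)\.html';   // suffix route with the dot escaped

var_dump(route_matches($loose, '/a/10'));        // true
var_dump(route_matches($loose, '/a/10/11/12'));  // true: ".+" swallows "a/10/11"
var_dump(route_matches($strict, '/a/10'));       // true
var_dump(route_matches($strict, '/a/10/11/12')); // false
var_dump(route_matches($html, '/a.html'));       // true
var_dump(route_matches($html, '/a'));            // false
var_dump(route_matches($html, '/a.html/'));      // true: trailing slash is trimmed
```

The last line shows why a stricter route alone is not enough: the trailing slash is gone before the pattern is ever applied.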

Unfortunately, /a/10.html/ is still a duplicate, so I need at least one .htaccess rule to remove trailing slashes from URLs.

I really need unique URLs, so I think I'm dropping any future CodeIgniter development for this project, where I have mixed URLs: (1) .html pages, (2) directories, and (3) old dynamic URLs.

I also concluded that, for SEO purposes, it is probably best to:

  • only use pages without extensions
  • avoid any directories

So if this is the case (as in another project of mine), I just use plain URLs in my code and regular expressions in routes.php.

The only remaining issue is the trailing-slash duplicate, but this can be avoided globally with the .htaccess rule from this other solution: Remove trailing slash using .htaccess except for home / landing page
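
The linked .htaccess rule is not reproduced here. As a rough illustration of the same idea done in PHP instead (a swapped-in technique, not the rule from that answer), the sketch below computes a canonical URL without the trailing slash while keeping any GET parameters. The function name strip_trailing_slash() and its placement at the top of index.php are assumptions for the example.

```php
<?php
// Hypothetical helper, NOT part of CodeIgniter: returns the URL to redirect
// to, or null when the request path has no trailing slash to remove.
function strip_trailing_slash(string $requestUri): ?string
{
    // Separate the path from the query string so GET parameters survive.
    $parts = explode('?', $requestUri, 2);
    $path  = $parts[0];
    $query = isset($parts[1]) ? '?' . $parts[1] : '';

    // Leave the home page ("/") and slash-free paths untouched.
    if ($path === '/' || substr($path, -1) !== '/') {
        return null;
    }

    // Remove every trailing slash; fall back to "/" if nothing is left.
    $trimmed = rtrim($path, '/');
    if ($trimmed === '') {
        $trimmed = '/';
    }
    return $trimmed . $query;
}

// Assumed usage near the top of index.php, before the framework boots.
$target = strip_trailing_slash($_SERVER['REQUEST_URI'] ?? '/');
if ($target !== null) {
    header('Location: ' . $target, true, 301); // permanent redirect
    exit;
}
```

With this in place, a request for /a/10.html/?p=2 would be answered with a 301 to /a/10.html?p=2, so only one form of the URL stays reachable.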

Community
Antonio