
I am trying to create a registration system that allows users to submit their website URLs. Now when a user enters a URL, it checks against the database to see if it already exists, and rejects it if it does.

However, my problem is this:

If http://www.example.com is in the database and I enter http://example.com, this counts as a different URL as far as the check is concerned, and it allows the submission.

Is there a proper way to handle this, apart from retrieving all records, removing the www if present, and then comparing? (Which is a terribly inefficient way to do so!)

Note : Adding Laravel tag in case it has any helper functions for this (I am using a laravel-4 installation).

EDIT : This is my current logic for the check :

$exists_url = DB::table("user_urls")
    ->where('site_url', 'like', $siteurl)
    ->get();
if ($exists_url)
{
    return Redirect::to('submiturl')->withErrors('Site already exists!');
}

EDIT 2 : One way is to take the given URL http://www.example.com, and then search the database for http://www.example.com, www.example.com, http://example.com and example.com. However I'm trying to find a more efficient way to do this!
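
The four-variant lookup described in EDIT 2 can be collapsed into a single query. The following is only a sketch: `urlVariants()` is a hypothetical helper (not part of Laravel), and it assumes only the main site URL is submitted, with no path beyond an optional trailing slash.

```php
<?php
// Hypothetical helper: list the stored forms a submitted URL could match.
function urlVariants($url)
{
    // Drop an http:// or https:// prefix, then a leading "www.", to get the bare host.
    $bare = preg_replace('#^https?://#i', '', trim($url));
    $bare = preg_replace('#^www\.#i', '', $bare);
    $bare = strtolower(rtrim($bare, '/'));

    return array(
        'http://www.' . $bare,
        'www.' . $bare,
        'http://' . $bare,
        $bare,
    );
}

$variants = urlVariants('http://www.example.com');
// In Laravel this list could feed one query instead of four, e.g.
// DB::table('user_urls')->whereIn('site_url', $variants)->first();
print_r($variants);
```

This still checks every variant, but the database does it in one round trip and can use an index on `site_url`.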

Sainath Krishnan
  • @kiks73 I am already using that to get the `host` key into the `$siteurl` variable. But I store the full URL into the database, not just the domain – Sainath Krishnan Jun 03 '14 at 05:31
  • The Hostname, HTTP and URI specs cover the different cases of comparing URIs, protocols, which also covers hostname comparison. Learn about URL encoding and URI normalization in specific. - All details which are hard to cover in a single answer (so most likely not fitting for SO), so I just close against a duplicate which contains a library that offers you to do most of the job and also links the rules. – hakre Jun 06 '14 at 05:44
  • @hakre You must lead a very sad life if you find this entertaining. Please continue :) – Sainath Krishnan Jun 06 '14 at 05:47
  • This is just leaving pointer for future users who might end here due to some search and actually looking for answers. Please don't think it's all meant personally, it's just in form of a comment, but the context is broader and site-wide. – hakre Jun 06 '14 at 05:48

2 Answers


I think before you implement a solution you should flesh out your policy more thoroughly. There are many parts of a URL which may or may not be equivalent. Do you want to treat protocols as equivalent? https://foo.com vs http://foo.com. Some subdomains might be aliases, some might not. http://www.foo.com vs http://foo.com, or http://site1.foo.com vs http://foo.com. What about the path of the URL? http://foo.com vs http://foo.com/index.php. I wouldn't waste your time writing a comparison function until you've completely thought through your policy. Good luck!

UPDATE:

Something like this perhaps:

// Subdomains to treat as aliases of the bare domain
$ignore_subdomains = array('www','web','site');
$domain_parts = explode('.',$siteurl);
$subdomain = strtolower(array_shift($domain_parts));
// Drop the leftmost label only if it is one of the aliases
$siteurl = (in_array($subdomain,$ignore_subdomains)) ? implode('.',$domain_parts) : $siteurl;
// Now run your DB comparison query
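
For reuse, the snippet above could be wrapped in a function. This is a sketch, not a drop-in: `stripAliasSubdomain()` is a hypothetical name, and it differs from the snippet in two ways that are assumptions on my part. It lowercases the whole host, and it refuses to strip when fewer than two labels would remain.

```php
<?php
// Hypothetical wrapper around the snippet above; names are illustrative.
function stripAliasSubdomain($host, array $ignore = array('www', 'web', 'site'))
{
    $parts = explode('.', strtolower($host));
    // Only strip when at least two labels would remain afterwards.
    if (count($parts) > 2 && in_array($parts[0], $ignore, true)) {
        array_shift($parts);
    }
    return implode('.', $parts);
}

echo stripAliasSubdomain('www.foo.com'), "\n";      // foo.com
echo stripAliasSubdomain('test.www.foo.com'), "\n"; // test.www.foo.com
echo stripAliasSubdomain('foo.com'), "\n";          // foo.com
```

Applying the same function both when saving a URL and when checking a new submission keeps the lookup a plain equality match.
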
halabuda
  • I strip protocol tags before I save, and only the main website URL, no pages! So all records will be of the order `www.example.com` and `example.com` ONLY – Sainath Krishnan Jun 03 '14 at 05:48
  • To better explain that, `http://example.com` and `https://example.com` will save as `example.com`. And `http://www.example.com` and its variations will save as `www.example.com` – Sainath Krishnan Jun 03 '14 at 05:49
  • good start. in that case i would create an array of common subdomain aliases such as www and web then run your comparison against that array. – halabuda Jun 03 '14 at 06:13
  • Thank you so much for your answer! Can you please explain how that last line works? – Sainath Krishnan Jun 03 '14 at 06:36
  • lets assume $siteurl contains "www.foo.com". in_array checks if the value of $subdomain exists as a value in the array $ignore_subdomains. $subdomain will contain "www" in this case because the $siteurl string was broken up into 3 segments (www,foo,com) using the explode() function. if it matches, in this case it does, the remaining segments of the exploded $siteurl are imploded() back together and assigned to $siteurl which will now contain "foo.com". otherwise there is no match and the original $siteurl is preserved as it was. – halabuda Jun 03 '14 at 06:44
  • Ok so am I correct in saying that it removes `www` (or whatever else you specify in `$ignore_subdomains` ) if present, and does nothing if not? – Sainath Krishnan Jun 03 '14 at 06:51
  • correct, but only if its the leftmost portion of the url. so www.foo.com would match but test.www.foo.com would not. – halabuda Jun 03 '14 at 06:55
  • That was very informative, thank you! :) – Sainath Krishnan Jun 03 '14 at 07:15

You can check before sending data to database using PHP. Following is a small example. Obviously you can make it more advanced as per your liking.

<?php
   $test = "http://www.google.com";
   $testa = "http://google.com";
   if (str_replace("www.","",str_replace("http://","",$testa)) == str_replace("www.","",str_replace("http://","",$test))) {
     echo "same";
   }
   else {
     echo "different";
   }
?>
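
The comparison above only handles `http://`, and `str_replace("www.", ...)` removes `www.` anywhere in the string. A slightly more general sketch, assuming only the host matters for the duplicate check, uses PHP's `parse_url()`. The `hostKey()` helper is a hypothetical name introduced here for illustration.

```php
<?php
// Hypothetical helper: reduce a URL to a comparison key (lowercase host without "www.").
function hostKey($url)
{
    $url = trim($url);
    // parse_url() only reports a host when a scheme is present, so add one if missing.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // nothing usable to compare
    }
    $host = strtolower($host);
    // Remove a single leading "www." label only.
    return (strpos($host, 'www.') === 0) ? substr($host, 4) : $host;
}

var_dump(hostKey('http://www.google.com') === hostKey('https://google.com')); // bool(true)
```

Storing this key in its own column at save time would let the duplicate check be a simple equality query.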

Following is MySQL Replace example. In this example 'url' is database field.

SELECT * FROM `urls` WHERE REPLACE(url, "http://www.","") = "google.com" OR REPLACE(url,"http://","") = "google.com"

Once again this is very basic just for your better understanding.

In case you need any further assistance kindly do let me know

Saad Bashir
  • Ok so my problem here is (using your example), `$test` has to be taken from the database. Which means, I need to do this check for every record that is returned, to check if my record exists. So if `http://www.google.com` is the submitted URL, there is no way for me to equate that to `google.com` which is in the database, is there? – Sainath Krishnan Jun 03 '14 at 05:35
  • Well you can use REPLACE function of mysql. REPLACE(content, 'search', 'replacewith'). This is the efficient way. The other way is you do a while loop check each individual record and see if it has matched anything. If not save the results. But in your case I would highly recommend that you save your records in domain.com format without http:// or www. You can strip these two tags if they exist before saving them in database. It will make your life easier. And if you want to show them when you output your data you can always add these two at output stage. – Saad Bashir Jun 03 '14 at 05:40
  • Following link will give you a little insight into REPLACE function as the author compares it to str_replace. http://davidwalsh.name/mysql-replace – Saad Bashir Jun 03 '14 at 05:42