I am building a fairly large application that I have been working on for several months. As the next step, I am writing code that removes all duplicate domains from my text file.
Previously I used PHP's array_unique() function, which removes exact duplicates from my text file. But now I need to remove every URL whose domain is the same as another URL's domain.
Old situation (this deletes URL 1 or 2 because they are exactly the same):
- google.nl
- google.nl
- google.nl/hello
Desired situation (this deletes two of the three URLs because their domains are the same):
- google.nl/hello
- google.nl/yellow
- google.nl
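To make the desired situation concrete, here is a minimal sketch of what "keep one URL per domain" could look like. It assumes that "same domain" means "same host as reported by parse_url()", so subdomains such as www.google.nl and google.nl would count as different. The helper names host_of and unique_by_host are hypothetical and not part of my application.

```php
<?php
// Hypothetical helper: returns the lower-cased host of a URL, or '' if none is found.
// Assumption: "same domain" means "same host"; subdomains are not merged.
function host_of(string $url): string
{
    $url = trim($url);
    // parse_url() only detects a host when a scheme is present, so add one if missing.
    if (!preg_match('#^https?://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) ? strtolower($host) : '';
}

// Hypothetical helper: keeps the first URL seen for each host and drops the rest.
function unique_by_host(array $urls): array
{
    $seen = [];
    foreach ($urls as $url) {
        $host = host_of($url);
        if ($host !== '' && !isset($seen[$host])) {
            $seen[$host] = trim($url);
        }
    }
    return array_values($seen);
}

$urls = ['google.nl/hello', 'google.nl/yellow', 'google.nl'];
print_r(unique_by_host($urls)); // only google.nl/hello remains
```

Keeping the first occurrence is an arbitrary choice here; keeping the shortest URL per host would work just as well with a small change to the loop.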
So I wrote a bit of code that prints every URL in my text file to the screen (nothing special). I do this with a while loop:
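$file = fopen("file.txt", "r");
// Read line by line; fgets() returns false once the end of the file is reached.
while (($line = fgets($file)) !== false)
{
    echo $line . "<br />";
}
fclose($file);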
I then used this tutorial to help me: "how to get domain name from URL". This is the code I used:
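function parse_url_all($url){
    // Prepend "http://" when the URL does not start with "http", so parse_url() can find the host.
    $url = substr($url,0,4)=='http'? $url: 'http://'.$url;
    $d = parse_url($url);
    // Split the host into its dot-separated labels.
    $tmp = explode('.',$d['host']);
    $n = count($tmp);
    if ($n>=2){
        // Heuristic: with four labels, or three labels where the second-to-last is short
        // (e.g. "co" in example.co.uk), keep the last three labels as the domain.
        if ($n==4 || ($n==3 && strlen($tmp[($n-2)])<=3)){
            $d['domain'] = $tmp[($n-3)].".".$tmp[($n-2)].".".$tmp[($n-1)];
            $d['domainX'] = $tmp[($n-3)];
        } else {
            // Otherwise keep only the last two labels.
            $d['domain'] = $tmp[($n-2)].".".$tmp[($n-1)];
            $d['domainX'] = $tmp[($n-2)];
        }
    }
    return $d;
}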
$urls = array('website1','website2');
echo "<div style='overflow-x:auto;'>";
echo "<table style='text-align:left;'>";
echo "<tr><th>URL</th><th>Host</th><th>Domain</th><th>Domain X</th></tr>";
foreach ($urls as $url) {
    $info = parse_url_all($url);
    echo "<tr><td>" . $url . "</td><td>" . $info['host'] . "</td><td>" . $info['domain'] . "</td><td>" . $info['domainX'] . "</td></tr>";
}
echo "</table></div><br>";
How do I get the output of my while loop (the lines of the text file) into the array on this line:
$urls = array('output from textfile');
It is probably something simple, but I just can't figure it out.
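For reference, this is roughly what I am trying to end up with: the lines of the text file as elements of $urls. The sketch below uses PHP's built-in file() function instead of my while loop, which may or may not be the best approach. The temporary sample file is only there so the snippet runs on its own; in my application the path would be file.txt.

```php
<?php
// Create a small sample file so the sketch runs on its own;
// in the real application this would be the existing file.txt.
$path = tempnam(sys_get_temp_dir(), 'urls');
file_put_contents($path, "google.nl/hello\ngoogle.nl/yellow\n\ngoogle.nl\n");

// file() returns the lines as an array; the flags strip the trailing
// newline from each line and skip empty lines.
$urls = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if ($urls === false) {
    $urls = []; // the file could not be read
}
// Remove stray whitespace such as "\r" left over from Windows line endings.
$urls = array_map('trim', $urls);

print_r($urls);
unlink($path); // clean up the sample file
```

The resulting $urls could then be passed to the foreach loop in place of array('website1','website2').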