I have a list of URLs in a file named urls.list:
https://target.com/?first=one
https://target.com/something/?first=one
http://target.com/dir/?first=summer
https://fake.com/?first=spring
https://example.com/about/?third=three
https://example.com/?third=three
I want to make them unique based on their protocol and domain, e.g. https://target.com. That means the first URL for each protocol-and-domain pair is printed and any later URLs with the same pair are skipped.
So the result would be:
https://target.com/?first=one
http://target.com/dir/?first=summer
https://fake.com/?first=spring
https://example.com/about/?third=three
This is what I tried to do:
cat urls.list | cut -d"/" -f1-3 | awk '!a[$0]++' >> host_unique.del
for urls in $(cat urls.list); do
    for hosts in $(cat host_unique.del); do
        if [[ $hosts == *"$urls"* ]]; then
            echo "$hosts"
        fi
    done
done
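A note on the attempt above: the test `[[ $hosts == *"$urls"* ]]` asks whether the short host prefix contains the full URL, which it cannot when the URL is longer, so the loop likely prints nothing. One possible alternative, offered only as a sketch, is a single awk pass that keys on the first three "/"-separated fields. The function name `unique_by_host` is hypothetical, and the sketch assumes every line starts with `scheme://host/`.

```shell
# Hypothetical helper: keep only the first URL seen for each scheme://host prefix.
unique_by_host() {
    # With "/" as the field separator, "https://target.com/x" splits into
    # $1="https:", $2="" and $3="target.com", so $1 FS $2 FS $3 rebuilds
    # "https://target.com". seen[key]++ evaluates to 0 only the first time
    # a key appears, so !seen[key]++ is true exactly once per key and awk
    # prints that line.
    awk -F/ '!seen[$1 FS $2 FS $3]++' "$@"
}

# Sample run on the URLs from the question, fed on stdin.
printf '%s\n' \
    'https://target.com/?first=one' \
    'https://target.com/something/?first=one' \
    'http://target.com/dir/?first=summer' \
    'https://fake.com/?first=spring' \
    'https://example.com/about/?third=three' \
    'https://example.com/?third=three' | unique_by_host
```

To run it against the file instead, `unique_by_host urls.list` should work the same way. Because the key includes the protocol, http://target.com and https://target.com are treated as different entries, which matches the desired result listed above.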