You can parse the URL yourself with a regex or split(), or use a URL parsing library such as urllib.parse.
Store the path (the page) as a key in a dict (dict lookups are O(1) on average) and check whether it is already there; if it is not, add the path as the key and the full URL as the value.
Taking the dict's values then gives you the unique URLs only.
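For reference, a quick sketch of what urlparse returns for these URLs. Note that the scheme-less entries (like blog.example.com/page.php?id=2) have no // prefix, so the host ends up inside .path rather than .netloc:
from urllib.parse import urlparse

print(urlparse("http://www.example.com/index.php?id=1"))
# ParseResult(scheme='http', netloc='www.example.com', path='/index.php',
#             params='', query='id=1', fragment='')

print(urlparse("blog.example.com/page.php?id=2").path)
# 'blog.example.com/page.php'  -- no scheme, so the host is part of the path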
from urllib.parse import urlparse
list_url = [
"http://www.example.com/index.php?id=1",
"http://www.example.com/index.php?id=2",
"http://www.example.com/page.php?id=1",
"http://www.example.com/page.php?id=2",
"blog.example.com/page.php?id=2",
"subdomain.example.com/folder/page.php?id=2"
]
mydict = {}
for url in list_url:
    url_parsed = urlparse(url)
    path = url_parsed.path
    if path not in mydict:
        mydict[path] = url
Taking the dictionary values and converting them to a list:
print(list(mydict.values()))
As @waps showed, the same thing can be written as a dict comprehension. Later duplicates overwrite earlier ones, so use it only if keeping the first id is not a concern.
list({ urlparse(url).path:url for url in list_url }.values())
Output (from the comprehension; the loop keeps the first URL per path, so it would show the id=1 variants for the first two entries instead):
['http://www.example.com/index.php?id=2',
'http://www.example.com/page.php?id=2',
'blog.example.com/page.php?id=2',
'subdomain.example.com/folder/page.php?id=2']
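If you do want to keep the first URL seen for each path in a compact form, a minimal sketch using dict.setdefault (same behaviour as the explicit loop above):
first_per_path = {}
for url in list_url:
    # setdefault only inserts the key if it is not present yet,
    # so the first URL for each path wins
    first_per_path.setdefault(urlparse(url).path, url)
print(list(first_per_path.values()))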