I have a list of URLs. How do I remove the URLs that do not point to an HTML page? For example, I want to remove PDF/JPG files, and I also want to remove duplicate domains so that only one URL per domain is kept.
example_list = [
'https://ocp.dc.gov/sites/default/files/dc/sites/ocp/publication/attachments/Report-of-Contracting-Activity-Part-I.pdf',
'https://the1955club.com/',
'https://the1955club.com/aboutus']
So the new list should be:
new_list = ['https://the1955club.com/']
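One way to get this result might look like the sketch below. It is only a heuristic under stated assumptions: a URL is treated as "not HTML" when its path ends in one of a hand-picked set of file extensions (the `NON_HTML_EXTENSIONS` set and the `filter_urls` name are made up for illustration), and for each domain the first URL encountered is the one kept. It does not fetch the page or check the actual `Content-Type`, so a URL without an extension that serves a PDF would slip through.

```python
import posixpath
from urllib.parse import urlparse

# Assumption: file extensions treated as "not an HTML page"; extend as needed.
NON_HTML_EXTENSIONS = {'.pdf', '.jpg', '.jpeg', '.png', '.gif'}

def filter_urls(urls):
    """Drop URLs with a non-HTML file extension and keep one URL per domain."""
    seen_domains = set()
    result = []
    for url in urls:
        parsed = urlparse(url)
        # Extension of the path component only (query string is ignored).
        extension = posixpath.splitext(parsed.path)[1].lower()
        if extension in NON_HTML_EXTENSIONS:
            continue
        domain = parsed.netloc.lower()
        # Keep only the first URL seen for each domain.
        if domain in seen_domains:
            continue
        seen_domains.add(domain)
        result.append(url)
    return result

example_list = [
    'https://ocp.dc.gov/sites/default/files/dc/sites/ocp/publication/attachments/Report-of-Contracting-Activity-Part-I.pdf',
    'https://the1955club.com/',
    'https://the1955club.com/aboutus',
]

new_list = filter_urls(example_list)
print(new_list)
```

Because the first URL per domain wins, the order of the input list decides which page survives; sorting the list beforehand (for example by URL length) would be one way to prefer the shortest URL for each domain.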