I'm using Python 3 to write a web scraper that pulls URL links and writes them to a CSV file. The code does this successfully; however, there are many duplicates. How can I write the CSV file with only a single (unique) instance of each URL?
Thanks for the help!
import requests
from bs4 import BeautifulSoup
import csv
from urllib.parse import urljoin

r = requests.get('url')
soup = BeautifulSoup(r.text, 'html.parser')

data = []
for link in soup.find_all('a', href=True):
    # Skip in-page fragment links; resolve everything else against the base URL
    if '#' not in link['href']:
        full_url = urljoin('base-url', link['href'])
        print(full_url)
        data.append(full_url)

with open('test.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for row in data:
        writer.writerow([row])
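
A minimal sketch of one approach, assuming the links are collected into data as above: pass the list through dict.fromkeys before writing. In Python 3.7+ dict keys preserve insertion order, so this keeps the first occurrence of each URL and drops the repeats (the sample data below is hypothetical, just to show the effect):

import csv

# Hypothetical sample of scraped links containing repeats
data = ['https://example.com/a', 'https://example.com/b', 'https://example.com/a']

# dict.fromkeys keeps only the first occurrence of each key and,
# in Python 3.7+, preserves insertion order
unique_data = list(dict.fromkeys(data))

with open('test.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for row in unique_data:
        writer.writerow([row])
# test.csv now contains .../a and .../b, each written once

An alternative is to collect into a set() during the scraping loop instead of a list, which also removes duplicates but does not preserve the order the links were found in.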