I'm trying to collect the URLs of a particular website and I'm partway through. Can anyone help me? The problem is that I'm only able to delete some of the elements from the list (linkaddresses) after appending them. First I take all the URLs from a particular website (swiggy.com here). Then I try to delete the list elements (linkaddresses) that start with '/'. When I run the program below, it only deletes some of them. In the program itself I print all the list (linkaddresses) elements before and after the modification.

Below is my code in Python:
import urllib
from urllib import request
from bs4 import BeautifulSoup

def linkgetter(searchlink):
    pagesource = urllib.request.urlopen(searchlink)
    linkaddresses = []
    soup = BeautifulSoup(pagesource, 'lxml')
    for link in soup.findAll('a'):
        if link.get('href') == None:
            continue
        else:
            linkaddresses.append(link.get('href'))
    print(linkaddresses)
    for i in linkaddresses:
        if i.startswith('#'):
            linkaddresses.remove(i)
        elif i.startswith('/'):
            linkaddresses.append(searchlink+i)
            linkaddresses.remove(i)
    print('\n')
    print('\n')
    print('\n')
    print(linkaddresses)

linkgetter('https://www.swiggy.com')
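For what it's worth, here is a minimal, self-contained sketch that shows the same symptom without any network access. The href values are made up (not the real swiggy.com links); the second loop is the same remove/append logic as in my program above, and only some of the entries starting with '/' get removed:

    # Minimal reproduction of the behaviour with hypothetical href values
    searchlink = 'https://www.swiggy.com'
    linkaddresses = ['/bangalore', '/delhi', '#top', '/chennai', 'https://blog.swiggy.com']

    for i in linkaddresses:
        if i.startswith('#'):
            linkaddresses.remove(i)
        elif i.startswith('/'):
            linkaddresses.append(searchlink + i)
            linkaddresses.remove(i)

    print(linkaddresses)
    # Output still contains entries starting with '/', e.g. '/delhi' and '/chennai',
    # just like in the full program above.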