I have a loop that is constantly adding a variable with an unknown value to a list, and then prints the list. However I don't find a way to ignore the values previously found and added to the list when I want to print the list the next time.
I'm scraping a constantly updating website for keyword-matching links using requests and bs4 inside a loop. Once the website added the links I'm looking for my code adds them to a list, and prints the list. Once the website adds the next wave of matching links, these will also be added to my list, however my code will also add the old links found before to the list again since they still match my keyword. Is it possible to ignore these old links?
url = "www.website.com"
keyword = "news"
results = [] #list which saves the links
while True:
source = requests.get(url).text
soup = BeautifulSoup(source, 'lxml')
options = soup.find_all("a", class_="name-link")
for o in options:
if keyword in o.text:
link = o.attrs["href"] #the links I want
results.append(link) #adds links to list
print(results)
time.sleep(5) #wait until next scrape
#so with every loop the value of 'link' is changing which makes it hard
for me to find a way to ignore previously found links
To maybe make it easier to understand you could think of a loop adding an unknown number to a list with every execution of the loop, but the number should only be printed in the first execution.