As you can see, I have two variables defined: a variable named href, which holds multiple links as one string, and a variable named text. In text I have the links that I have already visited/downloaded from. I want Python to print the links that are present in href but not in text.
So I imagine this is done with a for loop?
When I run the code, single letters are printed, each on its own line.
import requests
from bs4 import BeautifulSoup
url = 'amazon.com'
source_code = requests.get(url)
plain_text = source_code.text
soup = BeautifulSoup(plain_text, 'html.parser')
for link in soup.findAll('a', {'class': 'gridItem-trackInfo-title-anchor'}):
    href = link.get('href')
file = open('file.txt', 'r')
text = file.read()
file.close()
for i in href:
    if i not in text:
        print(i)
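
For reference, here is a minimal sketch of the behaviour I am after. It rests on some assumptions: the links are collected into a list instead of a single string, file.txt holds one visited link per line, and the helper name unvisited_links and the sample links are made up for illustration. It also shows why single letters appear: looping over a string yields its characters one at a time.

```python
def unvisited_links(hrefs, visited_text):
    """Return the links in hrefs that are not listed in visited_text."""
    # Assumption: the visited file stores one link per line.
    visited = {line.strip() for line in visited_text.splitlines() if line.strip()}
    # Keep the original order of the scraped links.
    return [h for h in hrefs if h not in visited]


# Iterating over a string gives single characters, not whole links.
print(list("/a/b"))  # ['/', 'a', '/', 'b']

# Hypothetical sample data standing in for the scraped links and file.txt.
hrefs = ["/track/1", "/track/2", "/track/3"]
visited_text = "/track/1\n/track/3\n"

for link in unvisited_links(hrefs, visited_text):
    print(link)  # prints /track/2
```

In the scraper itself, the list could probably be built inside the first loop by appending each link.get('href') to a list instead of overwriting href. Note that this sketch compares whole lines, whereas `i not in text` does a substring check, so the two can differ when one link is a prefix of another.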