Generally speaking, if you want to search through something and report only unique values, you need to keep track of the values you have already found. For example:
import datetime
import requests
from bs4 import BeautifulSoup

def arcade_search():
    now = datetime.datetime.now()  # unused below, kept from the original
    url1 = 'https://orangecounty.craigslist.org/search/sss?query=arcade&sort=rel'
    url2 = 'https://losangeles.craigslist.org/search/sss?query=arcade&sort=rel'
    r1 = requests.get(url1)
    r2 = requests.get(url2)
    print(r1.status_code, r2.status_code)

    found = []  # links we have already reported
    soup = BeautifulSoup(r1.text + r2.text, 'html.parser')
    for link in soup.find_all('a'):
        listing1 = link.get('href')
        # skip anchors with no href, and anything we have already seen
        if listing1 and 'millipede' in listing1.lower() and listing1 not in found:
            print('millipede was found! ' + listing1)
            found.append(listing1)

arcade_search()
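(Incidentally, a set would be a slightly better fit than a list for found, since membership tests on a set are O(1) rather than O(n), but at this scale either works.)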
However, in this particular case there is likely more work to do. My guess is that you are finding duplicate links across the responses to the two different GET requests, not within either one. That is, the links in the response from get(url1) are all unique, and likewise for the links in the response from get(url2), but some of the links returned by get(url1) also show up in get(url2) and vice versa, because the Orange County and Los Angeles areas are not disjoint.

Moreover, it appears that Craigslist returns links relative to the area in which you are searching, e.g. https://orangecounty.craigslist.org/vgm/d/restored-arcade-game-with-800/6495065079.html vs https://losangeles.craigslist.org/wst/sgd/d/namco-upright-classic-arcade/6511455107.html. So when you say the URLs are duplicates, you really mean that two different URLs point to the same page. Assuming that is actually what you are experiencing, the code I provided above won't solve your problem, because it can only detect URLs that are literally identical. You could try keeping track of only the part after d/, or maybe only the descriptive slug (by which I mean, for example, 'namco-upright-classic-arcade'), but I can't guarantee that these will either be the same for the same listing across search regions or different for different listings across regions.
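If you want to experiment with that idea anyway, here is a minimal sketch. Rather than the whole part after d/, it keys each link on just the trailing numeric ID (the 6495065079 part); listing_id is a hypothetical helper, and the assumption that this ID is stable for the same listing across regional domains is exactly the uncertainty described above, so verify it before relying on it.

import re

def listing_id(url):
    # Hypothetical helper: extract the trailing numeric post ID from a
    # Craigslist listing URL, e.g. '.../6495065079.html' -> '6495065079'.
    # Assumption (unverified): this ID is stable across regional domains.
    match = re.search(r'/(\d+)\.html$', url)
    return match.group(1) if match else url  # fall back to the full URL

seen = set()
for url in [
    'https://orangecounty.craigslist.org/vgm/d/restored-arcade-game-with-800/6495065079.html',
    'https://losangeles.craigslist.org/wst/sgd/d/namco-upright-classic-arcade/6511455107.html',
]:
    key = listing_id(url)
    if key not in seen:
        seen.add(key)
        print('new listing:', url)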