I'm practising some Python scraping and I'm a bit stuck on the following exercise. The aim is to scrape the tickers that come back after applying some filters on the Finviz screener. Code below:
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as soup

tickers = []
counter = 1
while True:
    url = "https://finviz.com/screener.ashx?v=111&f=cap_large&r=" + str(counter)
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    webpage = urlopen(req).read()
    html = soup(webpage, "html.parser")
    rows = html.select('table[bgcolor="#d3d3d3"] tr')
    for i in rows[1:]:  # skip the header row
        a1, a2, a3, a4 = (x.text for x in i.find_all('td')[1:5])
        i = a1
        tickers.append(i)
    counter += 20  # move on to the next page of results
    if tickers[-1] == tickers[-2]:
        break
I'm not sure how to extract only one column, so I'm using the code for all of them (a1, a2, a3, a4 = (x.text for x in i.find_all('td')[1:5])). Is there a way to get just the first column?
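To make the first question concrete, here is a minimal sketch that uses a plain list as a stand-in for the cell texts of one row. The values in cells are made up for illustration. As far as I understand, find_all returns a list-like result, so indexing it directly (i.find_all('td')[1].text) should behave the same way, but that part is an assumption rather than something tested here.

```python
# Hypothetical stand-in for [x.text for x in i.find_all('td')] on one row.
cells = ["1", "AAPL", "Apple Inc.", "Technology", "Consumer Electronics"]

# Unpacking a slice means naming every column, even the unused ones ...
a1, a2, a3, a4 = cells[1:5]

# ... whereas indexing picks out only the cell that is needed.
ticker = cells[1]
```

Both approaches give the same value for the ticker column; the indexed version just avoids the three throwaway names.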
Is there a way to avoid having to hardcode '20' in the script?
When I run the code it creates a duplicate of the last ticker. Is there another way to make the code stop once it has gone through all the entries?
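For the last two questions, this is a rough sketch of the kind of loop I have in mind, run against fake data instead of the real site. Everything here is hypothetical: fetch_page stands in for the download-and-parse step, ALL_TICKERS and PAGE_SIZE are made-up values, and the "return the last ticker again when past the end" behaviour is only my guess at why the duplicate appears. The idea is to advance by however many rows the page actually returned, and to stop as soon as a page contains nothing new.

```python
# Hypothetical data standing in for the screener results.
ALL_TICKERS = ["AAPL", "MSFT", "GOOG", "AMZN", "META"]
PAGE_SIZE = 2  # made-up page size; the script above hardcodes 20


def fetch_page(start):
    """Hypothetical stand-in for downloading and parsing one page.

    `start` is 1-based, like the `r=` URL parameter. Past the end of the
    data it returns the last ticker again (an assumed site behaviour).
    """
    page = ALL_TICKERS[start - 1:start - 1 + PAGE_SIZE]
    return page if page else ALL_TICKERS[-1:]


def scrape_all():
    tickers = []
    seen = set()
    start = 1
    while True:
        page = fetch_page(start)
        new = [t for t in page if t not in seen]
        if not new:  # nothing new on this page, so stop before appending
            break
        tickers.extend(new)
        seen.update(new)
        start += len(page)  # step by what the page actually returned
    return tickers


result = scrape_all()
print(result)
```

Because the check happens before anything is appended, the repeated last ticker never makes it into the list, and the step size comes from the page itself instead of a hardcoded number.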