I added a counter at the end of the code below. However, the code continues to run past a counter of 10, which is where I set it to break. I can't figure out what I'm doing wrong.

from selenium import webdriver
import pandas as pd
import openpyxl

chromedriver='C:\\Users\\user\\Downloads\\chromedriver_win32\\chromedriver.exe'

options=webdriver.ChromeOptions()
options.add_argument('headless')
options.add_argument('window-size=1200x600')
browser=webdriver.Chrome(executable_path=chromedriver,chrome_options=options)

path='C:/Users/something.xlsx'

xls=pd.read_excel(path)
data=xls["companyname"].tolist()
data=[w.replace('" "','+') for w in data]

book=openpyxl.load_workbook(path)
sheet=book.active
try:
    delete=book['Python Data']
    book.remove(delete)
except Exception:
    pass
book.create_sheet('Python Data')
ws1=book['Python Data']
book.save(path)

counter=0

while data:
    for item in data:
        browser.get('https://duckduckgo.com/?q='+item+'&t=h_')
        results = browser.find_elements_by_id('links')
        num_page_items = len(results)
        for i in range(num_page_items):
            mylist = results[i].text
            row=len(ws1['A']) + 1
            ws1.cell(row=1,column=1,value="Results")
            ws1.cell(row=row,column=1,value=mylist)
            book.save(path)
            counter += 1
            print(counter)
            if counter==10:
                break
michael
  • Why do you use `while data`? I think that breaks your logic. – bc291 Mar 02 '19 at 19:06
  • I borrowed a chunk of code from here: https://stackoverflow.com/questions/46771995/scraping-duckduckgo-with-python-3-6?rq=1. However, I also needed to feed a list of search terms to the DuckDuckGo URL, and I found some other code (I don't remember where) that I thought would allow me to do this. It included the while loop, but I suppose I went astray when adapting it to my code. Can you advise how to properly feed the URL while avoiding any silly loop errors? – michael Mar 02 '19 at 19:42
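
On feeding search terms into the URL: one possible approach, sketched here under assumptions, is to let the standard library encode each term instead of replacing spaces by hand. `build_search_url` is a hypothetical helper name; the `q` and `t=h_` parameters are copied from the URL in the question.

```python
from urllib.parse import urlencode


def build_search_url(term):
    # urlencode turns spaces into '+' and escapes characters such as '&' as %XX
    return "https://duckduckgo.com/?" + urlencode({"q": term, "t": "h_"})


print(build_search_url("Acme Corp"))
```

Each company name can then be passed through this helper inside a single `for item in data` loop, with no outer `while` needed.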

2 Answers

Your code keeps running after counter == 10 because of the infinite `while data` loop: `data` is never modified, so the condition is always true. Note that `break` does not exit the `while` loop here; it only exits the innermost loop, `for i in range(num_page_items)`.

Please use the following code:

# Note: the outer `while data` loop is gone
for item in data:
    browser.get('https://duckduckgo.com/?q='+item+'&t=h_')
    results = browser.find_elements_by_id('links')
    num_page_items = len(results)
    for i in range(num_page_items):
        mylist = results[i].text
        row=len(ws1['A']) + 1
        ws1.cell(row=1,column=1,value="Results")
        ws1.cell(row=row,column=1,value=mylist)
        book.save(path)
        counter += 1
        print(counter)
        if counter==10:
            break  # leaves only the inner for loop
    if counter>=10:
        break  # also stop iterating over the remaining search terms
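
Another way to stop at a fixed number of results is to move the loops into a function and `return` once enough rows have been collected. The sketch below is only an illustration, not the original scraper: `collect_results` and `fake_search` are hypothetical names, and `fake_search` stands in for the `browser.get(...)` and `find_elements_by_id('links')` calls so the example runs without Selenium.

```python
def collect_results(terms, search, limit=10):
    """Return at most `limit` result texts gathered across all search terms."""
    collected = []
    for term in terms:
        for text in search(term):
            collected.append(text)
            if len(collected) >= limit:
                # return leaves the function, so it ends both loops at once
                return collected
    return collected


def fake_search(term):
    # Hypothetical stand-in for the Selenium lookup: four fake hits per term.
    return [f"{term} result {n}" for n in range(1, 5)]


rows = collect_results(["acme", "globex", "initech"], fake_search, limit=10)
print(len(rows))
```

With this shape, the spreadsheet writes could happen after the function returns, so the workbook would be saved once instead of on every row.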
bc291

As written in section 4.4 of the Python tutorial, "break and continue Statements, and else Clauses on Loops":

The break statement, like in C, breaks out of the innermost enclosing for or while loop.

In your code, the innermost enclosing loop is:

for i in range(num_page_items):
     .....
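
A small self-contained demonstration of that rule, using made-up loop bounds rather than the scraper itself: the inner `break` fires on every pass, yet the outer loop still completes all of its iterations.

```python
visited = []
for outer in range(3):
    for inner in range(5):
        if inner == 2:
            break  # exits only the inner for loop
        visited.append((outer, inner))

# The outer loop still ran for outer = 0, 1 and 2.
print(visited)
```

The same thing happens in the question: `break` ends the `for i in range(num_page_items)` loop, and the loops around it carry on.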
Alex Yu