
I'd like to run a search with Selenium and click the "more results" button at the end of a DDG search.

DDG no longer shows the button once it has shown all the results for a query.

I'd like to exit the `try` block when there is no button.

I'll share what I'm trying now. Earlier I also tried these two options: `if len(button_element) > 0: button_element.click()` and `if button_element is not None: button_element.click()`.

I'd like the solution to use Selenium so that it shows the browser, because that's helpful for debugging.

This is my code with a reproducible example:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    browser = webdriver.Chrome()
    browser.get("https://duckduckgo.com/")
    # Selenium 4 removed find_element_by_*; use find_element(By..., ...) instead
    search = browser.find_element(By.NAME, 'q')
    search.send_keys("this is a search" + Keys.RETURN)
    html = browser.page_source

    try:
        button_element = browser.find_element(By.CLASS_NAME, 'result--more__btn')

        try:
            button_element.click()
        except SystemExit:
            print("No more pages")

    except:
        pass
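The `len(button_element) > 0` and `is not None` attempts can't work with `find_element`: it returns a single element or raises `NoSuchElementException`, and it never returns an empty value or `None`. The plural `find_elements` returns a possibly empty list, which is what makes a length check meaningful. A minimal sketch, assuming Selenium 4, where `By.CSS_SELECTOR` is the plain string `"css selector"` and can be passed directly; the helper name `click_more_until_gone` and the `max_clicks` safety cap are hypothetical. `find_elements` only waits for freshly loaded results if an implicit wait is set, so the usage comment assumes one (the answers below use an explicit `WebDriverWait` instead).

```python
MORE_BUTTON_CSS = "a.result--more__btn"  # selector used in the question and answers


def click_more_until_gone(driver, max_clicks=50):
    """Click the "More results" button until find_elements finds none.

    `driver` is a Selenium WebDriver; "css selector" is the string value of
    By.CSS_SELECTOR. Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:  # hypothetical cap so the loop can't run forever
        buttons = driver.find_elements("css selector", MORE_BUTTON_CSS)
        if len(buttons) == 0:  # empty list: no button left, leave the loop
            break
        buttons[0].click()
        clicks += 1
    return clicks


# Usage (needs a real browser):
# from selenium import webdriver
# driver = webdriver.Chrome()
# driver.implicitly_wait(10)  # lets find_elements wait for results to load
# driver.get("https://duckduckgo.com/?q=this+is+a+search")
# print(click_more_until_gone(driver))
```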
Edited by undetected Selenium; asked by tadon11Aaa

3 Answers


You can use the pure-HTML version of DDG at https://duckduckgo.com/html/?q=. This way you can use a plain requests/BeautifulSoup approach and get all pages easily:

    import requests
    from bs4 import BeautifulSoup


    q = '"centre of intelligence"'
    url = 'https://duckduckgo.com/html/?q={q}'
    # Send a browser-like User-Agent with every request
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

    soup = BeautifulSoup(requests.get(url.format(q=q), headers=headers).content, 'html.parser')

    while True:
        # Print the title, link and snippet of every result on the current page
        for t, a, s in zip(soup.select('.result__title'), soup.select('.result__a'), soup.select('.result__snippet')):
            print(t.get_text(strip=True, separator=' '))
            print(a['href'])
            print(s.get_text(strip=True, separator=' '))
            print('-' * 80)

        # Pagination form; if none is found, this was the last page
        f = soup.select_one('.nav-link form')
        if not f:
            break

        # Collect the form's fields (skipping the submit button) and post them to get the next page
        data = {}
        for i in f.select('input'):
            if i['type'] == 'submit':
                continue
            data[i['name']] = i.get('value', '')

        soup = BeautifulSoup(requests.post('https://duckduckgo.com' + f['action'], data=data, headers=headers).content, 'html.parser')

Prints:

    Centre Of Intelligence - Home | Facebook
    https://www.facebook.com/Centre-Of-Intelligence-937637846300833/
    Centre Of Intelligence . 73 likes. Non-profit organisation. Facebook is showing information to help you better understand the purpose of a Page.
    --------------------------------------------------------------------------------
    centre of intelligence | English examples in context | Ludwig
    https://ludwig.guru/s/centre+of+intelligence
    (Glasgow was "the centre of the intelligence of England" according to the Grand Duke Alexis, who attended the launch of his father Tsar Alexander II's steam yacht there in 1880).
    --------------------------------------------------------------------------------
    Chinese scientists who studied bats in Aus at centre of intelligence ...
    https://www.youtube.com/watch?v=UhcFXXzf2hc
    Intelligence agencies are looking into two Chinese scientists in a bid to learn the true origin of COVID-19. Two Chinese scientists who studied live bats in...
    --------------------------------------------------------------------------------

... and so on.
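The code above formats the raw query, quotes and spaces included, straight into the URL and relies on `requests` to re-quote it. A small sketch of building the same URL with explicit percent-encoding via the standard library's `urllib.parse.urlencode`; the function name `ddg_html_url` is hypothetical. With `requests`, passing `params={'q': q}` to `requests.get` achieves the same thing.

```python
from urllib.parse import urlencode

HTML_ENDPOINT = "https://duckduckgo.com/html/"  # pure-HTML DDG endpoint used in this answer


def ddg_html_url(query):
    """Build the DDG HTML search URL with the query percent-encoded."""
    # urlencode uses quote_plus: spaces become '+', '"' becomes '%22'
    return HTML_ENDPOINT + "?" + urlencode({"q": query})


print(ddg_html_url('"centre of intelligence"'))
# → https://duckduckgo.com/html/?q=%22centre+of+intelligence%22
```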
Answered by Andrej Kesely
  • How do I make this show the browser like the original code? – tadon11Aaa Jun 21 '20 at 17:06
  • Edited the question to include the requirement that the solution shows the browser – tadon11Aaa Jun 21 '20 at 17:09
  • @tadon11Aaa BeautifulSoup doesn't use browser at all. To debug a page, you can do `print(soup)` or `print(soup.prettify())`. You can redirect this output to file and then open it in the browser manually. – Andrej Kesely Jun 21 '20 at 17:10
  • Thanks! I'll edit my question to specify that I'd like to do this with Selenium – tadon11Aaa Jun 21 '20 at 17:11
  • @tadon11Aaa I don't have experience with Selenium, but try the URL `https://duckduckgo.com/html/?q=`. It doesn't use JavaScript, so I think navigating it would be easier. – Andrej Kesely Jun 21 '20 at 17:14

To click the More Results button at the end of the search results using Selenium WebDriver, you have to induce WebDriverWait for element_to_be_clickable(), and you can use the following Locator Strategy:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.keys import Keys
    from selenium.common.exceptions import TimeoutException

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    # Selenium 4 takes the driver path through a Service object instead of executable_path
    driver = webdriver.Chrome(service=Service(r'C:\WebDrivers\chromedriver.exe'), options=options)
    driver.get('https://duckduckgo.com/')
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.NAME, "q"))).send_keys("this is a search" + Keys.RETURN)
    while True:
        try:
            # Wait up to 20 s for the next More Results button, then click it
            WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a.result--more__btn"))).click()
            print("Clicked on More Results button")
        except TimeoutException:
            # No clickable button within 20 s: all results are loaded
            print("No more More Results button")
            break
    driver.quit()
    
  • Console Output:

    Clicked on More Results button
    Clicked on More Results button
    Clicked on More Results button
    Clicked on More Results button
    Clicked on More Results button
    No more More Results button
    

You can find a relevant discussion in "How to extract the text from the search results of duckduckgo using Selenium Python".
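Why the loop ends: `WebDriverWait(driver, 20).until(...)` keeps re-checking the condition (every 0.5 s by default) and raises `TimeoutException` once 20 s pass without the button becoming clickable, so the last iteration always costs the full timeout. A simplified, stdlib-only sketch of that polling behaviour, not Selenium's actual implementation; `wait_until` and `WaitTimeout` are hypothetical stand-ins for `WebDriverWait.until` and `TimeoutException`.

```python
import time


class WaitTimeout(Exception):
    """Stand-in for selenium.common.exceptions.TimeoutException."""


def wait_until(condition, timeout, poll=0.5):
    """Call condition() until it returns a truthy value or `timeout` seconds pass.

    Simplified model of WebDriverWait.until: returns the truthy value,
    otherwise raises WaitTimeout after the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)


# Usage: a condition that succeeds on its third check.
calls = {"n": 0}


def button_appears():
    calls["n"] += 1
    return "button" if calls["n"] >= 3 else None


print(wait_until(button_appears, timeout=2, poll=0.01))  # → button
```

A shorter timeout makes the final, failing wait cheaper, at the risk of giving up before slow results arrive.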

Answered by undetected Selenium

Use WebDriverWait to wait until the "more" button is visible:

    wait = WebDriverWait(browser, 15)  # 15 seconds timeout
    wait.until(expected_conditions.visibility_of_element_located((By.CLASS_NAME, 'result--more__btn')))

This example code clicks the "more" button until there is no button left. For Chrome, replace Firefox with Chrome.

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions
    from selenium.common.exceptions import TimeoutException

    browser = webdriver.Firefox()
    browser.get("https://duckduckgo.com/")
    search = browser.find_element(By.NAME, 'q')
    search.send_keys("this is a search" + Keys.RETURN)

    wait = WebDriverWait(browser, 15)  # 15 seconds timeout

    while True:
        try:
            # until() returns the element once it is visible, so it can be clicked directly
            button_element = wait.until(expected_conditions.visibility_of_element_located((By.CLASS_NAME, 'result--more__btn')))
            button_element.click()
        except TimeoutException:
            # No button became visible within 15 s: stop clicking
            break
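Ending the loop with a catch-all `except:` also ends it silently on unrelated failures (an intercepted click, a crashed browser, even Ctrl+C), which looks exactly like "no more results". A stdlib-only sketch of the narrower pattern, with hypothetical stand-in exception classes in place of Selenium's `TimeoutException` and `ElementClickInterceptedException`: only the timeout means "no button", and everything else propagates. The helper `click_until_timeout` is also hypothetical; with Selenium, `wait_and_click` would be a function that waits for the button and clicks it.

```python
class TimeoutStandIn(Exception):
    """Plays the role of Selenium's TimeoutException (no more button)."""


class ClickInterceptedStandIn(Exception):
    """Plays the role of an unrelated failure, e.g. an overlay blocking the click."""


def click_until_timeout(wait_and_click):
    """Call wait_and_click() until it raises TimeoutStandIn; return the click count.

    Any other exception propagates instead of being mistaken for "no more results".
    """
    clicks = 0
    while True:
        try:
            wait_and_click()
        except TimeoutStandIn:  # the only signal that the button is gone
            return clicks
        clicks += 1
```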
Answered by Hexception
  • My program keeps waiting for a new **more button**, so it clicks the **more button** until it can't find one within **15 seconds**; it repeats this in an endless loop until then. – Hexception Jun 21 '20 at 16:48
  • So you have to wait for the timeout, because a new button might still be generated. But I think you can lower the timeout from 15 to 3 seconds. – Hexception Jun 21 '20 at 16:52
  • You can't run code in Codeshare; just copy it into your editor. – Hexception Jun 21 '20 at 17:01