
I have a website I want to crawl. To access the search results, you must first solve a reCAPTCHA v2 with a callback function (see screenshot below).

[Screenshot: reCAPTCHA v2 with a callback function]

I am using a dedicated captcha solver called 2captcha. The service provides me with a token, which I then pass to the callback function to bypass the captcha. I found the callback function using the code in this GitHub Gist, and I am able to invoke the function successfully in the Console of Chrome DevTools.

The function can be invoked by typing either of these two commands:

window[___grecaptcha_cfg.clients[0].o.o.callback]('captcha_token')

or

verifyAkReCaptcha('captcha_token')

However, when I invoke these functions using the driver.execute_script() method in Python Selenium, I get an error. I also tried executing other standard JavaScript functions with this method (e.g., scrolling down a page), and I keep getting errors. It's likely because the domain I am trying to crawl prevents me from executing any JavaScript with automation tools.

So, my question is: how can I invoke the callback function after I obtain the token from the 2captcha service? I would appreciate any help I can get. Thank you in advance to whoever knows their way around this tough captcha. Cheers!

Some extra info to help with my question:

  1. Automation framework used --> Python Selenium or scrapy. Both are fine by me

  2. Error messages --> Error message 1 and Error message 2

  3. Code

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from twocaptcha import TwoCaptcha
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

# Instantiate a solver object
solver = TwoCaptcha(os.getenv("CAPTCHA_API_KEY"))
sitekey = "6Lfwdy4UAAAAAGDE3YfNHIT98j8R1BW1yIn7j8Ka"
url = "https://suchen.mobile.de/fahrzeuge/search.html?dam=0&isSearchRequest=true&ms=8600%3B51%3B%3B&ref=quickSearch&sb=rel&vc=Car"

# Set chrome options
chrome_options = Options()
chrome_options.add_argument('start-maximized') # Required for a maximized Viewport
chrome_options.add_experimental_option('excludeSwitches', ['enable-logging', 'enable-automation'])
chrome_options.add_experimental_option("detach", True)
chrome_options.add_experimental_option('prefs', {'intl.accept_languages': 'en,en_US'})

# Instantiate a browser object and navigate to the URL
driver = webdriver.Chrome(options=chrome_options) # The chrome_options keyword is deprecated in Selenium 4; options replaces it

driver.get(url)

driver.maximize_window()

def solve(sitekey, url):
    try:
        result = solver.recaptcha(sitekey=sitekey, url=url)
    except Exception as e:
        exit(e)

    return result.get('code')

captcha_key = solve(sitekey=sitekey, url=url)
print(captcha_key)

# driver.execute_script(f"window[___grecaptcha_cfg.clients[0].o.o.callback]('{captcha_key}')") # This step fails in Python but runs successfully in the console
# driver.execute_script(f"verifyAkReCaptcha('{captcha_key}')") # This step fails in Python but runs successfully in the console
oelmaria
  • While the script runs, do you need the browser window to stay in the background, or is it fine if it stays visible? I ask because in the second case you can solve the captcha easily with pyautogui, and I can give you details on how to use it – sound wave Jan 25 '23 at 16:12
  • Hey @soundwave Preferably, I want to run Selenium in headless mode, but I can also work with Selenium in non-headless mode. I've been trying to solve this problem for 4 days and haven't been successful thus far, so I'd be happy with any solution that would get me past the captcha at this point. Thanks a lot for your help. – oelmaria Jan 25 '23 at 16:50

3 Answers


To solve the captcha we can use pyautogui. To install the package, run pip install pyautogui. It lets us interact with whatever appears on the screen, which means the browser window must be visible while the Python script runs. This is a big drawback compared with other methods, but on the other hand it is very reliable.

In our case we need to click on the captcha checkbox (shown in the image) to solve the captcha, so we will tell pyautogui to locate this box on the screen and then click on it.

Save the image of the checkbox on your computer and call it box.png. Then run this code (replace ... with your missing code).

import pyautogui
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

...

driver.get(url)
driver.maximize_window()

# The captcha's HTML is inside an iframe; Selenium cannot see it unless we switch to that iframe first
WebDriverWait(driver, 9).until(EC.frame_to_be_available_and_switch_to_it((By.ID, "sec-cpt-if")))

# wait until the captcha is visible on the screen
WebDriverWait(driver, 9).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#g-recaptcha')))

# find captcha on page
checkbox = pyautogui.locateOnScreen('box.png')
if checkbox:
    # compute the coordinates (x,y) of the center
    center_coords = pyautogui.center(checkbox)
    pyautogui.click(center_coords)
else:
    print('Captcha not found on screen')
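
The lookup above runs only once, so it reports "not found" if the checkbox has not finished rendering at that moment. A minimal sketch of polling instead, assuming a hypothetical helper named wait_for_match that is not part of pyautogui. It accepts any zero-argument callable, and it treats a raised exception as "not found yet", because some pyautogui versions raise an exception rather than returning None when the image is missing.

```python
import time


def wait_for_match(locate, timeout=10.0, interval=0.5):
    """Call locate() until it returns a truthy value or `timeout` seconds pass.

    `locate` is any zero-argument callable. Returns the first truthy result,
    or None if nothing was found before the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            match = locate()
        except Exception:
            # Assumption: an exception here means "image not found yet".
            match = None
        if match:
            return match
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

With pyautogui this could be called as wait_for_match(lambda: pyautogui.locateOnScreen('box.png')), and the result handled exactly like checkbox in the code above.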
sound wave
  • Thank you very much for your answer. I tried out your code. It indeed clicks on the captcha box, but then I get an image puzzle to solve. Clicking on the box often does not invoke the callback function immediately. You first need to solve the puzzle so you can proceed. Any thoughts on how to solve this? – oelmaria Jan 27 '23 at 08:23
  • When I tried it, it didn't ask me to solve the puzzle. Could you send a screenshot? Anyway, I don't think it is possible to solve it with pyautogui, or it is too difficult to do, sorry – sound wave Jan 27 '23 at 08:58

Based on @sound wave's answer, I was able to invoke the callback function and bypass the captcha without pyautogui. The key was to switch to the captcha's frame using the frame_to_be_available_and_switch_to_it method. Thanks a mil to @sound wave for the amazing hint.

Here's the full code for anyone who's interested. Keep in mind that you will need a 2captcha API key for it to work.

What I am still trying to figure out is how to run this script in headless mode. In my tests, WebDriverWait only manages to switch to the captcha frame when Selenium runs in non-headless mode. If anyone knows how to switch to the captcha frame while running Selenium in headless mode, please share your knowledge :)

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from twocaptcha import TwoCaptcha
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from dotenv import load_dotenv
import os
import time

# Load environment variables
load_dotenv()

# Instantiate a solver object
solver = TwoCaptcha(os.getenv("CAPTCHA_API_KEY"))
sitekey = "6Lfwdy4UAAAAAGDE3YfNHIT98j8R1BW1yIn7j8Ka"
url = "https://suchen.mobile.de/fahrzeuge/search.html?dam=0&isSearchRequest=true&ms=8600%3B51%3B%3B&ref=quickSearch&sb=rel&vc=Car"

# Set chrome options
chrome_options = Options()
chrome_options.add_argument('start-maximized') # Required for a maximized Viewport
chrome_options.add_experimental_option('excludeSwitches', ['enable-logging', 'enable-automation'])
chrome_options.add_experimental_option("detach", True)
chrome_options.add_experimental_option('prefs', {'intl.accept_languages': 'en,en_US'})

# Instantiate a browser object and navigate to the URL
driver = webdriver.Chrome(options=chrome_options) # The chrome_options keyword is deprecated in Selenium 4; options replaces it

driver.get(url)

driver.maximize_window()

# Solve the captcha using the 2captcha service
def solve(sitekey, url):
    try:
        result = solver.recaptcha(sitekey=sitekey, url=url)
    except Exception as e:
        exit(e)

    return result.get('code')

captcha_key = solve(sitekey=sitekey, url=url)
print(captcha_key)

# The captcha's HTML is inside an iframe; Selenium cannot see it unless we switch to that iframe first
WebDriverWait(driver, 9).until(EC.frame_to_be_available_and_switch_to_it((By.ID, "sec-cpt-if")))

# Inject the token into the inner HTML of g-recaptcha-response and invoke the callback function
driver.execute_script(f'document.getElementById("g-recaptcha-response").innerHTML="{captcha_key}"')
driver.execute_script(f"verifyAkReCaptcha('{captcha_key}')") # Works once the driver has switched to the captcha iframe

# Wait for 3 seconds until the "Accept Cookies" window appears. Can also do that with WebDriverWait.until(EC)
time.sleep(3)

# Click on "Einverstanden"
driver.find_element(by=By.XPATH, value="//button[@class='sc-bczRLJ iBneUr mde-consent-accept-btn']").click()

# Wait for 0.5 seconds until the page is loaded
time.sleep(0.5)

# Print the top title of the page
print(driver.find_element(by=By.XPATH, value="//h1[@data-testid='result-list-headline']").text)
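
The two execute_script calls above build their JavaScript by interpolating the token into the source text with an f-string. A sketch of an alternative that hands the token over as a script argument, so the JavaScript never has to be assembled from the token's characters. The helper name inject_token is hypothetical; the element ID and verifyAkReCaptcha are taken from the code above, and the driver is assumed to have already switched to the captcha iframe.

```python
def inject_token(driver, token):
    """Write the token into the response field, then call the page's callback.

    Extra positional arguments given to execute_script are exposed to the
    JavaScript as arguments[0], arguments[1], and so on.
    """
    driver.execute_script(
        'document.getElementById("g-recaptcha-response").innerHTML = arguments[0];',
        token,
    )
    driver.execute_script("verifyAkReCaptcha(arguments[0]);", token)
```

It would be called as inject_token(driver, captcha_key) right after the WebDriverWait line that switches to the iframe.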
oelmaria
  • Try with `driver.switch_to.frame("sec-cpt-if")`, but you should put it after a WebDriverWait command; otherwise the captcha might not have loaded yet when `switch_to.frame` is executed – sound wave Jan 27 '23 at 12:40
  • Unfortunately, it doesn't work @sound wave. I have to operate Selenium in non-headless mode. Otherwise, I get a TimeoutException because the frame cannot be found :( – oelmaria Jan 28 '23 at 16:33
  • Hey, I have a very similar situation where the callback function is not working for me. Your code above states that the callback script failed in Python. So how did you actually solve it? Whenever I enter the twocaptcha code and then call the callback function, I can see the page refresh, but the captcha appears again (as if it didn't submit any value). @oelmaria – Carlos_OL Jun 01 '23 at 23:44
  • @Carlos_OL I posted the code that worked for me. Let me know if it works for you as well – oelmaria Jun 03 '23 at 09:28

Here's the code that works for me. Make sure to instantiate the Chrome web driver with the correct options that suit your use case.

# Python imports
from twocaptcha import TwoCaptcha
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from dotenv import load_dotenv
import os
import time

# Load the environment variables
load_dotenv()

solver = TwoCaptcha(os.getenv("CAPTCHA_API_KEY"))
sitekey = "6Lfwdy4UAAAAAGDE3YfNHIT98j8R1BW1yIn7j8Ka"
base_url = "https://suchen.mobile.de/fahrzeuge/search.html"

# Define a function to solve the Captcha
def solve_captcha(sitekey, url):
    try:
        result = solver.recaptcha(sitekey=sitekey, url=url)
        captcha_key = result.get('code')
        print(f"Captcha solved. The key is: {captcha_key}\n")
    except Exception as err:
        print(err)
        print("Captcha not solved...")
        captcha_key = None

    return captcha_key

# Define a function to invoke the callback function
def invoke_callback_func(driver, captcha_key):
    try: # Sometimes the captcha is solved without having to invoke the callback function. The except branch below handles this situation
        # The captcha's HTML is inside an iframe; Selenium cannot see it unless we switch to that iframe first
        WebDriverWait(driver, 15).until(EC.frame_to_be_available_and_switch_to_it((By.ID, "sec-cpt-if")))

        # Inject the token into the inner HTML of g-recaptcha-response and invoke the callback function
        driver.execute_script(f'document.getElementById("g-recaptcha-response").innerHTML="{captcha_key}"')
        driver.execute_script(f"verifyAkReCaptcha('{captcha_key}')") # Works once the driver has switched to the captcha iframe
    except TimeoutException:
        print("Captcha was solved without needing to invoke the callback function. Bypassing this part of the script to prevent raising an error")

    # Wait for 0.5 seconds until the page is loaded
    time.sleep(0.5)

# Instantiate the Chrome web driver
driver = webdriver.Chrome()

# Navigate to the search page so that the captcha iframe is present
driver.get(base_url)

# Solve the captcha
captcha_token = solve_captcha(sitekey=sitekey, url=base_url)
# Invoke the callback function
invoke_callback_func(driver=driver, captcha_key=captcha_token)
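
Note that solve_captcha above returns None when the solver fails, and the script then still calls invoke_callback_func, which would inject the text "None" as the token. A minimal sketch of a guard with a few retries, assuming a hypothetical helper named solve_with_retries; the retry count of 3 is an arbitrary choice for illustration, not something the 2captcha client prescribes.

```python
def solve_with_retries(solve, attempts=3):
    """Call solve() up to `attempts` times and return the first non-empty token.

    `solve` is any zero-argument callable that returns a token string, or
    None on failure. Returns None if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        token = solve()
        if token:
            return token
        print(f"Attempt {attempt} of {attempts} returned no token")
    return None
```

In the script above it could be used as captcha_token = solve_with_retries(lambda: solve_captcha(sitekey=sitekey, url=base_url)), with invoke_callback_func called only when captcha_token is not None.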
oelmaria