I want to open a website and fetch information from it using Selenium. I pass the page URLs in a list (vro_list) and collect the value fetched from each URL in another list (roe_list). The pages I'm accessing are:
https://www.valueresearchonline.com/stocks/44052/reliance-industries-ltd
https://www.valueresearchonline.com/stocks/44811/tata-consultancy-services-ltd

Here is the code I used:

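import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# brave_path, driver_path and vro_list are defined elsewhere (not shown)


def fetch_roe(link):
    browser.get(link)
    browser.maximize_window()
    time.sleep(5)

    # Absolute XPath to the element holding the value to fetch
    roe = browser.find_element('xpath', '/html/body/section[2]/div/div/div[1]/div/div[4]/section[1]/div[1]/div/div[2]/div[2]/div/div/div[2]/table/tbody/tr[2]/td[2]/div')
    return roe.text


roe_list = []

option = Options()
option.add_argument("start-maximized")
option.binary_location = brave_path  # run Brave instead of Chrome
# Selenium 4 takes the driver path through a Service object
browser = webdriver.Chrome(service=Service(driver_path), options=option)

for url_link in vro_list:
    print(url_link)
    roe_item = fetch_roe(url_link)
    roe_list.append(roe_item)
    time.sleep(5)

browser.quit()

print(roe_list)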

When I run this code, I receive an error saying -

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/section[2]/div/div/div[1]/div/div[4]/section[1]/div[1]/div/div[2]/div[2]/div/div/div[2]/table/tbody/tr[2]/td[2]/div"}

I checked the page that Selenium opened and found this:

[image: screenshot of the page opened by Selenium]

It seems the website is blocking the Selenium web driver. You can also see the message 'Brave is being controlled by automated software.' This happened on the second iteration of the for loop; the first iteration ran fine and returned the expected result.

How can I bypass this to fetch the required information? Please help

I'm also sharing the error message received on the console of Jupyter -

NoSuchElementException                    Traceback (most recent call last)
Input In [29], in <cell line: 8>()
      8 for url_link in vro_list:
      9     print(url_link)
---> 10     roe_item = fetch_roe(url_link)
     11     roe_list.append(roe_item)
     12     time.sleep(5)

Input In [27], in fetch_roe(link)
      6 time.sleep(5)
      8 #name = browser.find_element('xpath', '/html/body/div[3]/h1/span')
----> 9 roe = browser.find_element('xpath', '/html/body/section[2]/div/div/div[1]/div/div[4]/section[1]/div[1]/div/div[2]/div[2]/div/div/div[2]/table/tbody/tr[2]/td[2]/div')
     11 return roe.text

File ~\anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py:855, in WebDriver.find_element(self, by, value)
    852     by = By.CSS_SELECTOR
    853     value = '[name="%s"]' % value
--> 855 return self.execute(Command.FIND_ELEMENT, {
    856     'using': by,
    857     'value': value})['value']

File ~\anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py:428, in WebDriver.execute(self, driver_command, params)
    426 response = self.command_executor.execute(driver_command, params)
    427 if response:
--> 428     self.error_handler.check_response(response)
    429     response['value'] = self._unwrap_value(
    430         response.get('value', None))
    431     return response

File ~\anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py:243, in ErrorHandler.check_response(self, response)
    241         alert_text = value['alert'].get('text')
    242     raise exception_class(message, screen, stacktrace, alert_text)  # type: ignore[call-arg]  # mypy is not smart enough here
--> 243 raise exception_class(message, screen, stacktrace)

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/section[2]/div/div/div[1]/div/div[4]/section[1]/div[1]/div/div[2]/div[2]/div/div/div[2]/table/tbody/tr[2]/td[2]/div"}
  (Session info: chrome=105.0.5195.102)
Stacktrace:
Backtrace:
    Ordinal0 [0x00A2C0A3+2212003]
    Ordinal0 [0x009C2CC1+1780929]
    Ordinal0 [0x008D465D+804445]
    Ordinal0 [0x00903475+996469]
    Ordinal0 [0x0090363B+996923]
    Ordinal0 [0x00931382+1184642]
    Ordinal0 [0x0091EC64+1109092]
    Ordinal0 [0x0092F5B2+1177010]
    Ordinal0 [0x0091EA36+1108534]
    Ordinal0 [0x008F83C9+951241]
    Ordinal0 [0x008F9396+955286]
    GetHandleVerifier [0x00CD9CE2+2746722]
    GetHandleVerifier [0x00CCA234+2682548]
    GetHandleVerifier [0x00ABB34A+524234]
    GetHandleVerifier [0x00AB9B60+518112]
    Ordinal0 [0x009C9FBC+1810364]
    Ordinal0 [0x009CEA28+1829416]
    Ordinal0 [0x009CEB15+1829653]
    Ordinal0 [0x009D8744+1869636]
    BaseThreadInitThunk [0x76A4FA29+25]
    RtlGetAppContainerNamedObjectPath [0x77C07A9E+286]
    RtlGetAppContainerNamedObjectPath [0x77C07A6E+238]

3 Answers

This is one way to get the information from those pages (my setup is on Linux; you need a working setup for your own system). I'm just printing out all the tables; you can process them however you need:

import time as t
from io import StringIO

import pandas as pd
import undetected_chromedriver as uc

options = uc.ChromeOptions()
# options.add_argument("--no-sandbox")
# options.add_argument('--disable-notifications')
options.add_argument("--window-size=1280,720")

browser = uc.Chrome(options=options)

url_list = [
    'https://www.valueresearchonline.com/stocks/44052/reliance-industries-ltd',
    'https://www.valueresearchonline.com/stocks/44811/tata-consultancy-services-ltd',
]

try:
    for url in url_list:
        browser.get(url)
        print(url)
        # Parse every <table> in the rendered page into a DataFrame;
        # StringIO avoids the literal-HTML deprecation in newer pandas
        dfs = pd.read_html(StringIO(browser.page_source))
        for df in dfs:
            print(df)  # use display(df) instead inside a Jupyter notebook
        t.sleep(5)  # pause between page loads
finally:
    browser.quit()  # always close the browser, even after an error

Result printed in terminal:

https://www.valueresearchonline.com/stocks/44052/reliance-industries-ltd
Unnamed: 0  YTD 1 Month 3 Months    1 Year  3 Years 5 Years 10 Years
0   Reliance    9.62    0.19    -4.36   7.02    28.52   26.22   20.82
1   S&P BSE Sensex  3.18    1.30    10.68   3.09    17.27   13.52   12.91
2   #   --  --  --  --  --  --  --
Unnamed: 0  2021    2020    2019    2018    2017    2016    2015
0   Reliance    19.15   32.76   35.06   23.25   70.19   6.60    14.27
1   S&P BSE Sensex  21.99   15.75   14.38   5.87    27.91   1.95    -5.03
2   S&P BSE Sensex  21.99   15.75   14.38   5.87    27.91   1.95    -5.03
Unnamed: 0  Stock   Peer Median Unnamed: 3
Unnamed: 0  Stock   Peer Median Unnamed: 3
0   P/E 26.18   16.17   Created with Highcharts 9.2.2
1   P/B 2.18    1.19    Created with Highcharts 9.2.2
2   Dividend Yield  0.31    2.72    Created with Highcharts 9.2.2
Unnamed: 0  Stock   Peer Median Unnamed: 3
Unnamed: 0  Stock   Peer Median Unnamed: 3
0   TTM EPS YoY change (%)  32.73   3.03    Created with Highcharts 9.2.2
1   Returns on Equity   9.63    11.12   Created with Highcharts 9.2.2
2   Piotroski F-Score   7.00    --  NaN
https://www.valueresearchonline.com/stocks/44811/tata-consultancy-services-ltd
Unnamed: 0  YTD 1 Month 3 Months    1 Year  3 Years 5 Years 10 Years
0   Tata Consultancy Services   -13.38  -5.39   -3.63   -14.60  14.55   21.41   16.61
1   S&P BSE Sensex  3.20    1.32    10.70   3.10    17.27   13.52   12.91
2   S&P BSE IT  -21.61  -2.99   0.44    -13.56  23.07   24.54   17.26
Unnamed: 0  2021    2020    2019    2018    2017    2016    2015
0   Tata Consultancy Services   27.66   32.07   13.61   43.11   14.19   -2.10   -4.27
1   S&P BSE Sensex  21.99   15.75   14.38   5.87    27.91   1.95    -5.03
2   S&P BSE IT  56.07   56.68   9.84    24.78   10.83   -8.00   4.51
Unnamed: 0  Stock   Peer Median Unnamed: 3
Unnamed: 0  Stock   Peer Median Unnamed: 3
0   P/E 30.34   23.56   Created with Highcharts 9.2.2
1   P/B 11.99   3.95    Created with Highcharts 9.2.2
2   Dividend Yield  1.34    0.96    Created with Highcharts 9.2.2
Unnamed: 0  Stock   Peer Median Unnamed: 3
Unnamed: 0  Stock   Peer Median Unnamed: 3
0   TTM EPS YoY change (%)  14.01   19.21   Created with Highcharts 9.2.2
1   Returns on Equity   41.85   19.02   Created with Highcharts 9.2.2
2   Piotroski F-Score   7.00    --  NaN

undetected_chromedriver documentation: https://pypi.org/project/undetected-chromedriver/
Selenium documentation: https://www.selenium.dev/documentation/
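
To get from the printed tables to the single value the question asks for, something like the following could work. It is only a sketch: find_metric is a hypothetical helper, not part of pandas or Selenium, and it assumes the metric label sits in the first column and the stock's value in the second, as in the output above. The sample rows are copied from the Reliance output.

```python
from typing import Any, Iterable, Optional, Sequence


def find_metric(tables: Iterable[Sequence[Sequence[Any]]], label: str) -> Optional[Any]:
    """Return the second cell of the first row whose first cell equals label.

    Hypothetical helper: assumes label in column 0 and the stock value in column 1.
    """
    for rows in tables:
        for row in rows:
            if len(row) > 1 and str(row[0]).strip() == label:
                return row[1]
    return None  # label not found in any table


# Rows copied from the Reliance output above (last two tables only)
sample_tables = [
    [
        ["P/E", 26.18, 16.17, "Created with Highcharts 9.2.2"],
        ["P/B", 2.18, 1.19, "Created with Highcharts 9.2.2"],
        ["Dividend Yield", 0.31, 2.72, "Created with Highcharts 9.2.2"],
    ],
    [
        ["TTM EPS YoY change (%)", 32.73, 3.03, "Created with Highcharts 9.2.2"],
        ["Returns on Equity", 9.63, 11.12, "Created with Highcharts 9.2.2"],
        ["Piotroski F-Score", 7.00, "--", float("nan")],
    ],
]

roe = find_metric(sample_tables, "Returns on Equity")
print(roe)
```

With the DataFrames returned by pd.read_html, the same helper could be called as find_metric([df.values.tolist() for df in dfs], "Returns on Equity"), assuming the table layout stays as shown.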

Barry the Platipus

As @Aadarsha mentioned, your problem is hCaptcha. You will need to write code that detects whether the page is an hCaptcha page, and then code that solves the hCaptcha. Solving it is the easy part: you can use 2captcha or one of several other providers. Getting the page to recognize the token is another matter.
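
The detection step could be as simple as scanning the page source for hCaptcha markers. This is a rough heuristic, not a definitive check: looks_like_hcaptcha is a hypothetical helper, and the markers (a script loaded from hcaptcha.com and an element with class h-captcha) are assumptions based on the standard hCaptcha embed, so the block page on a given site may look different.

```python
import re

# Assumed markers of a standard hCaptcha embed: a script served from
# hcaptcha.com, or a widget container with the "h-captcha" class.
_HCAPTCHA_MARKERS = re.compile(
    r"hcaptcha\.com|class=[\"'][^\"']*\bh-captcha\b",
    re.IGNORECASE,
)


def looks_like_hcaptcha(page_source: str) -> bool:
    """Return True if the HTML appears to contain an hCaptcha challenge."""
    return bool(_HCAPTCHA_MARKERS.search(page_source))
```

In the scraping loop it could be called as looks_like_hcaptcha(browser.page_source) right after browser.get(url), so that the script skips or retries the URL instead of failing with NoSuchElementException.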

When you ask 2captcha to solve the captcha, it returns a JSON token, which is a long string of letters and numbers. You then need to apply that token to the page in a way that the page accepts.
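
One common way to apply a token is to write it into the hidden response fields of the captcha form. The sketch below assumes the page uses the standard field names h-captcha-response and g-recaptcha-response; apply_hcaptcha_token is a hypothetical helper, and driver is any object with Selenium's execute_script method. Filling the fields alone may not be enough, because many sites only act on the token through a callback or a form submission.

```python
# JavaScript run in the page; arguments[0] is the token passed from Python.
_FILL_TOKEN_JS = """
const token = arguments[0];
const fields = document.querySelectorAll(
    '[name="h-captcha-response"], [name="g-recaptcha-response"]'
);
fields.forEach(function (field) { field.value = token; });
return fields.length;
"""


def apply_hcaptcha_token(driver, token: str) -> int:
    """Write the solved token into the response fields.

    Returns the number of fields filled, as reported by the page
    (0 means the assumed field names were not found).
    """
    return driver.execute_script(_FILL_TOKEN_JS, token)
```

A return value of 0 would suggest that the page does not use these field names and that one of the other approaches is needed.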

Solution 1) If you can use Puppeteer instead of Selenium, check the solution with the YouTube video on this page: The discord register system with Python gives an error "Invalid-response".

Solution 2) Locate the callback on the site and then call that callback in your script. (Without the page source it's difficult to tell you what it is.)

Solution 3) Write your own function to send the data across. For this you can use Chrome. Open the Network tab, clear out any current transactions, and then solve the captcha manually. Then check the Network tab for something like a 404; it should have a payload. Write your function to encapsulate that payload and send it across. I just went through this exercise on another site and finally ended up using a provider from fiverr.com.

Hopefully one of these solutions works for you. Good luck.

hdsouza

The error occurs because the intended element cannot be found, due to bot detection.

Unfortunately, there is no mechanism to bypass hCaptcha. You can use captcha-solving services like 2captcha.

However, you could also try modules like undetected-chromedriver to see whether that gets you past the captcha.

Aadarsha