1

Using the code below, I wanted to extract the gold price with XPath and then use linear regression to make basic predictions.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from sklearn.linear_model import LinearRegression
import time
import numpy as np
from sklearn.svm import SVR
import pytz
from datetime import datetime
from sys import argv
import os, psutil

################################################
if len(argv) != 5:
  print (argv[0] + '<train count> <timeout(s)> <predict date(Y/M/D)> <predict clock(H:M:S)>')
  sys.exit(2)
X_predict = [(int(datetime.strptime(argv[3] + " " + argv[4], '%Y/%m/%d %H:%M:%S').timestamp()*(10000000)))]
################################################

X=[]
y=[]
#driver = webdriver.Chrome()
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get('https://goldprice.org/live-gold-price.html')

elem_xpath = '//[@id="gpxtickerLeft_price"]'

for i in range(1, int(argv[1])):
    try:
        elem = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, elem_xpath)))
        print ("train => ", i)
        X.append(int(time.time()*(10000000)))
        y.append(int(elem.text.replace(',', '')))
        time.sleep(int(argv[2]))
    finally:
        driver.quit
        
##############################################
X = np.array(X).reshape(-1, 1)
y = np.array(y).reshape(-1, 1)
X_predict = np.array(X_predict).reshape(-1, 1)
##############################################
    
svr_rbf = LinearRegression()
y_rbf = svr_rbf.fit(X,y).predict(X_predict)

##########################################
#print ('X:'.format(X))
#print ('y:'.format(y))
#print ('X_predict:{}'.format(X_predict))
##########################################

print ('y_rbf: {}'.format(int(y_rbf)))
print('memory usage: {} MB'.format(
int(psutil.Process(os.getpid()).memory_info().rss/1024/1024)
)) 


But after executing the script I get the following error:


C:\Users\Lev\Desktop>python mls.py 6 3 2020/12/11 12:43:06


[WDM] - ====== WebDriver manager ======
[WDM] - Current google-chrome version is 90.0.4430
[WDM] - Get LATEST driver version for 90.0.4430
[WDM] - Driver [C:\Users\Lev\.wdm\drivers\chromedriver\win32\90.0.4430.24\chromedriver.exe] found in cache

DevTools listening on ws://127.0.0.1:6275/devtools/browser/10d6bc25-3034-4ca7-a437-c0cf39c86274
[4412:5028:0514/123522.805:ERROR:device_event_log_impl.cc(214)] [12:35:22.805] FIDO: webauthn_api.cc:54 Windows WebAuthn API failed to load
[5524:4228:0514/123532.459:ERROR:ssl_client_socket_impl.cc(947)] handshake failed; returned -1, SSL error code 1, net_error -100
[5524:4228:0514/123533.786:ERROR:ssl_client_socket_impl.cc(947)] handshake failed; returned -1, SSL error code 1, net_error -100
[5524:4228:0514/123538.624:ERROR:ssl_client_socket_impl.cc(947)] handshake failed; returned -1, SSL error code 1, net_error -100
[5524:4228:0514/123538.825:ERROR:ssl_client_socket_impl.cc(947)] handshake failed; returned -1, SSL error code 1, net_error -100
Traceback (most recent call last):
  File "mls.py", line 32, in <module>
    elem = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, elem_xpath)))
  File "D:\Python38\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:

The line "[5524:4228:0514/123533.786:ERROR:ssl_client_socket_impl.cc(947)] handshake failed; returned -1, SSL error code 1, net_error -100" keeps getting spammed.

I guess the XPath is wrong.

Sam Borov

4 Answers

1

XPath should be

//span[@id="gpxtickerLeft_price"]

You used:

//[@id="gpxtickerLeft_price"]

The part in [] is called a predicate. See this page for some examples.

It needs a node or attribute to filter on. // is not a node.

Node examples:

//div
//*
//text()
//@id
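
The difference can be reproduced without a browser. The sketch below is only an illustration: it uses Python's standard-library xml.etree.ElementTree, which implements a limited XPath subset and is not the engine Selenium uses in the browser. The HTML snippet and the helper name try_path are made up for the example. It shows that a predicate with a node test in front of it is accepted, while a predicate directly after // is rejected as malformed.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the real page, which is much larger.
SNIPPET = '<div><span id="gpxtickerLeft_price">1,833.27</span></div>'
root = ET.fromstring(SNIPPET)


def try_path(path):
    """Return the text of all matching elements, or None if the path is rejected."""
    try:
        return [el.text for el in root.findall(path)]
    except SyntaxError:
        # ElementTree raises SyntaxError for a malformed path expression.
        return None


# ElementTree paths are relative to an element, hence the leading dot.
with_tag = try_path('.//span[@id="gpxtickerLeft_price"]')
with_wildcard = try_path('.//*[@id="gpxtickerLeft_price"]')
without_node = try_path('.//[@id="gpxtickerLeft_price"]')

print(with_tag, with_wildcard, without_node)
```

In Selenium itself the browser evaluates the expression, so the exact error differs, but the rule is the same: put a tag name or * before the predicate.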
Siebe Jongebloed
1

Yes, your XPath is missing a tag name.
So it should be //span[@id="gpxtickerLeft_price"] or //*[@id="gpxtickerLeft_price"]

Prophet
1

The id gpxtickerLeft_price matches three web elements (two of them, not all, have a prefix). You have two options:

  1. Use find_elements

  2. Write 3 different locators for web elements.

Code :

elem = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, 'gpxtickerLeft_price')))

Read more about why ID is preferred over XPath here.

cruisepandey
  • Thanks but I get the value error now: Traceback (most recent call last): File "mls.py", line 35, in y.append(int(elem.text.replace(',', ''))) ValueError: invalid literal for int() with base 10: '1833.27' – Sam Borov May 14 '21 at 08:28
  • How can I convert that to solve the issue? – Sam Borov May 14 '21 at 08:28
  • @SamBorov: that is not caused by the ID I gave you. It is because of `.append()`; please raise a separate question for that. – cruisepandey May 14 '21 at 10:21
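
For the ValueError quoted in the comments: Python's int() does not accept a string containing a decimal point, whereas float() does. A minimal sketch of one possible conversion follows. The helper name parse_price is hypothetical, and it assumes the element text uses a comma as the thousands separator and a dot as the decimal mark, as in '1833.27'.

```python
def parse_price(text):
    """Convert a displayed price such as '1,833.27' to a float."""
    # Drop thousands separators and surrounding whitespace before converting.
    return float(text.replace(',', '').strip())


price = parse_price('1,833.27')
whole_units = int(price)  # int() on a float truncates the fractional part

print(price, whole_units)
```

In the script from the question this would mean appending parse_price(elem.text) to y instead of calling int() on the raw text.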
1

I think the problem may also be your browser driver version. In the logs I can see that you have Google Chrome version 90.0.4430, but the chromedriver version is old.

  1. Stop any running chromedriver.exe process by opening a command prompt and running: taskkill /F /IM chromedriver.exe

  2. Then install a new chromedriver.exe from here (depending on your machine).

  3. Use it in your code.
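
Before replacing the driver, it may help to check how the two versions compare. The sketch below is an assumption-laden sanity check and not an official compatibility rule: it compares only the leading (major) component of the version strings. The helper name major_version is hypothetical, and the two strings are copied from the [WDM] lines in the question's log.

```python
def major_version(version):
    """Return the leading (major) component of a dotted version string."""
    return int(version.split('.')[0])


chrome_version = '90.0.4430'     # "Current google-chrome version" in the log
driver_version = '90.0.4430.24'  # taken from the cached driver path in the log

# True when browser and driver share the same major version.
same_major = major_version(chrome_version) == major_version(driver_version)
print(same_major)
```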

mr-possible