
I'm trying to scrape this page: https://www.bitmex.com/app/trade/XBTUSD to get the Open Interest figure shown on the left side of the page. This is what I have so far:

import bs4
from bs4 import BeautifulSoup
import requests
import re
from selenium import webdriver
import urllib.request

r = requests.get('https://www.bitmex.com/app/trade/XBTUSD')
url = "https://www.bitmex.com/app/trade/XBTUSD"
page = urllib.request.urlopen('https://www.bitmex.com/app/trade/XBTUSD')
soup = bs4.BeautifulSoup(r.text, 'xml')
resultat = soup.find_all(text=re.compile("Open Interest"))


driver = webdriver.Firefox(executable_path='C:\\Users\\Samy\\Desktop\\geckodriver\\geckodriver.exe')
results = driver.find_elements_by_xpath("//*[@class='contractStats hoverContainer block']//*[@class='value']/html/body/div[1]/div/span/div[1]/div/div[2]/li/ul/div/div/div[2]/div[4]/span[2]/span/span[1]")
print(len(results))

I get 0 as a result. I tried several different expressions for the results variable (including driver.find_elements_by_xpath("//span[@class='price']/text()")), but none of them works. I know the problem lies in the XPath I copied, but I can't pin down the issue despite reading "Why does this xpath fail using lxml in python?" and https://stackoverflow.com/a/43095252/7937578

I was originally using only the XPath obtained by copying, but after reading those SO questions I added the part at the beginning ([@class....]). I'm still missing something. Thank you if you know how to help!

Sam99

3 Answers


I don't know why it fails, but I think the best way to find any element is by full XPath.

Something that looks like this:

homebutton = driver.find_element_by_xpath("/html/body/header/div/div[1]/a[2]/span")

Give it a try.

Alan Cesar

A full path is not the best choice, and it is also harder to read. An XPath works like a filter: try to find a unique attribute on the element you need, or a unique description of its parent. Here, the span you need has the 'value' class and sits inside a span with the 'tooltipWrapper' class; that parent span also has another child with the 'key' class and the text 'Open Interest'. There are thousands of possible locators; I can suggest two:

//span[@class = 'tooltipWrapper' and span[string() = 'Open Interest']]//span[@class = 'value']
//span[@class = 'key' and text() = 'Open Interest']/..//span[@class = 'value']
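
To see how this kind of locator behaves without opening a browser, here is a minimal sketch using Python's standard-library ElementTree. The markup is a hypothetical stand-in built from the class names described above, not the real BitMEX page, and ElementTree only supports a subset of XPath, so the first expression is adapted: two chained predicates replace `and`, and `span='Open Interest'` replaces `span[string() = 'Open Interest']`. It also compares the result with a positional path after an imagined layout change.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment (NOT the real BitMEX markup), built from the class
# names mentioned in this answer.
ORIGINAL = """
<div>
  <span class="tooltipWrapper">
    <span class="key">Some Other Stat</span>
    <span class="value">123</span>
  </span>
  <span class="tooltipWrapper">
    <span class="key">Open Interest</span>
    <span class="value">640,089,423 USD</span>
  </span>
</div>
"""

# The same rows after an imagined redesign that adds one wrapper <div>.
REDESIGNED = """
<div>
  <div class="stats">
    <span class="tooltipWrapper">
      <span class="key">Some Other Stat</span>
      <span class="value">123</span>
    </span>
    <span class="tooltipWrapper">
      <span class="key">Open Interest</span>
      <span class="value">640,089,423 USD</span>
    </span>
  </div>
</div>
"""

# Anchored on the label text, adapted to ElementTree's XPath subset.
LABEL_ANCHORED = (
    ".//span[@class='tooltipWrapper'][span='Open Interest']"
    "//span[@class='value']"
)
# Purely positional: second span under the root, then its second span.
POSITIONAL = "./span[2]/span[2]"


def lookup(markup, path):
    """Return the text of the first element matching path, or None."""
    node = ET.fromstring(markup).find(path)
    return None if node is None else node.text


for name, markup in (("original", ORIGINAL), ("redesigned", REDESIGNED)):
    print(name, "label-anchored:", lookup(markup, LABEL_ANCHORED))
    print(name, "positional:    ", lookup(markup, POSITIONAL))
```

In this toy setup both paths find the value in the original fragment, but only the label-anchored one still finds it after the extra wrapper is added. In Selenium you would pass the unadapted expressions above to the driver instead.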
  • Thanks for your answer, I tried both your suggestions in my results variable but neither worked for me – Sam99 Jul 01 '20 at 15:12
  • I've checked them both, and they are valid. Please note that this control is hidden from time to time and replaced with another one. – Sergii Dmytrenko Jul 01 '20 at 16:52

If I understood your requirements correctly, the following script should fetch you the required content from that page:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = "https://www.bitmex.com/app/trade/XBTUSD"

with webdriver.Firefox() as driver:
    driver.get(link)
    wait = WebDriverWait(driver,10)
    items = [item.text for item in wait.until(EC.presence_of_all_elements_located((By.XPATH,"//*[@class='lineItem']/span[@class='hoverHidden'][.//*[contains(.,'Open Interest')]]//span[@class='key' or @class='value']")))]
    print(items)

Output at this moment:

['Open Interest', '640,089,423 USD']
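
The main difference from the code in the question is that this script loads the page with driver.get and then waits for the elements, since they presumably only appear once the page's JavaScript has run. Roughly speaking, WebDriverWait(...).until(...) keeps calling the condition until it returns something truthy or the timeout expires. Here is a simplified, stdlib-only sketch of that idea; the `wait_until` helper and the `FakePage` class are made up for illustration and are not Selenium's implementation:

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if nothing truthy comes back within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.2f s" % timeout)
        time.sleep(poll)


class FakePage:
    """Hypothetical page whose elements only show up on the third lookup."""

    def __init__(self, ready_after=3):
        self.calls = 0
        self.ready_after = ready_after

    def find_items(self):
        self.calls += 1
        if self.calls < self.ready_after:
            return []  # nothing rendered yet, like getting 0 results
        return ["Open Interest", "640,089,423 USD"]


page = FakePage()
items = wait_until(page.find_items, timeout=2.0, poll=0.01)
print(items, "after", page.calls, "lookups")
```

A single lookup right after creating the driver corresponds to the first call here, which comes back empty.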
asmitu
  • Thanks @asmitu! When I run your code it works fine, but the printed result is `['Open Interest', '654\u202f715\u202f936 USD']`. Besides this display issue, I noticed you structured the code differently from what I was trying. Can you explain what you did, or share a link to where you learned it? Thanks a lot! – Sam99 Jul 01 '20 at 15:11
  • [A good place](https://www.w3schools.com/xml/xpath_intro.asp) to start with. – SIM Jul 01 '20 at 15:35
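
On the display issue raised in the comments: `\u202f` is the escape for U+202F, the narrow no-break space, which some locales use as a thousands separator. It shows up escaped because printing a list prints the repr of each string; the scraped text itself is fine. A minimal sketch for turning either form into a number, assuming the text always ends with the currency code (the helper name `parse_open_interest` is made up for this example):

```python
import re


def parse_open_interest(text):
    """Split a string like '654\u202f715\u202f936 USD' into (amount, currency).

    Assumes the amount is a whole number and the currency code comes last.
    """
    digits = re.sub(r"\D", "", text)  # drop separators, spaces and letters
    match = re.search(r"[A-Za-z]+\s*$", text)
    currency = match.group().strip() if match else ""
    return int(digits), currency


print(parse_open_interest("654\u202f715\u202f936 USD"))
print(parse_open_interest("640,089,423 USD"))
```

Both calls return the amount as an int together with 'USD', regardless of which thousands separator the browser used.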