I know how to find a web element using XPath, like this:

fruit = webdriver.find_element(By.XPATH, '/div/div[1]/div[2]').text

Output 
fruit = 'banana'

But what I really want is to do the reverse:

banana_path = webdriver."someway get the XPATH"(text = 'banana')

Output 
banana_path = '/div/div[1]/div[2]'

I want to do this because I first scrape all the times shown on the site, so that when one equals 10 (for example) I can go back to the site and scrape the text that matches it. Unfortunately, there are dozens of pieces of information (all with the same class name), and their number keeps increasing or decreasing according to demand. That's why I need the XPath: with it I could go directly to what I want to find.

For example, if I got the XPath of the time:

time_path = '/div[1]/div/div/div/div/div[1]/div[1]/div[2]/div[3]'

I could find and scrape text whose XPath is at a nearby position:

webdriver.find_element(By.XPATH, '/div[1]/div/div/div/div/div[1]/div[1]/span/div').text
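
A nearby element can also be reached without editing the absolute path by hand, by starting from the matched element and using relative XPath axes. The sketch below is only an illustration on invented markup: the `row`, `name` and `time` class names and the helper `name_for_time` are assumptions, and lxml parses a static string that stands in for the page source. With Selenium, a relative expression of the same kind can be passed to `find_element(By.XPATH, ...)` called on the element that was found.

```python
from lxml import html

# Invented markup for illustration: rows that repeat the same class names.
PAGE_SOURCE = """
<html><body>
  <div class="row"><div class="label"><span class="name">alpha</span></div><div class="time">7</div></div>
  <div class="row"><div class="label"><span class="name">beta</span></div><div class="time">10</div></div>
</body></html>
"""

def name_for_time(page_source, time_text):
    """Return the name found in the same row as the first matching time, or None."""
    root = html.fromstring(page_source)
    # Locate the time element by its text; $t is an XPath variable bound below.
    hits = root.xpath("//div[@class='time'][normalize-space(.) = $t]", t=time_text)
    if not hits:
        return None
    time_el = hits[0]
    # Step up to the nearest enclosing row, then down to the name inside it.
    names = time_el.xpath("ancestor::div[@class='row'][1]//span[@class='name']/text()")
    return names[0].strip() if names else None

print(name_for_time(PAGE_SOURCE, "10"))
```

Because the lookup is relative to the matched element, it should keep working when rows are added or removed above it, which an absolute path would not.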

I found an answer about that on Stack Overflow, but I'm using Python and not JavaScript.

Find an element by text and get xpath - selenium webdriver junit

I also found this answer, which shows how to do that with urllib2 and lxml. However, the site I'm scraping has strong protection against automation, and I was only able to access it with Selenium.

How to get an XPath from selenium webelement or from lxml?

I really appreciate your help, because this is the last missing part of my automation.

RaymanSix

1 Answer

I understand your problem. I used Selenium and lxml together, since you already mentioned both modules. I don't know whether my method will work properly, because I took the lxml part from the second link in your question, How to get an XPath from selenium webelement or from lxml?

So here is my approach:

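# First, get the website data using Selenium

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

url = ''

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service('path/to/driver'))
driver.get(url)

# page_source is a property, not a method
data = driver.page_source

# Then get your XPath using lxml, because you already have the data above

from lxml import html

xpath = ''

# page_source is a string, so parse it with html.fromstring();
# etree.parse() expects a file name or file object instead
root = html.fromstring(data)
tree = root.getroottree()
element = tree.xpath(xpath)[0]
print(tree.getpath(element))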
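
For the lookup by text that the question asks about, the XPath passed to lxml can match on the element's text. The following is a minimal sketch, not a tested solution for the real site: the sample markup and the helper name `xpaths_for_text` are invented, and the string `PAGE_SOURCE` stands in for what Selenium returns as the page source.

```python
from lxml import html

# Stand-in for the Selenium page source; this markup is invented for illustration.
PAGE_SOURCE = """
<html><body>
  <div>
    <div><div>apple</div><div>banana</div></div>
    <div><div>cherry</div></div>
  </div>
</body></html>
"""

def xpaths_for_text(page_source, text):
    """Return the absolute XPath of every element whose first text node equals `text`."""
    root = html.fromstring(page_source)
    tree = root.getroottree()  # getpath() lives on the tree, not on the element
    # $t is an XPath variable, bound through the keyword argument below.
    matches = root.xpath("//*[normalize-space(text()) = $t]", t=text)
    return [tree.getpath(el) for el in matches]

banana_paths = xpaths_for_text(PAGE_SOURCE, "banana")
print(banana_paths)
```

The returned paths describe the parsed source at the moment it was captured, so they may go stale if the page changes before they are used again in Selenium.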
CYCNO
  • I think this is the way to go. Once I get the data, I can use lxml. But, I tried to use your code exactly as you said and this error appeared : `
    – RaymanSix Jan 02 '23 at 22:05
  • @RaymanSix Can you give me the website link, so I can try another approach suited to the website? – CYCNO Jan 03 '23 at 05:34
  • Sorry for the delay. After researching and even asking a question on Stack Overflow, I found out that the problem wasn't with the website, but with the method you're telling me to use. – RaymanSix Jan 06 '23 at 03:03
  • Instead of etree, the right thing is to use html, so that lxml can parse the site, like this: `tree = html.parse(data)`. However, the html module doesn't seem to have the xpath, path or getpath functions, so at this point I don't know what I can do. – RaymanSix Jan 06 '23 at 03:09
  • @RaymanSix Sorry for the late response, but the way of parsing HTML you describe is not my method. I just copied and pasted the method you already pointed to in the second Stack Overflow link of your question. – CYCNO Jan 10 '23 at 15:19