
For example:

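from bs4 import BeautifulSoup

# Triple quotes are needed because the markup spans several lines
# and contains double quotes.
html = """
<ul>
     <li class="item-0"><a href="link1.html">first item</a></li>
     <li class="item-1"><a href="link2.html">second item</a></li>
     <li class="item-inactive"><a href="link3.html">third item</a></li>
     <li class="item-1"><a href="link4.html">fourth item</a></li>
     <li class="item-0"><a href="link5.html">fifth item</a>
</ul>"""
# Naming the parser explicitly avoids bs4's "no parser specified" warning.
soup = BeautifulSoup(html, "html.parser")
item_0 = soup.select_one('ul li.item-0')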

Is there a function like soup_to_xpath(item_0) that can translate item_0 to '/html/body/ul/li[5]' or something like this?

2 Answers


As far as I know, bs4 (BeautifulSoup) has no XPath support.

Provided CSS selector could be "translated" into XPath as:

//ul/li[@class="item-0"]

or

//li[normalize-space(.)="fifth item"]
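
To check what expressions like these select, they can be evaluated against the question's markup with lxml. This is only a sketch, assuming lxml is installed; `markup`, `doc`, `by_class` and `by_text` are names made up for illustration. The text comparison goes through `normalize-space()` because the last `<li>` is unclosed and may carry trailing whitespace.

```python
from lxml import html as lxml_html

# The question's markup; the last <li> is left unclosed, as in the original.
markup = """
<ul>
     <li class="item-0"><a href="link1.html">first item</a></li>
     <li class="item-1"><a href="link2.html">second item</a></li>
     <li class="item-inactive"><a href="link3.html">third item</a></li>
     <li class="item-1"><a href="link4.html">fourth item</a></li>
     <li class="item-0"><a href="link5.html">fifth item</a>
</ul>"""

# document_fromstring wraps the fragment in <html><body>.
doc = lxml_html.document_fromstring(markup)

# Select by exact class attribute value.
by_class = doc.xpath('//ul/li[@class="item-0"]')
# Select by whitespace-normalized text content.
by_text = doc.xpath('//li[normalize-space(.)="fifth item"]')

print([li.text_content().strip() for li in by_class])
print([li.text_content().strip() for li in by_text])
```

The class-based expression should match both the first and the fifth item, since both carry `item-0`, while the text-based one should match only the fifth. Neither is an absolute path to one specific element.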

To generate the XPath from the CSS selector programmatically, you can use lxml's CSSSelector:

from lxml import etree
from lxml.cssselect import CSSSelector # You might need to run "pip install cssselect"

sel = CSSSelector('ul li.item-0')
sel.path

Output:

"descendant-or-self::ul/descendant-or-self::*/li[@class and contains(concat(' ', normalize-space(@class), ' '), ' item-0 ')]"
Andersson
  • I just want to get the XPath. What does "Provided CSS selector could be "translated" into XPath" mean? – user7391482 Mar 03 '17 at 06:03
  • `'ul li.item-0'` is a `CSS` selector. Do you want to use `BeautifulSoup` to generate `XPath` or any other automation tool to generate it? Clarify your issue – Andersson Mar 03 '17 at 06:05
  • "Do you want to use BeautifulSoup to generate XPath or any other automation tool to generate it?" This is exactly what I want... – user7391482 Mar 03 '17 at 06:08
  • You can try to use `lxml.etree` (http://lxml.de/tutorial.html) to generate absolute `XPath` expression as described here http://stackoverflow.com/questions/24411765/how-to-get-an-xpath-from-selenium-webelement-or-from-lxml – Andersson Mar 03 '17 at 06:13
  • `tree.getpath(element)`: here the element seems to be an lxml object. I want to translate a bs element to XPath, the same request as the asker. – user7391482 Mar 03 '17 at 06:22
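
For a bs4 element specifically, one option is to walk up the `.parent` chain and count same-named siblings at each level. The following is a minimal sketch, not a bs4 feature: `soup_to_xpath` is the hypothetical name from the question, and `markup` is an illustration name. It assumes the element is a `Tag` attached to a parsed tree.

```python
from bs4 import BeautifulSoup, Tag


def soup_to_xpath(element):
    """Build an absolute, position-based XPath for a bs4 Tag (hypothetical helper)."""
    parts = []
    node = element
    # The BeautifulSoup object itself has no parent, so the walk stops there.
    while isinstance(node, Tag) and node.parent is not None:
        # Direct children of the parent that share this tag name.
        siblings = node.parent.find_all(node.name, recursive=False)
        if len(siblings) > 1:
            # XPath positions are 1-based; compare by identity, not equality.
            index = next(i for i, s in enumerate(siblings, 1) if s is node)
            parts.append(f"{node.name}[{index}]")
        else:
            parts.append(node.name)
        node = node.parent
    return "/" + "/".join(reversed(parts))


markup = """
<ul>
     <li class="item-0"><a href="link1.html">first item</a></li>
     <li class="item-1"><a href="link2.html">second item</a></li>
     <li class="item-inactive"><a href="link3.html">third item</a></li>
     <li class="item-1"><a href="link4.html">fourth item</a></li>
     <li class="item-0"><a href="link5.html">fifth item</a>
</ul>"""

soup = BeautifulSoup(markup, "html.parser")
item_0 = soup.select_one("ul li.item-0")
print(soup_to_xpath(item_0))  # /ul/li[1]
```

With `html.parser` the fragment gets no `<html>`/`<body>` wrapper, so the path starts at `/ul`. A parser that adds those wrappers, such as `"lxml"`, would be expected to give a path starting with `/html/body` instead.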

lxml uses the cssselect module to perform this task:

In [1]: from cssselect import GenericTranslator, SelectorError

In [2]: expression = GenericTranslator().css_to_xpath('ul li.item-0')

In [3]: expression
Out[3]: "descendant-or-self::ul/descendant-or-self::*/li[@class and contains(concat(' ', normalize-space(@class), ' '), ' item-0 ')]"
宏杰李