I am trying to scrape the dollar-sign rating for each restaurant on a food delivery website, but the rating has no element of its own that I can target with an XPath.

    <!-- react-text: 2108 -->
    "$$"
    <!-- /react-text -->

The markup above is what I see for the dollar rating when I inspect the page. I've tried targeting the line directly above it:

    <i class="icon-bullet--small">·</i>

However, this returns the bullet character (·), because that element is not the dollar rating. I've also tried using:

    cost = ['//li[{}]/a/div[2]/p[2]/!'.format(x) for x in range(1, 999)]

as well as using "!--" and "react" and "react-text" in the xpath, but none of it works. Any suggestions on how to approach this?

Mimi Chung

1 Answer

This XPath,

    //comment()[normalize-space() = "react-text: 2108"]/following-sibling::text()

will select the text nodes that follow the targeted comment among its siblings (append [1] to keep only the one immediately after the comment), returning

    "$$"

as requested.
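As a rough illustration of how the XPath above might be evaluated from Python, here is a minimal sketch using the third-party lxml library. The `PAGE` markup is a hypothetical stand-in modelled on the snippet in the question, not the real site, and it assumes the quotes shown around `$$` are added by the browser inspector rather than being part of the text node.

```python
from lxml import html

# Hypothetical markup modelled on the question's snippet; the real page
# structure is an assumption.
PAGE = """
<ul>
  <li><div><p><i class="icon-bullet--small">·</i><!-- react-text: 2108 -->$$<!-- /react-text --></p></div></li>
</ul>
"""

tree = html.fromstring(PAGE)

# normalize-space() trims the padding inside the comment; [1] keeps only
# the first text node after the comment.
XPATH = (
    '//comment()[normalize-space() = "react-text: 2108"]'
    "/following-sibling::text()[1]"
)

# xpath() returns a list of string-like results for text() selections.
ratings = [str(node).strip() for node in tree.xpath(XPATH)]
print(ratings)
```

Because the expression ends in `text()`, the result is a list of strings rather than elements, which is why it is evaluated with lxml here instead of an element lookup.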


Important note: @DebanjanB has helpfully pointed out that the comment containing react-text: 2108 is a React directive that Selenium won't see unless the content is extracted as page_source. Thanks, Debanjan!
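Following on from that note, one possible way to work from the page source is sketched below. It swaps XPath for the standard library's `html.parser` and collects every text run wrapped in `react-text` comments, since the numeric id presumably differs per restaurant. The names `ReactTextCollector` and `dollar_ratings`, the sample markup, and the "only dollar signs" filter are all assumptions made for illustration; with Selenium, the string passed in would be `driver.page_source`.

```python
from html.parser import HTMLParser


class ReactTextCollector(HTMLParser):
    """Collect text found between `react-text: N` and `/react-text` comments."""

    def __init__(self):
        super().__init__()
        self.inside = False  # True while between the two marker comments
        self.texts = []

    def handle_comment(self, data):
        marker = data.strip()
        if marker.startswith("react-text:"):
            self.inside = True
        elif marker == "/react-text":
            self.inside = False

    def handle_data(self, data):
        if self.inside and data.strip():
            self.texts.append(data.strip())


def dollar_ratings(page_source):
    """Return the react-text runs that consist only of dollar signs."""
    collector = ReactTextCollector()
    collector.feed(page_source)
    collector.close()
    return [text for text in collector.texts if set(text) == {"$"}]


# Hypothetical page source; with Selenium this would be driver.page_source.
sample = (
    "<li><p><!-- react-text: 2107 -->Pizza<!-- /react-text -->"
    '<i class="icon-bullet--small">·</i>'
    "<!-- react-text: 2108 -->$$<!-- /react-text --></p></li>"
    "<li><p><!-- react-text: 2215 -->$$$<!-- /react-text --></p></li>"
)
print(dollar_ratings(sample))
```

Filtering on the text content avoids hard-coding ids such as 2108, at the cost of assuming that no other react-text run on the page is made up solely of dollar signs.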

kjhughes
  • Thanks for your answer! I'm very new to Python, so I unsuccessfully attempted `cost = ['//comment()[. = "react-text: {}"]'/following-sibling::text()]` `for costrating in cost[:999]:` `dollarrating = driver.find_element_by_xpath(costrating).text` `print(dollarrating)` – Mimi Chung Jan 30 '19 at 01:41
  • See [How to use Xpath in Python?](https://stackoverflow.com/questions/8692/how-to-use-xpath-in-python). No use repeating that all here. The interesting part of your question is how to write an XPath relative to comments. – kjhughes Jan 30 '19 at 01:52
  • If the content you want to get has a parent node, you can also get it with `//parent-node/text()` – jia Jimmy Jan 30 '19 at 02:28
  • @kjhughes `react-text: 2108` indicates it's a React element (generated dynamically). Selenium won't be able to recognize/interact with this content unless extracted as `page_source` – undetected Selenium Jan 30 '19 at 07:21