I am trying to collect information from a webpage and cannot get the correct XPath to find it. Here is a piece from a website:

<div class="posted">
  <div>
    June 20, 2018
  </div>
</div>

I want to search each page for the div whose class is "posted", then return everything under it as a string. (A messy string is OK; I will just use `if "2018" in possibleDate` to search for the year.) Here is what I am trying:

possibleDate = str(tree.xpath("//div[contains(@class, ’posted’)]//@text"))

It says that it is an invalid expression.
What am I doing wrong?

kjhughes
  • Note that `[contains(@class, 'posted')]` is not wrong, but I suspect you intended `[@class = 'posted']`. The "contains" version will match `@class="signposted"`; the "=" version won't. – Michael Kay Jun 26 '18 at 21:26
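To make the difference between the two predicates concrete, here is a minimal sketch. It assumes lxml (the question's `tree.xpath` call suggests it), and both the sample markup and the token-matching expression are illustrative assumptions rather than anything taken from the asker's page.

```python
from lxml import html

# Hypothetical markup: only the second and third divs carry the class token "posted".
page = """
<html><body>
  <div class="signposted">not a date</div>
  <div class="posted">June 20, 2018</div>
  <div class="posted featured">June 21, 2018</div>
</body></html>
"""
tree = html.fromstring(page)

# Substring test: also matches class="signposted".
by_contains = tree.xpath("//div[contains(@class, 'posted')]")

# Exact test: matches only when the whole attribute value is "posted".
by_equals = tree.xpath("//div[@class='posted']")

# Common XPath 1.0 idiom for matching one whitespace-separated class token.
by_token = tree.xpath(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' posted ')]"
)

print(len(by_contains), len(by_equals), len(by_token))
```

The third form is worth considering when a page may combine "posted" with other classes, since the exact comparison would then miss it.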

1 Answer

First, replace the ’ characters surrounding posted with ' characters.

Next, replace @text with text(). @text selects an attribute named text, which these div elements do not have, whereas text() selects text nodes.
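Both fixes can be checked with a short sketch. It assumes lxml is the library behind `tree.xpath` and wraps the snippet from the question in a hypothetical minimal page; `curly_failed` and `text_nodes` are names introduced here for illustration.

```python
from lxml import etree, html

# The snippet from the question, wrapped in a hypothetical minimal page.
page = """<html><body>
<div class="posted">
  <div>
    June 20, 2018
  </div>
</div>
</body></html>"""
tree = html.fromstring(page)

# Curly quotes are not string delimiters in XPath, so this expression is rejected.
try:
    tree.xpath("//div[contains(@class, ’posted’)]//text()")
    curly_failed = False
except etree.XPathError:
    curly_failed = True

# Straight quotes plus text() yield a list of the text nodes under the div.
text_nodes = tree.xpath("//div[contains(@class, 'posted')]//text()")
possibleDate = str(text_nodes)

print(curly_failed)
print("2018" in possibleDate)
```

The resulting string is the messy list representation the question says is acceptable, whitespace-only nodes included.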

Also, you might want to use the space-normalized string value of the selected div rather than selecting text nodes:

possibleDate = str(tree.xpath("normalize-space(//div[@class='posted'])"))

This will abstract across mark-up variations nested within the targeted div.
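As a rough illustration of that point, the sketch below runs the same expression against the markup from the question and against a hypothetical variant with extra nested tags. It assumes lxml; the surrounding `<html><body>` wrappers and the `nested` variant are invented for the example.

```python
from lxml import html

# Markup from the question, wrapped in a hypothetical minimal page.
page = """<html><body>
<div class="posted">
  <div>
    June 20, 2018
  </div>
</div>
</body></html>"""
tree = html.fromstring(page)

# normalize-space() takes the string value of the first matching div,
# trims it, and collapses internal runs of whitespace.
possibleDate = str(tree.xpath("normalize-space(//div[@class='posted'])"))

# Hypothetical variant: the same date split across different nested markup.
nested = html.fromstring(
    '<html><body><div class="posted"><p><b>June 20,</b> 2018</p></div></body></html>'
)
nestedDate = str(nested.xpath("normalize-space(//div[@class='posted'])"))

print(repr(possibleDate))
print(possibleDate == nestedDate)
```

If no div matches, the expression evaluates to an empty string rather than raising, so an empty result usually means the parsed tree does not contain the element at all.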

See also: xpath: find a node whose class attribute matches a value and whose text contains a certain string

kjhughes
  • Replacing `@text` with `text()` still returns an invalid expression error. using `possibleDate = str(tree.xpath("normalize-space(//div[@class='posted'])"))` did not give an error, but did not find anything. – George Sonancia Jun 26 '18 at 18:52
  • Ah, you also have to replace the `’` characters with `'` characters surrounding `posted` in your XPath. Answer updated. – kjhughes Jun 26 '18 at 18:58
  • Thanks. I've tested `tree.xpath("//div[contains(@class, 'posted')]//text()")`, `tree.xpath("normalize-space(//div[@class='posted'])")`, and `tree.xpath("//div[contains(@class, 'posted')]")`, but all just return empty strings. I am sure the pages they check contain the appropriate class, but they still can't find them. – George Sonancia Jun 26 '18 at 19:07
  • You'll need to update your question with a **true** ***[mcve]*** in order for us to help you further. – kjhughes Jun 26 '18 at 19:19