I have this HTML document, and I want to use XPath in headless Chrome to simulate a click on every PDF link. For that, I need to find all "href" values that contain "documents", contain either "budget-2020-21" or "budget-2020-2021", and end with ".pdf".

Here is an example HTML markup:

<a href="https://www.website.com/documents/7-2045/budget-address-budget-2020-21-en.pdf"
<a href="https://www.website.com/documents/7-2045/crown-corporation-business-plans-budget-2020-21-en.pdf"
<a href="https://www.website.com/documents/7-2045/estimates-supplementary-detail-budget-2020-21-en.pdf" 
<a href="https://www.website.com/documents/7-2045/budget-2020-21-government-business-plan.pdf" 
<a href="https://www.website.com/documents/7-2045/highlights-budget-2020-21-en.pdf"
<a href="https://www.website.com/documents/7-2045/presentation-slides-budget-2020-21-en.pdf" 
<a href="https://www.website.com/sites/default/files/documents/6-2046/ftb-bfi-041-en-budget-2020-2021.pdf">

I used this XPath expression:

//*[contains(@href,’budget-2020-21 OR budget-2020-2021’)]

It seems OR is not being used correctly. Please help.

tursunWali

1 Answer

Your XPath is selecting all elements with an attribute value that contains the substring, 'budget-2020-21 OR budget-2020-2021', literally.

If you want all elements with an href attribute value that contains the substring 'budget-2020-21' or the substring 'budget-2020-2021', use:

//*[contains(@href,'budget-2020-21') or contains(@href,'budget-2020-2021')]

Note also that you must use single quote (') or double quote (") characters to delimit string literals, not the grave accent (`), as you have in the XPath in your question.
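
As a rough illustration of the difference, here is a small Python sketch that mirrors XPath's contains() as a plain substring test. The helper name `contains` and the third URL are assumptions made up for the example; they are not part of the question.

```python
def contains(haystack: str, needle: str) -> bool:
    """Mimic XPath 1.0 contains(): true if needle is a substring of haystack."""
    return needle in haystack


hrefs = [
    "https://www.website.com/documents/7-2045/highlights-budget-2020-21-en.pdf",
    "https://www.website.com/sites/default/files/documents/6-2046/ftb-bfi-041-en-budget-2020-2021.pdf",
    # Hypothetical non-matching link, added only for contrast.
    "https://www.website.com/documents/7-2045/annual-report-en.pdf",
]

# The question's version: "OR" sits inside the string, so it is searched for literally.
matched_literal = [
    h for h in hrefs if contains(h, "budget-2020-21 OR budget-2020-2021")
]

# The corrected version: two contains() tests joined by a boolean "or".
matched_or = [
    h for h in hrefs
    if contains(h, "budget-2020-21") or contains(h, "budget-2020-2021")
]

print(len(matched_literal), len(matched_or))  # prints: 0 2
```

Both substrings are needed because 'budget-2020-2021' does not itself contain 'budget-2020-21'.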

See also

kjhughes
  • Sorry, it seems your XPath expression didn't highlight those PDF links: //*[contains(@href,'budget-2020-21') or contains(@href,'budget-2020-2021')] – tursunWali Jan 20 '21 at 07:35
  • It most certainly does select those `a` elements, once you fix them to be well-formed. (Scroll to the right to see that the `a` elements you posted are missing `>` and `</a>`.) Then, you should be able to add a `[ ]` filter to require ending with `".pdf"` using the how-to link I provided. – kjhughes Jan 20 '21 at 14:09
  • Please [**accept**](https://meta.stackoverflow.com/q/5234/234215) this answer if it's solved your problem. If not, please follow-up specifically so any outstanding concerns can be addressed. Thanks. – kjhughes Jan 26 '21 at 14:20
  • @kjhughes Your answer looks correct, so that is why I upvoted it back in January. I have decided to start taking the position of your last comment on my answers. – Life is complex Feb 02 '21 at 19:38
  • @Lifeiscomplex: Thanks for the upvote, but I'm confused: Is tursunWali another account of yours? (If not, I'm wondering why you're responding here.) I was guiding tursunWali to accept this answer if it resolves the issue or follow-up with a comment explaining how the problem remains if unresolved. – kjhughes Feb 02 '21 at 19:46
  • I responded because I always walk through the history of other posts for an OP whose question I have answered. I do this because I'm trying to understand the OP, so I can respond to their question better. – Life is complex Feb 02 '21 at 20:53
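
For the remaining requirements from the question (the href contains "documents" and ends with ".pdf"), one possible sketch follows. XPath 1.0, which is what browsers such as Chrome implement, has no ends-with() function, so a common substitute is a substring()/string-length() comparison. The function names below are hypothetical, and the Python check only mirrors the predicate's logic; it does not evaluate the XPath itself.

```python
def xpath_ends_with(expr: str, suffix: str) -> str:
    """Build an XPath 1.0 test that behaves like ends-with(expr, suffix)."""
    return (
        f"substring({expr}, string-length({expr}) - "
        f"string-length('{suffix}') + 1) = '{suffix}'"
    )


def build_pdf_link_xpath() -> str:
    """Combine the three conditions from the question into one predicate."""
    tests = [
        "contains(@href, 'documents')",
        "(contains(@href, 'budget-2020-21') or "
        "contains(@href, 'budget-2020-2021'))",
        xpath_ends_with("@href", ".pdf"),
    ]
    return "//a[" + " and ".join(tests) + "]"


def href_matches(href: str) -> bool:
    """Plain-Python mirror of the same predicate, for checking sample URLs."""
    return (
        "documents" in href
        and ("budget-2020-21" in href or "budget-2020-2021" in href)
        and href.endswith(".pdf")
    )


# Two of the hrefs from the question.
samples = [
    "https://www.website.com/documents/7-2045/budget-2020-21-government-business-plan.pdf",
    "https://www.website.com/sites/default/files/documents/6-2046/ftb-bfi-041-en-budget-2020-2021.pdf",
]

print(build_pdf_link_xpath())
print(all(href_matches(h) for h in samples))
```

The resulting string could then be handed to whatever drives the browser, for instance Selenium's `find_elements(By.XPATH, ...)`, assuming that is the tool in use.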