How to use contains and not contains on different classes in XPath

Question

I'm struggling with this simple code.

<div id="post_message_975824" class="alt3">
   <div class="quote">
      some unwanted text 
   </div>
   the text to get <abr>ABR</abr> text to get
</div>

and I want to get this to work:

xpath = "//*[contains(@id, 'post_message_') and not(contains(@class,'quote'))]"

but this fails. I tried some other queries, but I'm not sure what I'm doing wrong.

EDIT

I found that this code works: xpath = "//*[contains(@id,'post_message_')]//div[not(contains(@class,'quote'))]"

but it doesn't select the desired text when there's no quote child in the HTML.

The idea is to get the text from all subnodes as well, except the restricted ones.

score 2 · Accepted Answer · answered Apr 29 '17 at 01:44

Try this XPath:

//div[contains(@id,'post_message_')]/text() | //div[contains(@id,'post_message_')]/*[not(contains(@class,'quote'))]/text()

The first part, //div[contains(@id,'post_message_')]/text(), selects the text nodes directly under the parent div, i.e. <div id="post_message_975824" class="alt3">.

The second part, //div[contains(@id,'post_message_')]/*[not(contains(@class,'quote'))]/text(), selects the text nodes directly under each child element whose class attribute does not contain quote. Children with no class attribute also match.

The result on your example is:

   the text to get 
ABR
 text to get

score 0 · Answer 2 · answered Apr 29 '17 at 01:32

Why not just remove all the nodes you don't want?

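library(xml2)

doc <- read_xml('<div id="post_message_975824" class="alt3">
   <div class="quote">
      some unwanted text 
   </div>
   the text to get <abr>ABR</abr> text to get
</div>')

# Remove every quote div from the document in place
# (written without %>% so that only xml2 needs to be loaded)
xml_remove(xml_find_all(doc, ".//div[@class='quote']"))

# Read the text that is left
xml_text(doc, trim = TRUE)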
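
The same idea, skipping the unwanted nodes instead of excluding them inside the XPath, can also be sketched in Python with only the standard library. This is a minimal sketch, not a definitive implementation: the names HTML and text_without are made up for illustration, and it assumes the markup is well-formed enough for xml.etree.ElementTree to parse. ElementTree stores the text that follows a child element in that child's .tail, so the tail is kept even when the child itself is skipped.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question (hypothetical variable name).
HTML = """<div id="post_message_975824" class="alt3">
   <div class="quote">
      some unwanted text
   </div>
   the text to get <abr>ABR</abr> text to get
</div>"""


def text_without(node, banned):
    """Collect all text under node, skipping elements whose class contains banned."""
    parts = [node.text or ""]
    for child in node:
        # Mirrors not(contains(@class, 'quote')): a plain substring check.
        if banned not in child.get("class", ""):
            parts.append(text_without(child, banned))
        # The tail is text of the parent that follows the child, so always keep it.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(HTML)

# Mirrors contains(@id, 'post_message_'); iter() also visits the root itself.
posts = [el for el in root.iter("div") if "post_message_" in el.get("id", "")]

# Collapse runs of whitespace so each post becomes a single clean string.
result = [" ".join(text_without(post, "quote").split()) for post in posts]
print(result)  # ['the text to get ABR text to get']
```

Unlike the union XPath in the accepted answer, which only reads the direct children, this walks every nesting level, so text in deeper subnodes is included as well.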
