Python: Xpath Issue getting value for each DIV in a For loop

Question

I'm using Python 3.6 to process a chunk of HTML. The for loop below runs, but the atag.xpath query searches the whole HTML source and returns all four data-size values on every iteration.

What I want is this: when PAGE_RAW is processed by the for loop, then for every DIV with a class of item it should find the child DIV with a class of padding and pull out the data-size attribute for that one tag only, not for every matching tag in the HTML source.

HTML

<div class="item">
    <div class="padding" data-size="12"></div>
</div>
<div class="item">
    <div class="padding" data-size="13"></div>
</div>
<div class="item">
    <div class="padding" data-size="14"></div>
</div>
<div class="item">
    <div class="padding" data-size="15"></div>
</div>

Code

import lxml.html as LH
...

PAGE_RAW = driver.page_source
PAGE_RAW = LH.fromstring(PAGE_RAW)

for atag in PAGE_RAW.xpath("//div[contains(@class, 'item')]"):
    data = atag.xpath("//div[contains(@class, 'padding')]/@data-size")

score 5 · Accepted Answer · answered Apr 27 '17 at 00:38

The problem is that in your second xpath, the leading // tells it to search anywhere in the document. It doesn't matter that the current node is a specific div; the search always starts from the document root.

To find any nodes under the current node, replace // with .// (the . indicates that the search starts with the current node, not the root).

import lxml.html as LH
...

PAGE_RAW = driver.page_source
PAGE_RAW = LH.fromstring(PAGE_RAW)

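for atag in PAGE_RAW.xpath("//div[contains(@class, 'item')]"):
    # The leading "." anchors the search at atag instead of the document root.
    # xpath() returns a list of the matching attribute values.
    data = atag.xpath(".//div[contains(@class, 'padding')]/@data-size")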
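
To see the difference without a browser session, here is a minimal self-contained sketch that runs both queries against the sample HTML from the question. It assumes lxml is installed; the outer wrapper div and the names HTML, root, items, absolute and relative are illustrative and not part of the original code.

```python
import lxml.html as LH

# Sample markup copied from the question. The wrapper <div> is an addition
# for this sketch so that fromstring() receives a single root element.
HTML = """
<div id="wrapper">
  <div class="item"><div class="padding" data-size="12"></div></div>
  <div class="item"><div class="padding" data-size="13"></div></div>
  <div class="item"><div class="padding" data-size="14"></div></div>
  <div class="item"><div class="padding" data-size="15"></div></div>
</div>
"""

root = LH.fromstring(HTML)
items = root.xpath("//div[contains(@class, 'item')]")

# "//" ignores the context node and searches the whole document every time.
absolute = [
    item.xpath("//div[contains(@class, 'padding')]/@data-size") for item in items
]

# ".//" searches only below the current item.
relative = [
    item.xpath(".//div[contains(@class, 'padding')]/@data-size") for item in items
]

print(absolute[0])  # all four values, even though only one item was queried
print(relative)     # one single-element list per item
```

Each xpath() call returns a list, so inside the loop you would typically take data[0] after checking that the list is not empty.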

answered Apr 27 '17 at 00:38

araraonline


What's the difference between `.//` and `./`, or is there any? – llanato Apr 27 '17 at 07:07

While `.//` matches descendants at any depth below the current node, `./` matches only its direct children. – araraonline Apr 27 '17 at 15:00

Check out the second answer here, it's well explained: http://stackoverflow.com/questions/35606708/what-is-the-difference-between-and-in-xpath – araraonline Apr 27 '17 at 15:03
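
The `.//` versus `./` distinction from the comments can be sketched as follows. The markup here is hypothetical: it nests the padding div one level deeper than in the question (inside an extra "inner" div) so that the two forms give different results. It assumes lxml is installed, and the variable names are illustrative.

```python
import lxml.html as LH

# Hypothetical markup: "padding" is a grandchild of "item", not a child.
HTML = """
<div class="item">
  <div class="inner">
    <div class="padding" data-size="12"></div>
  </div>
</div>
"""

item = LH.fromstring(HTML)  # the single top-level div, class="item"

# "./" looks only at direct children of the current node.
children = item.xpath("./div/@class")
child_size = item.xpath("./div[contains(@class, 'padding')]/@data-size")

# ".//" looks at every level below the current node.
descendants = item.xpath(".//div/@class")
descendant_size = item.xpath(".//div[contains(@class, 'padding')]/@data-size")

print(children, child_size)
print(descendants, descendant_size)
```

With the HTML from the question, where padding is a direct child of item, both forms would return the same value; `.//` is simply the safer choice when the nesting depth may vary.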
