I'm using Python 3.6 to process a chunk of HTML. The for loop in the code below runs, but the atag.xpath query searches the whole HTML source and returns all four data-size values on every iteration.
What I'm trying to do: when PAGE_RAW is processed by the for loop, for every DIV with a class of item, find the child DIV with a class of padding and pull out the data-size attribute for that one tag only, not for every matching tag it finds in the HTML source.
HTML
<div class="item">
<div class="padding" data-size="12"></div>
</div>
<div class="item">
<div class="padding" data-size="13"></div>
</div>
<div class="item">
<div class="padding" data-size="14"></div>
</div>
<div class="item">
<div class="padding" data-size="15"></div>
</div>
Code
import lxml.html as LH
...
PAGE_RAW = driver.page_source
PAGE_RAW = LH.fromstring(PAGE_RAW)
for atag in PAGE_RAW.xpath("//div[contains(@class, 'item')]"):
    data = atag.xpath("//div[contains(@class, 'padding')]/@data-size")
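One possible direction, offered as a minimal sketch rather than a confirmed fix: in XPath, a path that starts with // is evaluated from the document root even when it is called on a single element, while a path that starts with .// is evaluated relative to the element it is called on. The example below assumes the HTML shown above and uses a hard-coded string (SAMPLE_HTML, a name made up for this illustration) in place of driver.page_source; the sizes list is likewise only there to show what each iteration returns.

```python
import lxml.html as LH

# Hypothetical stand-in for driver.page_source, copied from the HTML above.
SAMPLE_HTML = """
<div class="item">
<div class="padding" data-size="12"></div>
</div>
<div class="item">
<div class="padding" data-size="13"></div>
</div>
<div class="item">
<div class="padding" data-size="14"></div>
</div>
<div class="item">
<div class="padding" data-size="15"></div>
</div>
"""

page = LH.fromstring(SAMPLE_HTML)

sizes = []  # illustration only: collects the result of each iteration
for atag in page.xpath("//div[contains(@class, 'item')]"):
    # The leading "." anchors the query at atag, so only descendants of
    # this particular item DIV are searched, not the whole document.
    data = atag.xpath(".//div[contains(@class, 'padding')]/@data-size")
    sizes.append(data)

print(sizes)
```

With this change each iteration should return a one-element list for its own item. If the padding DIV is always a direct child, ./div[...] would be a stricter alternative to .//div[...].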