Strange behaviour when using lxml.etree.HTML

Question

I am unable to understand the behaviour of the code below:

I put the whole parsed page in x (ok).
I create in l1 a list of the items I am interested in, using x.xpath (ok).
I loop through the elements in l1, assigning each element to l2.
I try to find in l3 the element I am interested in, using l2.xpath, but I don't understand why l3 is not a list with a single element (the one inside the l2 element). Instead, it is a list of all the elements I would get from x.xpath: sub-elements from the entire page, not only those inside the l2 element.

Does anyone know the reason and can explain the solution? Thank you.

from lxml.etree import HTML
from requests import Session

z = Session()
z.params = {"filter[entry_type]": "business"}
x = z.get("https://www.local.ch/it/q", params={"where": 'Chiasso'}).text
x = HTML(x)
l1 = x.xpath("//*[starts-with(@class, 'js-entry-card-container')]")
for l2 in l1:
   l3 = l2.xpath("//*[starts-with(@class, 'lui-margin-vertical-zero')]")

Could you please provide a minimal sample input file to reproduce the problem? — Pouya Esmaeili, Jul 16 '21 at 19:16
Sorry, I always forget something. You can simply create a text file named Svizzera.txt with a first line containing "Chiasso" and a blank second line. — Marco Ocram, Jul 16 '21 at 19:20
Why not hardcode that string in the question instead of having the code try to read a file at all, then? A good [mre] is the shortest possible thing that can be run _without changes_ to see the same problem; needing to create an input file is a change. — Charles Duffy, Jul 16 '21 at 19:24
Better, indeed. The only thing that's not great about this as-currently-written is that one has to inspect the content of the linked page before the question is understandable (and answers can be learned from)-- an ideal knowledge base entry will last indefinitely, even if off-site resources change / links break / etc. — Charles Duffy, Jul 16 '21 at 19:56
Anyhow, in terms of the problem you're having -- `//*` is different from `.//*`; if you want to search only from `l2`, you need to start with `.//` and not `//` (which explicitly tells the xpath interpreter to search the whole page, not just from the current element). — Charles Duffy, Jul 16 '21 at 19:56
...so even though you aren't using selenium, this is the same problem as the one encountered in https://stackoverflow.com/questions/14049983/selenium-webdriver-finding-an-element-in-a-sub-element — Charles Duffy, Jul 16 '21 at 19:58
@CharlesDuffy I didn't know that `.//*` behaves differently from `//*` when starting from a sub-element with lxml.etree. Now I know, and the problem is solved. I had understood that Selenium behaves differently; I have actually used it more. Thanks. As for the code, this is the minimal example; the full code for building the database is different. — Marco Ocram, Jul 16 '21 at 20:09
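
For later readers, the difference between `//` and `.//` described in the comments can be reproduced without fetching the live page. The following is only a minimal sketch: the inline markup is a hypothetical stand-in for the real page (only the two class names come from the question), and the variable names are illustrative.

```python
from lxml.etree import HTML

# Hypothetical markup standing in for the real page; only the two class
# names are taken from the question.
page = HTML("""
<html><body>
  <div class="js-entry-card-container">
    <h2 class="lui-margin-vertical-zero">First</h2>
  </div>
  <div class="js-entry-card-container">
    <h2 class="lui-margin-vertical-zero">Second</h2>
  </div>
</body></html>
""")

cards = page.xpath("//*[starts-with(@class, 'js-entry-card-container')]")

# "//" starts at the document root, even when called on a sub-element,
# so every card reports all matching elements on the page.
absolute_counts = [
    len(card.xpath("//*[starts-with(@class, 'lui-margin-vertical-zero')]"))
    for card in cards
]

# ".//" starts at the context node, so each card reports only its own
# descendants.
relative_texts = [
    [el.text for el in card.xpath(".//*[starts-with(@class, 'lui-margin-vertical-zero')]")]
    for card in cards
]

print(absolute_counts)  # [2, 2]
print(relative_texts)   # [['First'], ['Second']]
```

In the question's loop, this corresponds to changing the expression passed to l2.xpath so that it begins with `.//` instead of `//`.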
