Python 3.4.0 -- xpath -- gets me empty list

Question

I'm trying to retrieve data from an online Ukrainian dictionary. Everything works fine with:

import urllib.request
import lxml.html

url = "http://www.toponymic-dictionary.in.ua/index.php?option=com_content&view=section&layout=blog&id=8&Itemid=9"
page = urllib.request.urlopen(url)
pageWritten = page.read()
pageReady = pageWritten.decode('utf-8')
xmldata = lxml.html.document_fromstring(pageReady)
text1 = xmldata.xpath('//p[@class="MsoNormal"]//text()')

But nothing works with another link:

from urllib.parse import urlparse, parse_qs, urlencode

url = 'http://sum.in.ua/?swrd=автор'
parsed_url = urlparse(url)
parameters = parse_qs(parsed_url.query)
url = parsed_url._replace(query=urlencode(parameters)).geturl()
page = urllib.request.urlopen(url)

pageWritten = page.read()
pageReady = pageWritten.decode('utf-8')
xmldata = lxml.html.document_fromstring(pageReady)
text1 = xmldata.xpath('//div[@itemprop="articleBody"]')

It gets me an empty list. The XPath itself seems fine: I double-checked it with XPath Helper in Chrome.

Any ideas?

There is no such `<div itemprop="articleBody">` in the HTML loaded from the URL. The *browser* can be served different HTML based on headers, or JavaScript could have altered the object tree. — Martijn Pieters, Apr 04 '15 at 13:30
The URL redirects to http://sum.in.ua/s/[.avtor.] which contains no results. — Martijn Pieters, Apr 04 '15 at 13:31
The XPath as seen from the browser can differ from what other XML and HTML parsers see; for more detail, read this question: http://stackoverflow.com/questions/29220031/convert-the-xpath-gotten-from-browser-to-usable-xpath-for-scrapy — Mazdak, Apr 04 '15 at 13:33
@MartijnPieters Thanks for the second time, but is there another way to deal with either the Ukrainian characters in the URL or the XPath itself? — Khrystyna, Apr 04 '15 at 13:52
@KhrystynaSkopyk: you misunderstand me; this is nothing to do with encodings. The returned page simply doesn't contain the element you are looking for; the XPath expression is otherwise fine. — Martijn Pieters, Apr 04 '15 at 13:54
@MartijnPieters Compare http://sum.in.ua/s/avtor and sum.in.ua/s/[.avtor.]: the former leads to what is needed, the latter to nothing. I believe the second link was created after you helped me with encoding/decoding it. — Khrystyna, Apr 04 '15 at 13:58
@KhrystynaSkopyk: ah! there is actually a small bug in my previous answer. — Martijn Pieters, Apr 04 '15 at 14:01
@KhrystynaSkopyk: corrected my [other answer](https://stackoverflow.com/a/29436298) to add `doseq=True`; the `parse_qs()` result always uses lists for the values (sequences) and you need to explicitly tell `urlencode()` to support that style. Mea Culpa! — Martijn Pieters, Apr 04 '15 at 14:03
@KhrystynaSkopyk: That bug was the underlying issue for your question here. I'll dupe this to your previous question to indicate this; feel free to delete it if you feel it is not otherwise useful. — Martijn Pieters, Apr 04 '15 at 14:04
@MartijnPieters Saw your correction) Thank you so much, but I feel bad to say that it still gives an empty list( — Khrystyna, Apr 04 '15 at 14:15
@KhrystynaSkopyk: https://gist.github.com/mjpieters/0a72eaf8332da9c4166f — Martijn Pieters, Apr 04 '15 at 14:20
@MartijnPieters Unbelievable) IT WORKS:) There are not enough thanks I want to give You) — Khrystyna, Apr 04 '15 at 14:27
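
The `doseq=True` fix described in the comments can be illustrated with a minimal sketch. It is not the content of the linked gist. It only shows, under stated assumptions, why `urlencode()` without `doseq=True` produced the `[.avtor.]` style of URL: `parse_qs()` returns lists, and a list is otherwise converted to its string form before encoding. The helper names `build_search_url` and `fetch`, and the `opener` parameter, are hypothetical and introduced here only for illustration.

```python
import urllib.request
from urllib.parse import urlparse, parse_qs, urlencode


def build_search_url(url):
    """Percent-encode the query string of url, keeping each value intact."""
    parsed = urlparse(url)
    # parse_qs() maps every key to a *list* of values: {'swrd': ['автор']}
    parameters = parse_qs(parsed.query)
    # doseq=True encodes each list item on its own instead of str(list)
    return parsed._replace(query=urlencode(parameters, doseq=True)).geturl()


def fetch(url, opener=urllib.request.urlopen):
    """Return (final_url, html); final_url differs from url after a redirect.

    The opener parameter is a hypothetical hook so the helper can be
    exercised without network access.
    """
    with opener(url) as response:
        return response.geturl(), response.read().decode('utf-8')


raw_url = 'http://sum.in.ua/?swrd=автор'
parameters = parse_qs(urlparse(raw_url).query)

broken_query = urlencode(parameters)             # the list is stringified first
fixed_query = urlencode(parameters, doseq=True)  # each value encoded separately
fixed_url = build_search_url(raw_url)

print(broken_query)  # swrd=%5B%27%D0%B0%D0%B2%D1%82%D0%BE%D1%80%27%5D
print(fixed_query)   # swrd=%D0%B0%D0%B2%D1%82%D0%BE%D1%80
```

To check a live request, something like `final_url, html = fetch(fixed_url)` could be used, then `final_url` compared with the requested URL before running the XPath. If the two differ, the site redirected the request, and the empty result may come from the redirected page rather than from the XPath expression.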
