Python xpath query not returning text value

Question

I am trying to scrape data from this page using the lxml module in Python. I want to get the text of the first paragraph, but the following code returns an empty result:

from lxml import html
import requests

# Fetch the raw HTML as served (no JavaScript is executed)
page = requests.get('http://www.thehindu.com/todays-paper/with-afspa-india-has-failed-statute-amnesty/article7376286.ece')
tree = html.fromstring(page.text)
# XPath copied from the browser's element inspector
data = tree.xpath('//*[@id="left-column"]/div[6]/p[1]/text()')
print(data)

Well, at least when I fetch the page, `[@id="left-column"]` is empty. — dhke, Jul 09 '15 at 15:34
@dhke - when I inspect the element on the page and copy the XPath corresponding to that paragraph, this is the path that I get. Am I doing something wrong here? — Saharsh Agarwal, Jul 09 '15 at 15:43
Actually, when I try with `//div[class='articleLead']` or `//xh:div[class='articleLead']` (with `namespaces={'xh': 'http://www.w3.org/1999/xhtml'}`), the result is still empty even though I can clearly see that element ... — dhke, Jul 09 '15 at 15:55
Even if I replace that line with `data = tree.xpath('//*[@class="body"]/text()')`, I'm not getting any value. — Saharsh Agarwal, Jul 09 '15 at 15:57
Try downloading the file separately (outside of the browser) and check its contents, because the raw data does not match what you see in the browser. This seems to be either a (nasty) bug or some kind of scraping protection that edits the DOM after page load. — dhke, Jul 09 '15 at 16:03
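
The check suggested in the last comment can be sketched roughly as follows. This is only an illustration: the two HTML strings are hypothetical stand-ins (not the real page), and `count_matches` is a made-up helper name. In practice `page.text` from `requests` would be passed as the raw HTML.

```python
from lxml import html


def count_matches(html_text, expr):
    """Return how many nodes the XPath expression selects in the given HTML text."""
    tree = html.fromstring(html_text)
    return len(tree.xpath(expr))


# Hypothetical stand-in for what the server sends: the container is empty.
raw_from_server = '<html><body><div id="left-column"></div></body></html>'

# Hypothetical stand-in for the DOM the browser shows after scripts have run.
dom_in_browser = (
    '<html><body><div id="left-column">'
    '<div class="article-text"><p>Text</p></div>'
    '</div></body></html>'
)

expr = '//*[@id="left-column"]//p'
raw_count = count_matches(raw_from_server, expr)
browser_count = count_matches(dom_in_browser, expr)
print(raw_count, browser_count)
```

If the count for the raw response is zero while the browser shows the element, an XPath copied from the element inspector cannot match, no matter how it is written.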

score 0 · Answer 1 · answered Jul 09 '15 at 16:26

Try //div[@class='article-text']/p/text()

answered Jul 09 '15 at 16:26

Brent D


score 0 · Answer 2 · edited Nov 03 '15 at 11:40

You can use the following XPath:

//div[@class='article-text']/p[1]/text()
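
A minimal sketch of how this kind of expression behaves, run against a hypothetical snippet rather than the real page (the markup in `SAMPLE` is an assumption made for illustration). It shows two details that are easy to miss: attributes need the `@` prefix, and an expression evaluated from the document root needs a leading `//` to reach nested elements.

```python
from lxml import html

# Hypothetical markup, assumed only for this illustration.
SAMPLE = """
<html><body>
  <div id="left-column">
    <div class="article-text">
      <p>First paragraph.</p>
      <p>Second paragraph.</p>
    </div>
  </div>
</body></html>
"""

tree = html.fromstring(SAMPLE)  # root element is <html>

# Without '@', 'class' is read as a child element name, so nothing matches.
no_at_sign = tree.xpath("//div[class='article-text']/p/text()")

# Without the leading '//', only direct children of <html> are considered.
relative = tree.xpath("div[@class='article-text']/p[1]/text()")

# With both '@' and '//', the first paragraph's text is selected.
first_paragraph = tree.xpath("//div[@class='article-text']/p[1]/text()")

print(no_at_sign, relative, first_paragraph)
```

None of this helps if the element is missing from the downloaded HTML, so it is worth confirming that first.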

edited Nov 03 '15 at 11:40

Soner Gönül


answered Nov 03 '15 at 11:15

Piyush

