import requests
from lxml import html
page = requests.get('http://www.cnn.com')
html_content = html.fromstring(page.content)
for i in html_content.iterchildren():
    print(i)
news_stories = html_content.xpath('//h2[@data-analytics]/a/span/text()')
news_links = html_content.xpath('//h2[@data-analytics]/a/@href')
I am trying to run this code to understand how web scraping in Python works.
I want to scrape the top news stories and their links from CNN.
When I run this in the Python shell, the output I get for both news_stories and news_links is:
[]
My question is: where am I going wrong, and is there a better way to achieve what I am trying to do?
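
To narrow the problem down, here is a minimal sketch that runs the same two XPath expressions against a small hand-written HTML fragment instead of the live page. The fragment, the name SAMPLE, and the helper extract_stories are made up for illustration and are not CNN's real markup. If the expressions work here but return [] on the downloaded page, that would suggest the served HTML simply does not contain matching h2 elements (for example because the headlines are filled in by JavaScript after the page loads), rather than a mistake in the XPath syntax.

```python
from lxml import html

# Hypothetical fragment shaped the way the XPath expects; NOT CNN's real markup.
SAMPLE = """
<html><body>
  <h2 data-analytics="top-stories">
    <a href="/2016/01/01/example-story"><span>Example headline</span></a>
  </h2>
  <h2><a href="/ignored"><span>No data-analytics attribute</span></a></h2>
</body></html>
"""


def extract_stories(markup):
    """Return (headlines, links) using the same XPath expressions as above."""
    tree = html.fromstring(markup)
    headlines = tree.xpath('//h2[@data-analytics]/a/span/text()')
    links = tree.xpath('//h2[@data-analytics]/a/@href')
    return headlines, links


# Markup that contains the expected structure yields one headline and one link.
headlines, links = extract_stories(SAMPLE)
print(headlines)
print(links)

# Markup without any matching <h2> yields two empty lists, like the live page did.
empty_headlines, empty_links = extract_stories(
    "<html><body><div id='app'></div></body></html>"
)
print(empty_headlines, empty_links)
```

Printing page.content (or saving it to a file) and searching it for data-analytics would show whether the downloaded HTML has the elements the XPath is looking for.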