There is a Python library, Newspaper3k [newspaper][1], that makes it easy to extract the content of web pages.
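For title retrieval:

```python
from newspaper import Article
a = Article(url)
a.download()  # fetch the page HTML
a.parse()     # extract title, text, etc. from the HTML
print(a.title)
```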
For content retrieval:

```python
from newspaper import Article
url = 'http://fox13now.com/2013/12/30/new-year-new-laws-obamacare-pot-guns-and-drones/'
article = Article(url)
article.download()
article.parse()
print(article.text)
```
I want to get information about web pages (sometimes the title, sometimes the actual content). Here is my code to fetch the content/text of each web page listed in a CSV file:
```python
from newspaper import Article
import nltk

nltk.download('punkt')

fil = open("laborURLsml2.csv", "r")
# read every line of the file (one URL per line)
Lines = fil.readlines()
for line in Lines:
    print(line)
    article = Article(line)
    article.download()
    article.html  # no effect: the value is not used
    article.parse()
    print("[[[[[")
    print(article.text)
    print("]]]]]")
```
The contents of the `laborURLsml2.csv` file: [laborURLsml2.csv][2]
My issue: the code reads the first URL and prints its content, but fails from the second URL onward.
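Since the CSV itself isn't reproduced here, the exact cause can't be confirmed, but two things are worth ruling out: stray whitespace (`readlines()` keeps the trailing newline, so `Article(line)` receives `"http://...\n"`), and an unhandled exception (in newspaper3k, `parse()` raises when the preceding `download()` failed, which ends the whole `for` loop). The sketch below addresses both under stated assumptions: the URL is in the first CSV column, there is no header row, and the helper names `read_urls`, `fetch_text`, `fetch_all` and `main` are hypothetical, not part of newspaper3k.

```python
import csv
from typing import Callable, Dict, Iterable, List, Tuple


def read_urls(path: str) -> List[str]:
    """Return the URLs in `path`, with surrounding whitespace removed.

    Assumption: one URL per row, in the first column; blank rows are skipped.
    """
    urls = []
    # utf-8-sig tolerates a byte-order mark at the start of the file
    with open(path, newline="", encoding="utf-8-sig") as f:
        for row in csv.reader(f):
            if row and row[0].strip():
                urls.append(row[0].strip())
    return urls


def fetch_text(url: str) -> str:
    """Download and parse one page with newspaper3k and return its text."""
    # Imported here so the other helpers work without newspaper3k installed.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()  # raises newspaper.article.ArticleException if download() failed
    return article.text


def fetch_all(
    urls: Iterable[str], fetch: Callable[[str], str]
) -> Tuple[Dict[str, str], Dict[str, Exception]]:
    """Fetch every URL, recording failures instead of stopping at the first one."""
    texts: Dict[str, str] = {}
    errors: Dict[str, Exception] = {}
    for url in urls:
        try:
            texts[url] = fetch(url)
        except Exception as exc:  # one bad URL should not abort the whole batch
            errors[url] = exc
    return texts, errors


def main(path: str = "laborURLsml2.csv") -> None:
    """Print the text of every URL in `path`, then list the URLs that failed."""
    texts, errors = fetch_all(read_urls(path), fetch_text)
    for url, text in texts.items():
        print(url)
        print("[[[[[")
        print(text)
        print("]]]]]")
    for url, exc in errors.items():
        print(f"FAILED {url}: {exc}")
```

Calling `main()` processes `laborURLsml2.csv` from the current directory and reports any failing URLs with their error messages after the successful ones, rather than stopping at the first failure. Catching the broad `Exception` is a deliberate choice for a batch script; narrow it to `newspaper.article.ArticleException` if other errors should surface immediately.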