Questions tagged [python-newspaper]

Newspaper is a Python library that delivers Instapaper-style article extraction. It is inspired by requests and powered by lxml.

111 questions
10
votes
3 answers

How to use Newspaper3k library without downloading articles?

Suppose I have local copies of news articles. How can I run newspaper on those articles? According to the documentation, the normal use of the newspaper library looks something like this: from newspaper import Article url =…
Flux
  • 9,805
  • 5
  • 46
  • 92
8
votes
1 answer

How to fix Newspaper3k 403 Client Error for certain URLs?

I am trying to get a list of articles using a combo of the googlesearch and newspaper3k python packages. When using article.parse, I end up getting an error: newspaper.article.ArticleException: Article download() failed with 403 Client Error:…
totalmayhem
  • 83
  • 1
  • 5
8
votes
2 answers

How to parse only a specific category of a website using the newspaper library?

I use Python 3 and the newspaper library. It is said that this library can create a Source object that is an abstraction of a news website. But what if I need only the abstraction of a certain category? For example, when I use this url I want to get…
6
votes
1 answer

Publishing date in newspaper library always returning None

I've been using the newspaper library lately. The only issue I've found is that article.publish_date always returns None. class NewsArticle: def __init__(self,url): self.article = Article(url) self.article.download() …
Eigenvalue
  • 1,093
  • 1
  • 14
  • 35
5
votes
2 answers

"No module named tldextract"

I tried the following code in python: from newspaper import Article #A new article from BBC url = "http://www.bbc.com/news/magazine-26935867" #For different language newspaper refer above table BBC_article = Article(url, language="en") # en for…
learingstuff
  • 53
  • 1
  • 1
  • 5
4
votes
3 answers

Extract image using Newspaper from HTML

I can't download articles like one usually does to instantiate the Article object, like below: from newspaper import Article url = 'http://fox13now.com/2013/12/30/new-year-new-laws-obamacare-pot-guns-and-drones/' article =…
notverygood
  • 297
  • 2
  • 13
4
votes
2 answers

How to access cached articles in newspaper3k

Newspaper is a fantastic library that allows scraping web data; however, I am a little confused about article caching. It caches articles to speed up operations, but how do I access them? I have something like this. Now when I run this…
Naman
  • 179
  • 2
  • 13
4
votes
2 answers

ImportError: No module named newspaper

I am trying to build a Python program that will display various headlines from certain news sites. I used pip to install the newspaper module, but when I run the program, I get the error: ImportError: No module named newspaper. Any ideas on how to…
Ajax1234
  • 69,937
  • 8
  • 61
  • 102
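
A common cause of this kind of error is that pip installed the package for a different interpreter than the one running the script (for Python 3, the package is published on PyPI as newspaper3k). The following is a minimal diagnostic sketch using only the standard library; the helper name `describe_module` is hypothetical and not part of newspaper.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether a top-level module is importable by this interpreter."""
    # find_spec returns None when a top-level module cannot be located.
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,  # the Python binary actually running
        "module": name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    # Compare the reported interpreter with the one pip installed into.
    print(describe_module("newspaper"))
```

If `found` is false, running `python -m pip install newspaper3k` with the interpreter path reported above is one way to make sure the install targets the same environment.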
4
votes
1 answer

Python Newspaper with web archive (wayback machine)

I'm trying to use the Python library newspaper with the archives from the Wayback Machine, which stores archived versions of websites. Theoretically, old news articles could be queried and downloaded from these archives. For instance,…
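
One possible starting point, sketched under assumptions: Wayback Machine snapshots are addressed as `https://web.archive.org/web/<timestamp>/<original URL>`, so an archived address can be built first and then handed to whatever downloads the page. The helper name `wayback_url` is hypothetical, and whether newspaper parses the archived page cleanly is not something this sketch checks.

```python
def wayback_url(original_url, timestamp):
    """Build a Wayback Machine snapshot URL for original_url.

    timestamp is a string of digits in YYYYMMDDhhmmss order; a shorter
    prefix such as YYYYMMDD is accepted by this helper as well.
    """
    if not timestamp.isdigit() or len(timestamp) > 14:
        raise ValueError("timestamp must be 1 to 14 digits, e.g. 20140401000000")
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


if __name__ == "__main__":
    print(wayback_url("http://cnn.com", "20140401"))
```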
4
votes
4 answers

Python: Newspaper Module - Any way to pool getting articles straight from URLs?

I'm using the Newspaper module for Python found here. In the tutorials, it describes how you can pool the building of different newspapers so that it generates them at the same time. (See the "Multi-threading article downloads" section in the link above.) Is…
Afflatus
  • 2,302
  • 5
  • 25
  • 40
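
One generic way to pool work over a plain list of URLs is a thread pool from the standard library. This is a sketch, not newspaper's own pooling API: `fetch_all` and its `fetch` argument are hypothetical names, and `fetch` is assumed to be any callable that takes a URL and returns a result (for example, one that builds an Article, then calls download() and parse()).

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=8):
    """Call fetch(url) for every URL on a thread pool; return {url: result}."""
    urls = list(urls)  # materialise once so the URLs can be paired with results
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order; an exception raised inside
        # fetch is re-raised here when its result is consumed.
        return dict(zip(urls, pool.map(fetch, urls)))


if __name__ == "__main__":
    # Stand-in for a real download function, so the sketch runs offline.
    print(fetch_all(["http://a.example", "http://b.example"], len))
```

Threads suit this task because downloading is mostly waiting on the network; `max_workers` is worth keeping small to avoid hammering a single site.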
4
votes
2 answers

How to use Python newspaper library?

I'm trying to write a web parser and save the results. I found the newspaper library and am using Eclipse, but I couldn't get good results. Please help me. import newspaper cnn_paper = newspaper.build('http://cnn.com') for article in cnn_paper.articles: …
Steve
  • 43
  • 1
  • 4
3
votes
1 answer

Scraping news articles into one single list with NewsPaper library in Python?

Dear Stack Overflow community! I would like to scrape news articles from the CNN RSS feed and get the link for each scraped article. This works very well with the Python NewsPaper library, but unfortunately I am unable to get the output in a usable…
Mercury
  • 37
  • 1
  • 5
3
votes
2 answers

Newspaper library

As an absolute newcomer to Python, I ran into a few difficulties using the newspaper library. My goal is to use it on a regular basis to download all new articles from a German news website called…
3
votes
1 answer

python newspaper module - get all the images from an article

Using Python's newspaper module, I can get the top image from an article in the following way: from newspaper import Article first_article = Article(url="http://www.lemonde.fr/...",…
Istiaque Ahmed
  • 6,072
  • 24
  • 75
  • 141
3
votes
3 answers

Handling Article Exceptions in Newspaper

I have a bit of code that uses newspaper to go take a look at various media outlets and download articles from them. This has been working fine for a long time but has recently started acting up. I can see what the problem is but as I'm new to…
bengen343
  • 139
  • 1
  • 2
  • 8
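
A common pattern for this situation is to wrap the per-article work in try/except so that one failing article is logged and skipped instead of stopping the whole run. The sketch below is generic: `process_articles` and `process` are hypothetical names, and `process` is assumed to be a callable that downloads and parses one URL. Passing the library's ArticleException as `errors` would narrow the catch to newspaper's own failures.

```python
import logging

logger = logging.getLogger(__name__)


def process_articles(urls, process, errors=(Exception,)):
    """Run process(url) for each URL, collecting successes and failures."""
    succeeded, failed = [], []
    for url in urls:
        try:
            succeeded.append((url, process(url)))
        except errors as exc:
            # Record the failure and move on to the next article.
            logger.warning("Skipping %s: %s", url, exc)
            failed.append((url, exc))
    return succeeded, failed


if __name__ == "__main__":
    # int() stands in for a real download-and-parse step; "x" makes it fail.
    ok, bad = process_articles(["1", "x", "3"], int)
    print(ok, [url for url, _ in bad])
```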