
Using Python's newspaper module, I can get the top image from an article like this:

```python
from newspaper import Article

first_article = Article(url="http://www.lemonde.fr/...", language='fr')
first_article.download()
first_article.parse()
print(first_article.top_image)
```

But I need to get all the images in the article. Their GitHub documentation says 'All image extraction from html' is possible, but I can't figure out how. I also don't want to manually download the HTML files, save them to my hard drive, and then feed those files to the module to get the images.

How can I achieve that?

Istiaque Ahmed
  • http://newspaper.readthedocs.io/en/latest/#features: the `all image extraction from html` you see is listed under `features`, but they don't have it now – Druta Ruslan Jun 05 '18 at 19:31
  • @zimdero, what do you mean? A feature is something that exists. Top image extraction is also a feature, and it is described in the docs – Istiaque Ahmed Jun 05 '18 at 19:32
  • I mean that it may come in the future, but right now they don't have a function to get all images – Druta Ruslan Jun 05 '18 at 19:34
  • @zimdero, edited my comment – Istiaque Ahmed Jun 05 '18 at 19:34
  • Maybe they implemented the `top_image` functionality but `all_image` is not complete; I don't know. I also searched for an answer to this problem and didn't find anything. You can try @Bear Brown's code example; maybe it will help you – Druta Ruslan Jun 05 '18 at 19:38

1 Answer


You likely solved this already, but you can get the image URLs with Newspaper through `article.images`, which is populated once you call `parse()`.

```python
from newspaper import Article

article = Article(url="http://www.lemonde.fr/", language='fr')
article.download()             # fetch the page HTML
article.parse()                # extract text, top image and image URLs
top_image = article.top_image  # the single "main" image
all_images = article.images    # every image URL found on the page
for image in all_images:
    print(image)
```

Sample output:

```
https://img.lemde.fr/2020/09/22/0/3/4485/2990/220/146/30/0/a79897c_115736902-000-8pt8nc.jpg
https://img.lemde.fr/2020/09/22/0/0/5315/3543/192/0/75/0/7b90c88_645792534-pns-3418491.jpg
https://img.lemde.fr/2020/09/09/200/0/1500/999/180/0/95/0/d8099d2_51464-3185927.jpg
https://img.lemde.fr/2020/09/22/0/4/4248/2832/664/442/60/0/557e6ee_5375150-01-06.jpg
```
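If you already have a page's HTML in memory (so nothing needs to be saved to disk), the core idea behind "all image extraction from html" can be sketched with the standard library alone: collect the `src` of every `<img>` tag and resolve relative paths against the page URL. This is a minimal sketch under that assumption, not newspaper's implementation; `ImgSrcCollector` and `extract_image_urls` are hypothetical names, and unlike newspaper it works on the raw HTML rather than on a cleaned-up article body.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Hypothetical helper: records the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased; attrs is a list of (name, value) pairs.
        # Self-closing <img ... /> tags also reach this method by default.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags with a missing or empty src
                self.srcs.append(src)


def extract_image_urls(html, base_url):
    """Return the set of absolute image URLs found in an HTML string."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    # Resolve relative paths against the page URL; the set drops duplicates.
    return {urljoin(base_url, src) for src in parser.srcs}


if __name__ == "__main__":
    page = """
    <html><body>
      <img src="/2020/a.jpg">
      <img src="https://img.example.org/b.jpg">
      <img alt="no source">
      <img src="/2020/a.jpg">
    </body></html>
    """
    # Sets are unordered, so sort for stable output.
    for url in sorted(extract_image_urls(page, "https://www.example.org/article.html")):
        print(url)
```

Returning a set removes repeated images (the same logo often appears several times on a page); sort it, as the demo does, whenever you need a deterministic order.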
Life is complex