
I want to download the images from a Wikipedia page, so I wrote this program. The txt file is saved with all of the links, but I don't know how to continue the program so it downloads the files. Can someone help me?

from urllib.request import urlopen
from bs4 import BeautifulSoup
import wikipedia
import re

title = input("Title: ")
link = wikipedia.page(title).url
html = urlopen(link)
bs = BeautifulSoup(html, 'html.parser')
# collect every <img> whose src contains ".jpg"
images = bs.find_all('img', {'src': re.compile(r'\.jpg')})
with open("cache.txt", "w") as f:
    for image in images:
        f.write('https:' + image['src'] + '\n')

3 Answers


You can use the wget module to download files.

pip install wget

To download a file using wget:

wget.download(url)

You then have to go through each line of your txt file and download the file with wget.

Python code:

import wget
import csv

with open("cache.txt", "r") as f:
    reader = csv.reader(f)        # csv.reader strips the trailing newline from each line
    for row in reader:
        wget.download(row[0])     # saves the image into the current directory

I found this, maybe it helps... It downloads one image, but for the rest I get urllib.error.HTTPError: HTTP Error 404: Not Found:

import wget
import csv

with open('cache.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        wget.download(', '.join(row))
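
The 404s are probably not wget's fault; common culprits are stray whitespace left in the URLs or the server rejecting requests that lack a User-Agent header. Here is a minimal sketch of a hardened download loop, assuming the links live in cache.txt as in the question; the User-Agent value and the skip-on-error handling are my own additions, not part of the original code:

import urllib.request
import urllib.error

# Send an explicit User-Agent, since some servers reject Python's default one.
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (image downloader script)')]
urllib.request.install_opener(opener)

with open('cache.txt') as f:
    for line in f:
        url = line.strip()           # a trailing '\n' in the URL breaks the request
        if not url:
            continue
        filename = url.rsplit('/', 1)[-1]
        try:
            urllib.request.urlretrieve(url, filename)
        except urllib.error.HTTPError as e:
            print(f'{url}: {e}')     # report and keep going instead of aborting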

I solved it. This is the code:

from urllib.request import urlopen, urlretrieve
from bs4 import BeautifulSoup
import wikipedia
import re

title = input("Title: ")
link = wikipedia.page(title).url
html = urlopen(link)
bs = BeautifulSoup(html, 'html.parser')
images = bs.find_all('img', {'src': re.compile(r'\.jpg')})
with open("cache.txt", "w") as f:
    for image in images:
        f.write('https:' + image['src'] + '\n')

with open('cache.txt') as f:
    for line in f:
        url = line.strip()                    # drop the trailing newline
        path = 'image' + url.split('/')[-1]   # e.g. imagePhoto.jpg in the current directory
        urlretrieve(url, path)
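
If you don't actually need the intermediate cache.txt, you can also download each image as soon as you find it. A minimal sketch using requests (which your code already imports but never uses); the filename scheme and the error reporting here are my own assumptions:

import re
import requests
import wikipedia
from bs4 import BeautifulSoup

title = input("Title: ")
html = requests.get(wikipedia.page(title).url).text
bs = BeautifulSoup(html, 'html.parser')

for image in bs.find_all('img', {'src': re.compile(r'\.jpg')}):
    url = 'https:' + image['src']
    filename = 'image' + url.rsplit('/', 1)[-1]
    response = requests.get(url)
    if response.ok:
        with open(filename, 'wb') as out:   # image data is binary, so write in 'wb' mode
            out.write(response.content)
    else:
        print(f'skipped {url}: HTTP {response.status_code}')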