I am trying to download all the images from a website but have been unable to do so. How can I download all the images from a specific section of a website and save them to a directory?

The code below collects all the image links and saves them to a CSV file, but I also want to save the images themselves to my directory.

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as soup

my_url = 'https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20card'

# Send a browser-like User-Agent header with the request
req = Request(my_url, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
page_soup = soup(webpage, "html.parser")

# Locate the category container, then the product image divs inside it
snackcrisps = page_soup.findAll("div", {"class": "divCategories divShops-newegg"})
crispitem = snackcrisps[0]
img = crispitem.findAll("div", {"class": "product_image_div productSmall_image_div_lit"})

filename = "abc.csv"
with open(filename, "w") as f:
    f.write("imagelink\n")  # CSV header row

    for img1 in img:
        img2 = img1.findAll('img')
        imageLink = img2[0].get('src')

        print("imageLink: " + imageLink)
        f.write(imageLink + "\n")

How can I save the images in my local directory? Help needed!!

Many Thanks

m13op22
Sushil S
  • Possible duplicate of [How to download images from BeautifulSoup?](https://stackoverflow.com/questions/37158246/how-to-download-images-from-beautifulsoup) – m13op22 Sep 03 '19 at 21:20
  • I am new to this and don't know how to fix it. Can someone fix this for me? Thanks – Sushil S Sep 03 '19 at 21:27

1 Answer

I used the response to this post to formulate my answer.

First you need to build the full URL for the image you want. This could be as simple as prepending "https:" to the image link, or it may require no change at all. You'll have to investigate (review this post) how to adjust the URLs you find, depending on whether they are relative or absolute.
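One way to handle relative, protocol-relative, and absolute links uniformly is `urllib.parse.urljoin` from the standard library. The sketch below is only illustrative: the helper name `absolute_url` and the sample `src` values are made up for the example, and the real links on the page may look different.

```python
from urllib.parse import urljoin

# Page the links were scraped from (the URL from the question, without the query string)
PAGE_URL = "https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38"


def absolute_url(src, base=PAGE_URL):
    """Resolve an img `src` value against the page URL (hypothetical helper)."""
    return urljoin(base, src)


# Hypothetical src values covering the common cases
samples = [
    "//images.example.com/card.jpg",       # protocol-relative: inherits the page's scheme
    "/static/card.jpg",                    # root-relative: inherits scheme and host
    "https://images.example.com/card.jpg", # already absolute: returned unchanged
]

for src in samples:
    print(src, "->", absolute_url(src))
```

With this approach there is no need to check for "https:" by hand; `urljoin` leaves absolute URLs alone and fills in whatever parts are missing from the others.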

You'll want to use the requests module to make the request for the image.

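import os
import shutil
from urllib.parse import urlparse

import requests

for img1 in img:  # `img` is the list of product image divs from the question

    img2 = img1.findAll('img')
    imageLink = img2[0].get('src')
    # Protocol-relative links ("//host/path") need a scheme prepended
    if imageLink.startswith("//"):
        imageLink = "https:" + imageLink

    r = requests.get(imageLink, stream=True)
    if r.status_code == 200:
        # Name each file after the last path segment of its URL,
        # so that every image gets its own file
        with open(os.path.basename(urlparse(imageLink).path), 'wb') as f:
            r.raw.decode_content = True
            shutil.copyfileobj(r.raw, f)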
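If you want the images collected in a specific directory, the same idea can be wrapped in a small function. This is a minimal sketch, not a definitive implementation: it uses only the standard library's `urllib` (as in the question) rather than `requests`, and the names `filename_from_url`, `fetch_bytes`, and `download_images` are hypothetical helpers introduced for illustration. It assumes the URLs passed in are already absolute.

```python
import os
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def filename_from_url(url, index):
    """Use the last path segment as the file name; fall back to a numbered name."""
    name = os.path.basename(urlparse(url).path)
    return name if name else "image_{}.jpg".format(index)


def fetch_bytes(url):
    """Download a URL and return its body as bytes, sending a browser-like User-Agent."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req) as response:
        return response.read()


def download_images(urls, directory, fetch=fetch_bytes):
    """Save every URL in `urls` into `directory` and return the written paths."""
    os.makedirs(directory, exist_ok=True)  # create the target directory if needed
    saved = []
    for index, url in enumerate(urls):
        path = os.path.join(directory, filename_from_url(url, index))
        with open(path, "wb") as out:
            out.write(fetch(url))
        saved.append(path)
    return saved
```

To use it, collect the full image URLs in a list inside the loop and then call something like `download_images(links, "images")`. The `fetch` parameter is there only so the download step can be swapped out, for example when testing without network access. Note that two images sharing the same file name would overwrite each other.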
fendall