I have a Python script that scrapes images from a website every morning for a daily project I'm responsible for. JPGs and PNGs download with no issues. The problem is that animated GIFs usually get saved as static GIFs; only occasionally does one come through animated.

I'm not really familiar with BeautifulSoup, so I'm not sure whether I'm doing something wrong or whether there is a limitation in the way BeautifulSoup handles animated GIFs.

I'm using the Kickstarter URL just for testing purposes:

import os
import sys
import requests
import urllib
import urllib.request
from bs4 import BeautifulSoup
from csv import writer

baseUrl = requests.get('https://www.kickstarter.com/projects/peak-design/travel-tripod-by-peak-design')
soup = BeautifulSoup(baseUrl.text, 'html.parser')

allImgs = soup.findAll('img')

imgCounter = 1

for img in allImgs:
    newImg = img.get('src')

    # CHECK EXTENSION
    if '.jpg' in newImg:
        extension = '.jpg'
    elif '.png' in newImg:
        extension = '.png'
    elif '.gif' in newImg:
        extension = '.gif'

    imgFile = open(str(imgCounter) + extension, 'wb')
    imgFile.write(urllib.request.urlopen(newImg).read())
    imgCounter = imgCounter + 1
    imgFile.close()

Any help or insight on this issue would be most appreciated!

-S

Sergio
  • Possible duplicate of [How to download this GIF(dynamic) by Python?](https://stackoverflow.com/questions/39534830/how-to-download-this-gifdynamic-by-python) – David Zemens Jun 18 '19 at 13:31
  • @DavidZemens Yes, I read that thread, but I'm confused about how to combine what they are doing with what I'm doing. The only difference I see is this line: `imgFile.write(urllib.request.urlopen(newImg).read())` vs `f.write(requests.get(uri).content)`. Any suggestions? – Sergio Jun 18 '19 at 15:47
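
Both calls quoted in the comment above return the raw response bytes, so the download call itself is unlikely to be what turns a GIF static. One way to check what was actually downloaded is to count the frames in the bytes. The following is a rough sketch using only the standard library; the function names `count_gif_frames` and `_skip_sub_blocks` are made up for illustration, and it assumes a well-formed, non-truncated GIF.

```python
def _skip_sub_blocks(data, pos):
    """Skip a chain of GIF data sub-blocks and return the position after it."""
    while True:
        size = data[pos]
        pos += 1
        if size == 0:  # a zero-length sub-block terminates the chain
            return pos
        pos += size


def count_gif_frames(data):
    """Count image descriptors (frames) in GIF bytes.

    Hypothetical helper: assumes well-formed input; truncated data may
    raise IndexError.
    """
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")

    # Logical screen descriptor: the packed byte is at offset 10,
    # and the descriptor ends at offset 13.
    packed = data[10]
    pos = 13
    if packed & 0x80:  # global color table present
        pos += 3 * (2 ** ((packed & 0x07) + 1))

    frames = 0
    while pos < len(data):
        block = data[pos]
        pos += 1
        if block == 0x3B:  # trailer: end of the GIF stream
            break
        if block == 0x21:  # extension block: label byte, then sub-blocks
            pos += 1
            pos = _skip_sub_blocks(data, pos)
        elif block == 0x2C:  # image descriptor: one frame
            frames += 1
            packed = data[pos + 8]
            pos += 9
            if packed & 0x80:  # local color table present
                pos += 3 * (2 ** ((packed & 0x07) + 1))
            pos += 1  # LZW minimum code size byte
            pos = _skip_sub_blocks(data, pos)
        else:
            raise ValueError("unexpected block type: %#x" % block)
    return frames
```

Reading a saved file in binary mode and passing its bytes to `count_gif_frames` gives the frame count; a result of 1 means the saved file really is a single-frame image rather than an animation that a viewer fails to play.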

1 Answer

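Here's what works for me. For GIFs, I need to read the URL from the img tag's data-src attribute rather than from src, which is what I was using for all images. The revised code tries data-src first and falls back to src when data-src is missing.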

Here's the revised code:

import urllib.request

import requests
from bs4 import BeautifulSoup

baseUrl = requests.get('https://www.kickstarter.com/projects/peak-design/travel-tripod-by-peak-design')
soup = BeautifulSoup(baseUrl.text, 'html.parser')

allImgs = soup.find_all('img')

imgCounter = 1

for img in allImgs:
    # Prefer data-src (where this page keeps the GIF URLs); fall back to src
    newImg = img.get('data-src')
    if newImg is None:
        newImg = img.get('src')
    if newImg is None:
        continue  # <img> tag with neither attribute

    # CHECK EXTENSION
    if '.jpg' in newImg:
        extension = '.jpg'
    elif '.png' in newImg:
        extension = '.png'
    elif '.gif' in newImg:
        extension = '.gif'
    else:
        continue  # skip anything that is not a JPG, PNG, or GIF

    # The with block closes the file even if the download fails
    with open(str(imgCounter) + extension, 'wb') as imgFile:
        imgFile.write(urllib.request.urlopen(newImg).read())
    imgCounter = imgCounter + 1
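
The attribute fallback and the extension check can also be pulled into two small functions, which makes them easy to try out without hitting the site. This is only a sketch: `choose_image_url`, `extension_from_url`, and `ALLOWED_EXTENSIONS` are names I made up, and reading the extension from the URL path (instead of searching the whole URL string) is a swapped-in variation on the check above, not what the revised code does.

```python
import os
from urllib.parse import urlparse

# Hypothetical allow-list; adjust to whatever formats you want to keep.
ALLOWED_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.gif'}


def choose_image_url(attrs):
    """Return data-src if present and non-empty, otherwise src (may be None).

    attrs is anything with a dict-style .get(), such as a BeautifulSoup
    Tag or a plain dict.
    """
    return attrs.get('data-src') or attrs.get('src')


def extension_from_url(url):
    """Return the lowercased extension of the URL path, or None if not allowed.

    Only the path is inspected, so a query string such as ?sig=abc.jpg
    cannot affect the result.
    """
    path = urlparse(url).path
    ext = os.path.splitext(path)[1].lower()
    return ext if ext in ALLOWED_EXTENSIONS else None
```

Inside the loop this would look like `newImg = choose_image_url(img)` followed by `extension = extension_from_url(newImg)`, skipping the image when either one comes back as `None`.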
Sergio