
I set up this code to extract the image links from the following website. The problem is that it stops after item 19 and doesn't continue with the listing. Can you help me?

import urllib.request
import os
import codecs
from bs4 import BeautifulSoup

tematica = 'fun'

url = "https://www.shutterstock.com/es/search/" + tematica
request = urllib.request.Request(url)
response = urllib.request.urlopen(request)
data_content = response.read()

Html_file = open("html_file.html", "wb")
Html_file.write(data_content)
Html_file.close()

html=codecs.open("html_file.html", 'r', 'utf-8').read()

soup = BeautifulSoup(html, 'html.parser')

for i,img_element in enumerate(soup.findAll('img', None)):
    try:
        img_src = img_element['src']
        print(i,img_src)
    except:
        pass
Raymont
  • I examined the HTML page and found exactly 20 `img` tags having a `src` attribute, so the behaviour of your program should be correct. What else would you expect? – leqo Feb 25 '20 at 16:12
  • Now that you mention it, you're right, there are only 20. Looking at the page source, I need to extract all the links that follow "thumbnail"; that's where all the .jpg links are (a sketch of that idea follows after these comments). – Raymont Feb 25 '20 at 16:57
  • @Raymont Is the issue solved, then? Also, using a bare `except` like that is bad design, be careful! (see, for example: https://stackoverflow.com/questions/54948548/what-is-wrong-with-using-a-bare-except) – AMC Feb 25 '20 at 17:09
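
Building on the comments above, here is a minimal sketch of the thumbnail-extraction idea. It assumes the saved page embeds the remaining thumbnail URLs somewhere in the raw HTML (for example in script/JSON blobs) as plain `https://... .jpg` strings; the regex is an assumption about Shutterstock's markup, not its documented structure, so adjust it to the actual page source. It also catches `KeyError` explicitly instead of using a bare `except`:

```python
import re
import codecs
from bs4 import BeautifulSoup

# Read the page that was saved by the code in the question.
html = codecs.open("html_file.html", "r", "utf-8").read()
soup = BeautifulSoup(html, "html.parser")

# 1) Collect the <img src="..."> values, catching only the error we expect
#    (a missing 'src' attribute) rather than swallowing everything.
img_links = []
for img_element in soup.find_all("img"):
    try:
        img_links.append(img_element["src"])
    except KeyError:
        pass

# 2) Assumption: the other thumbnails appear elsewhere in the raw HTML as
#    plain .jpg URLs, so a regex over the whole document may pick them up.
#    The pattern is a guess; tune it to what you see in the page source.
jpg_links = re.findall(r'https://[^"\s]+?\.jpg', html)

# De-duplicate while preserving order.
all_links = list(dict.fromkeys(img_links + jpg_links))

for i, link in enumerate(all_links):
    print(i, link)
```

The bare-`except` warning from the last comment applies here as well: catching `KeyError` keeps unrelated errors (network, parsing) visible instead of silently hiding them.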

0 Answers