
I set up this code to extract the image links from the following website. The problem is that it stops after item 19 and doesn't continue with the listing. Can you help me?

import urllib.request
import os
import codecs
from bs4 import BeautifulSoup

tematica = 'fun'

url = "https://www.shutterstock.com/es/search/" + tematica
request = urllib.request.Request(url)
response = urllib.request.urlopen(request)
data_content = response.read()

Html_file = open("html_file.html", "wb")
Html_file.write(data_content)
Html_file.close()

html=codecs.open("html_file.html", 'r', 'utf-8').read()

soup = BeautifulSoup(html, 'html.parser')

for i,img_element in enumerate(soup.findAll('img', None)):
    try:
        img_src = img_element['src']
        print(i,img_src)
    except:
        pass
Raymont
  • I examined the HTML page and found exactly 20 `img` tags having a `src` attribute, so the behaviour of your program should be correct. What else would you expect? – leqo Feb 25 '20 at 16:12
  • Now that you mention it, you're right, there are only 20. Looking at the page source, I need to extract all the links that follow "thumbnail"; that's where all the .jpg links are (a sketch of that idea follows after these comments). – Raymont Feb 25 '20 at 16:57
  • @Raymont Is the issue solved, then? Also, using a bare `except` like that is bad design, be careful! (see, for example: https://stackoverflow.com/questions/54948548/what-is-wrong-with-using-a-bare-except) – AMC Feb 25 '20 at 17:09
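
Building on the comments above, here is a minimal sketch of the thumbnail-extraction idea. It assumes the saved page embeds the remaining thumbnail URLs somewhere in the raw HTML (for example in script/JSON blobs) as plain `https://... .jpg` strings; the regex is an assumption about Shutterstock's markup, not its documented structure, so adjust it to the actual page source. It also catches `KeyError` explicitly instead of using a bare `except`:

```python
import re
import codecs
from bs4 import BeautifulSoup

# Read the page that was saved by the code in the question.
html = codecs.open("html_file.html", "r", "utf-8").read()
soup = BeautifulSoup(html, "html.parser")

# 1) Collect the <img src="..."> values, catching only the error we expect
#    (a missing 'src' attribute) rather than swallowing everything.
img_links = []
for img_element in soup.find_all("img"):
    try:
        img_links.append(img_element["src"])
    except KeyError:
        pass

# 2) Assumption: the other thumbnails appear elsewhere in the raw HTML as
#    plain .jpg URLs, so a regex over the whole document may pick them up.
#    The pattern is a guess; tune it to what you see in the page source.
jpg_links = re.findall(r'https://[^"\s]+?\.jpg', html)

# De-duplicate while preserving order.
all_links = list(dict.fromkeys(img_links + jpg_links))

for i, link in enumerate(all_links):
    print(i, link)
```

The bare-`except` warning from the last comment applies here as well: catching `KeyError` keeps unrelated errors (network, parsing) visible instead of silently hiding them.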

0 Answers