I found this post and wanted to modify the script slightly to download the images to a specific folder. My edited file looks like this:
import re
import requests
from bs4 import BeautifulSoup
import os

site = 'http://pixabay.com'
directory = "pixabay/"  # Relative to script location

response = requests.get(site)
soup = BeautifulSoup(response.text, 'html.parser')
img_tags = soup.find_all('img')
urls = [img['src'] for img in img_tags]

for url in urls:
    #print(url)
    filename = re.search(r'/([\w_-]+[.](jpg|gif|png))$', url)
    with open(os.path.join(directory, filename.group(1)), 'wb') as f:
        if 'http' not in url:
            url = '{}{}'.format(site, url)
        response = requests.get(url)
        f.write(response.content)
This seems to work fine for pixabay, but it fails when I try a different site such as imgur or heroimages. If I replace the site declaration with
site = 'http://heroimages.com/portfolio'
nothing is downloaded. The print statement (when uncommented) doesn't print anything, so I'm guessing it's not finding any image tags? I'm not sure.
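One way to test the "no image tags" guess is to count the `<img>` tags in the HTML that the server actually returns. The sketch below is only an illustration: it uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra dependencies. The `sample` markup is a made-up stand-in for a page whose gallery is filled in later by JavaScript, not the real heroimages page.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collect the attributes of every <img> tag in an HTML string."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            self.images.append(dict(attrs))


def find_images(html):
    """Return a list of attribute dicts, one per <img> tag found."""
    collector = ImgCollector()
    collector.feed(html)
    return collector.images


# Hypothetical page: the gallery <div> is empty until a script populates it
sample = ('<html><body><div id="gallery"></div>'
          '<script src="app.js"></script></body></html>')
print(len(find_images(sample)))  # 0
```

If the count for `response.text` is zero, the images are probably inserted by a script after the page loads, which `requests` does not execute. If tags are found but carry an attribute such as `data-src` instead of `src`, the `img['src']` lookup would need adjusting.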
On the other hand, if I replace site with
site = 'http://imgur.com'
I sometimes get a
AttributeError: 'NoneType' object has no attribute 'group'
or, if the images do download, I can't even open them because I get an error.
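Regarding the AttributeError: `re.search` returns `None` whenever the pattern does not match, so any `src` that does not end in `.jpg`, `.gif` or `.png` (for example one with a query string or a different extension) makes `filename.group(1)` fail. Separately, the `'http' not in url` check prepends the site to protocol-relative sources such as `//i.imgur.com/...`, which may be why some downloaded files are not valid images, though that is a guess. The following is a minimal sketch, not a definitive fix. The helper name `resolve_image` and the example URLs are hypothetical.

```python
import re
from urllib.parse import urljoin, urlparse

# Same file types as the original pattern
IMAGE_NAME = re.compile(r'/([\w-]+\.(?:jpg|gif|png))$')


def resolve_image(site, src):
    """Return (absolute_url, filename), or None if src has no usable name."""
    # urljoin handles relative, absolute and protocol-relative sources
    absolute = urljoin(site, src)
    # Match against the path only, so a query string does not get in the way
    match = IMAGE_NAME.search(urlparse(absolute).path)
    if match is None:
        return None  # caller should skip this image
    return absolute, match.group(1)


print(resolve_image('http://imgur.com', '//i.imgur.com/abc123.jpg'))
print(resolve_image('http://imgur.com', '/images/photo.webp'))
```

In the download loop, a `None` result would simply be skipped with `continue` before the file is opened, which also avoids creating empty files.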
Also worth noting: right now the script requires the folder specified by directory to already exist. I plan to change this so that the script creates the directory if it does not exist.
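A minimal sketch of that directory-creation step, assuming Python 3.2 or later (where `os.makedirs` accepts `exist_ok`). The helper name `ensure_directory` is hypothetical, and the demonstration uses a temporary folder rather than the real `pixabay/` directory.

```python
import os
import tempfile


def ensure_directory(path):
    """Create path (and any missing parents) if it does not exist yet."""
    os.makedirs(path, exist_ok=True)
    return path


# Demonstration in a throwaway temporary folder
with tempfile.TemporaryDirectory() as tmp:
    target = ensure_directory(os.path.join(tmp, 'pixabay'))
    ensure_directory(target)  # second call does nothing and raises no error
    print(os.path.isdir(target))  # True
```

In the script above, calling `ensure_directory(directory)` once before the loop would be enough.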