I'm really new to Python, but I want to make a web-scraping application that looks up a popular picture-sharing website's gallery and downloads the 15 latest uploads. I got as far as collecting the URLs pointing at the JPGs and saving them to a text file. Then I open the file, read it line by line, download each JPG with requests, and save them to separate files, using uuid to generate random filenames. My final goal is to write something that automatically categorizes pictures uploaded by random people, e.g. cats, dogs, furniture.
I've tried researching the topic, but I'm really confused. I would love some feedback.
import re
import uuid

import requests
from bs4 import BeautifulSoup

link = 'link'
ip = '176.88.217.170:8080'
proxies = {
    'http': ip,
    'https': ip,
}

# Fetch the gallery page through the proxy and parse it.
r = requests.get(link, proxies=proxies)
bs = BeautifulSoup(r.content, 'html.parser')

# Collect every thumbnail whose src contains _tn.jpg.
images = bs.find_all('img', {'src': re.compile('_tn.jpg')})

# Write the image URLs to a text file, one per line.
with open('data.txt', 'w') as f:
    for image in images:
        f.write(image['src'] + '\n')
print('done')

# Read the URLs back and download each one under a fresh random name.
# Note: uuid4() has to be called once per image, otherwise every download
# overwrites the same file; and mixing "for line in read" with readline()
# skips every other line, so only the for-loop is used here.
with open('data.txt', 'r') as read:
    for cnt, line in enumerate(read, start=1):
        line = line.strip()
        print(cnt, line)
        unique_filename = str(uuid.uuid4())
        with open(unique_filename + '.jpg', 'wb') as kep:
            kep.write(requests.get(line, proxies=proxies).content)
        print('saved')
I want to save the scraped images under randomly generated names as JPGs for future use.
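While researching, I also found this streamed download pattern, which apparently avoids holding a whole image in memory at once; is something like this the better way to do the saving step? (The URL here is just a placeholder, not the real site.)

import uuid
import requests

url = 'http://example.com/picture.jpg'  # placeholder URL
filename = str(uuid.uuid4()) + '.jpg'   # fresh random name per image

with requests.get(url, stream=True) as r:
    r.raise_for_status()  # fail loudly on HTTP errors
    with open(filename, 'wb') as f:
        # Write the body in chunks instead of loading it all at once.
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
print('saved', filename)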
I'm mainly asking for a direction or a suggestion about what I should look up, because my logic and skills are lacking.
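For the categorization goal specifically, the direction I've found so far is pretrained image classifiers. Is something like the sketch below a sensible starting point? It assumes PyTorch and torchvision are installed and uses ResNet-18 pretrained on ImageNet purely as an illustration; 'some-uuid.jpg' is a placeholder for one of my saved files.

import torch
from PIL import Image
from torchvision import models, transforms

# Standard ImageNet preprocessing: resize, crop, normalize.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet18(pretrained=True)
model.eval()  # inference mode, no training

img = Image.open('some-uuid.jpg').convert('RGB')  # placeholder filename
batch = preprocess(img).unsqueeze(0)  # shape: [1, 3, 224, 224]

with torch.no_grad():
    logits = model(batch)

# Index into the 1000 ImageNet classes; mapping it to a readable label
# (cat, dog, etc.) needs the ImageNet class list.
class_index = logits.argmax(dim=1).item()
print(class_index)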