This is probably fairly simple stuff. I'm currently experimenting with Python and have very little experience. I want to create an image scraper that goes to a page, downloads the image, follows the "next page" link, downloads the next image, and so on (as a source I use a website similar to 9gag). Right now my script can only print the image URL and the next-link URL. I can't figure out how to make my bot follow the link, download the next image, and keep doing that until some condition is met or it is stopped.
PS: I'm using beautifulsoup4 (I think).
Thanks in advance, Zil
Here is what the script looks like now. I combined a couple of scripts into one, so it looks quite messy:
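import requests
from bs4 import BeautifulSoup


def trade_spider(max_pages):
    # The start URL is set once, outside the loop, so it can advance each iteration.
    url = 'http://linksmiau.net/linksmi_paveiksliukai/rimtas_rudeninis_ispejimas_merginoms/1819/'
    page = 1
    while page <= max_pages:
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")

        # Print the URL of every image with class "img".
        for img in soup.find_all('img', {'class': 'img'}):
            temp = img.get('src')
            if temp[:1] == "/":
                image = "http://linksmiau.net" + temp
            else:
                image = temp
            print(image)

        # The right arrow keeps its target in an onclick handler.
        lastlink = None
        for lnk in soup.find_all('div', {'id': 'arrow_right'}):
            nextlink = lnk.get('onclick')
            # Strip the JavaScript prefix and the trailing quote/semicolon.
            link = nextlink.replace("window.location = '", "").rstrip("';")
            lastlink = "http://linksmiau.net" + link
            print(lastlink)

        if lastlink is None:
            break  # no next link found, stop instead of looping forever
        url = lastlink  # assignment (=), not comparison (==)
        page += 1


trade_spider(3)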
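
For reference, the "download the image, follow the next link, repeat" loop could be sketched roughly as below. This is only a minimal sketch under some assumptions: that the right arrow's handler looks like `window.location = '/some/path/'`, that every wanted image has the class `img`, and that saving files into a local `images` folder is acceptable. The helper names (`extract_image_urls`, `extract_next_url`, `download_image`, `crawl`) are made up for illustration and are not part of requests or Beautiful Soup.

```python
import os
import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Assumed handler format: onclick="window.location = '/some/path/'"
ONCLICK_TARGET = re.compile(r"window\.location\s*=\s*'([^']*)'")


def extract_image_urls(html, page_url):
    """Return absolute URLs of all <img> tags with class "img"."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for img in soup.find_all("img", class_="img"):
        src = img.get("src")
        if src:
            # urljoin copes with both relative ("/a.jpg") and absolute sources.
            urls.append(urljoin(page_url, src))
    return urls


def extract_next_url(html, page_url):
    """Return the absolute URL behind the right arrow, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    arrow = soup.find("div", id="arrow_right")
    if arrow is None:
        return None
    match = ONCLICK_TARGET.search(arrow.get("onclick") or "")
    return urljoin(page_url, match.group(1)) if match else None


def download_image(image_url, folder):
    """Save one image into folder, named after the last URL segment."""
    os.makedirs(folder, exist_ok=True)
    filename = os.path.basename(image_url.split("?")[0]) or "image"
    response = requests.get(image_url, timeout=10)
    response.raise_for_status()
    with open(os.path.join(folder, filename), "wb") as f:
        f.write(response.content)  # raw bytes, not .text


def crawl(start_url, max_pages, folder="images"):
    """Download images page by page until max_pages or no next link."""
    url = start_url
    for _ in range(max_pages):
        html = requests.get(url, timeout=10).text
        for image_url in extract_image_urls(html, url):
            download_image(image_url, folder)
        url = extract_next_url(html, url)
        if url is None:
            break  # stop condition: the page has no right arrow
```

Calling `crawl(start_url, max_pages=3)` with the page URL from the script would then visit at most three pages. Nothing is actually "clicked": the script reads the target out of the `onclick` attribute and requests that URL itself, and the stop condition is either the page limit or a page without a next link.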