I was writing a script to download images from Naver Comic (comic.naver.com) and it's mostly done, but I can't save the images. I successfully grabbed the image URLs via urllib and BeautifulSoup, but it seems the site has introduced hotlink blocking, and I can't save the images to my system via urllib or Selenium.
Update: I tried changing the user agent to see if that was causing the problem... still the same.
Any fix or solution?
My code right now:
import requests
from bs4 import BeautifulSoup
import re
import urllib
import urllib2
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
# Override the user agent that PhantomJS reports
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (
    "Chrome/15.0.87"
)
url = "http://comic.naver.com/webtoon/detail.nhn?titleId=654817&no=44&weekday=tue"
driver = webdriver.PhantomJS(desired_capabilities=dcap)
# Fetch the episode page and collect the comic image tags
soup = BeautifulSoup(urllib.urlopen(url).read())
scripts = soup.findAll('img', alt='comic content')
for links in scripts:
    Imagelinks = links['src']
    filename = Imagelinks.split('_')[-1]
    print 'Downloading Image : '+filename
    # Open the image URL in the browser and save a screenshot of it
    driver.get(Imagelinks)
    driver.save_screenshot(filename)
# Close the browser once, after all images have been processed
driver.close()
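One common form of hotlink blocking checks the Referer header of the image request rather than the user agent. The following is a minimal sketch under that assumption (it is not confirmed that comic.naver.com works this way): it requests each image with the viewer page set as the Referer and writes the raw bytes to disk. The helper names build_headers, filename_from_url and download_image, and the "Mozilla/5.0" user agent string, are hypothetical and chosen only for illustration.

```python
import os
from contextlib import closing

try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2


def build_headers(page_url, user_agent="Mozilla/5.0"):
    """Headers that make the image request look like it came from the viewer page."""
    # Assumption: the server rejects image requests whose Referer is missing.
    return {"Referer": page_url, "User-Agent": user_agent}


def filename_from_url(image_url):
    """Same naming rule as the script above: the part after the last underscore."""
    return image_url.split("_")[-1]


def download_image(image_url, page_url, dest_dir="."):
    """Fetch one image with the Referer set and save it under dest_dir."""
    request = Request(image_url, headers=build_headers(page_url))
    with closing(urlopen(request, timeout=30)) as response:
        data = response.read()
    path = os.path.join(dest_dir, filename_from_url(image_url))
    with open(path, "wb") as handle:
        handle.write(data)
    return path
```

Inside the loop, a call such as download_image(links['src'], url) could stand in for the driver.get / save_screenshot pair, which would also save the original image file instead of a screenshot of it.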
Following MAI's reply, I tried what I could with Selenium and got what I wanted. It's solved now. My code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.common.exceptions import TimeoutException
from bs4 import BeautifulSoup
from selenium.webdriver.common.action_chains import ActionChains
driver = webdriver.Chrome()
url = "http://comic.naver.com/webtoon/detail.nhn?titleId=654817&no=44&weekday=tue"
driver.get(url)
# Find every comic image inside the viewer container
elem = driver.find_elements_by_xpath("//div[@class='wt_viewer']//img[@alt='comic content']")
for links in elem:
    # Print the source URL of each image
    print links.get_attribute('src')
driver.quit()
But when I try to take screenshots of these, I get the error "element is not attached to the page". How am I supposed to solve that? :/
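The "element is not attached to the page" message is what Selenium typically reports for a stale element: a WebElement is only valid on the page it was found on, so navigating with driver.get() inside the loop invalidates the elements that have not been read yet. A minimal sketch, assuming that is what happens here: read every src into a plain list of strings first, and only then navigate. The function names collect_image_urls and screenshot_images are hypothetical.

```python
XPATH = "//div[@class='wt_viewer']//img[@alt='comic content']"


def collect_image_urls(elements):
    """Read every src while the elements are still attached to the page."""
    return [element.get_attribute("src") for element in elements]


def screenshot_images(driver, page_url):
    """Load the page, copy the image URLs out, then visit each URL in turn."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    driver.get(page_url)
    # Plain strings survive navigation; WebElement references do not.
    urls = collect_image_urls(driver.find_elements(By.XPATH, XPATH))
    for image_url in urls:
        filename = image_url.split("_")[-1]
        driver.get(image_url)  # leaving the page is safe now
        driver.save_screenshot(filename)
    return urls
```

With the script above this would be called as screenshot_images(driver, url) before driver.quit(). Note that save_screenshot captures the browser window, so the saved file is a screenshot of the image rather than the original image data.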