
I have made a scraper that currently parses image links and saves the downloaded images into the Python working directory by default. The only thing I want to do now is choose a folder on the desktop to save those images in, but I can't. Here is what I have so far:

import requests
import os.path
import urllib.request
from lxml import html

def Startpoint():
    url = "https://www.aliexpress.com/"
    response = requests.get(url)
    tree = html.fromstring(response.text)
    titles = tree.xpath('//div[@class="item-inner"]')
    for title in titles:
        Pics="https:" + title.xpath('.//span[@class="pic"]//img/@src')[0]
        endpoint(Pics)

def endpoint(images):
    sdir = (r'C:\Users\ar\Desktop\mth')
    testfile = urllib.request.URLopener()
    xx = testfile.retrieve(images, images.split('/')[-1])
    filename=os.path.join(sdir,xx)
    print(filename)

Startpoint()

Upon execution, the above code throws an error: "join() argument must be str or bytes, not 'tuple'"

SIM
  • use wget instead to download the URL – Wboy May 01 '17 at 05:12
  • There are plenty of other posts on this site which answer this question [example](http://stackoverflow.com/questions/8286352/how-to-save-an-image-locally-using-python-whose-url-address-i-already-know). Does this or one of the others help you to formulate a solution? – Paul Rooney May 01 '17 at 05:28
  • Thanks, Paul Rooney, for your answer. Following the link you provided, I learned how to save a single image, but with more than one image I have no idea how to name each file differently, and choosing a directory is another concern. I've corrected my code above to be very close to what I want. Thanks. – SIM May 01 '17 at 05:43
  • Possible duplicate of [How to save an image locally using Python whose URL address I already know?](http://stackoverflow.com/questions/8286352/how-to-save-an-image-locally-using-python-whose-url-address-i-already-know) – Peter Wood May 01 '17 at 05:48
  • @SMth80 get the original filename with [**`os.path.basename`**](https://docs.python.org/2/library/os.path.html#os.path.basename) and [**`urlparse`**](https://docs.python.org/2/library/urlparse.html#urlparse.urlparse): `basename(urlparse(images).path)` – Peter Wood May 01 '17 at 06:02
  • `fullpath = os.path.join(directory, filename)` – Peter Wood May 01 '17 at 07:48
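
For context on the error in the question: `retrieve` (like `urllib.request.urlretrieve`) returns a `(filename, headers)` tuple, so passing its result to `os.path.join` fails. A minimal sketch of what the last two comments suggest, assuming Python 3 (where `urlparse` lives in `urllib.parse`); the helper name `destination_for` and the extra `directory` parameter are hypothetical:

```python
import os
import urllib.request
from urllib.parse import urlparse


def destination_for(url, directory):
    # Take the file name from the path part of the URL, ignoring any query string
    filename = os.path.basename(urlparse(url).path)
    return os.path.join(directory, filename)


def endpoint(url, directory):
    # Build the full path first, then download straight into it
    fullpath = destination_for(url, directory)
    urllib.request.urlretrieve(url, fullpath)
    return fullpath
```

Calling `endpoint(Pics, r'C:\Users\ar\Desktop\mth')` from the loop in the question would then save each image under its original file name, provided the folder already exists.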

4 Answers

1

You can download images with Python's urllib. See the official urllib documentation for Python 2.7 or, if you use Python 3, the urllib documentation for Python 3.

0

You could use urllib.request, BytesIO from io, and Image from PIL (if you have a direct URL to the image):

from PIL import Image
from io import BytesIO
import urllib.request

def download_image(url):
    req = urllib.request.Request(url)
    response = urllib.request.urlopen(req)
    content = response.read()
    img = Image.open(BytesIO(content))
    img.filename = url
    return img
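
The function above returns an in-memory image and does not write it to disk. If the goal is only to store the file in a chosen folder, one option is to write the downloaded bytes directly. This is a sketch under that assumption; `save_bytes` is a hypothetical helper, and the demo uses placeholder bytes in a temporary folder instead of a real download:

```python
import os
import tempfile


def save_bytes(content, directory, filename):
    # Create the target folder if it does not exist yet
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    # Binary mode keeps the image data unchanged
    with open(path, "wb") as handle:
        handle.write(content)
    return path


# Demo: placeholder bytes stand in for response.read()
demo_dir = tempfile.mkdtemp()
demo_path = save_bytes(b"not a real image", demo_dir, "example.jpg")
print(demo_path)
```

In `download_image`, the `content` variable holds the bytes that would be passed as the first argument.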
murthy10
0

The images are now loaded dynamically, so I thought I would update this post:

import os
from selenium import webdriver
import urllib.request
from lxml.html import fromstring

url = "https://www.aliexpress.com/"

def get_data(link):

    driver.get(link)
    tree = fromstring(driver.page_source)
    for title in tree.xpath('//li[@class="item"]'):
        pics = "https:" + title.xpath('.//*[contains(@class,"img-wrapper")]//img/@src')[0]
        os.chdir(r"C:\Users\WCS\Desktop\test")
        urllib.request.urlretrieve(pics, pics.split('/')[-1])

if __name__ == '__main__':
    driver = webdriver.Chrome()
    get_data(url)
    driver.quit()
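
Two details of the loop above may be worth guarding against: prepending "https:" assumes every `src` is protocol-relative, and indexing `[0]` raises `IndexError` when an item has no image. A small sketch of one way to handle both, using `urllib.parse.urljoin`; the helper names and the `img.example.com` host are hypothetical:

```python
from urllib.parse import urljoin


def absolute_image_url(page_url, src):
    # urljoin resolves "//host/path", "/path" and full URLs against the page URL
    return urljoin(page_url, src)


def first_or_none(matches):
    # xpath() returns a list; an empty list means nothing matched
    return matches[0] if matches else None


print(absolute_image_url("https://www.aliexpress.com/", "//img.example.com/kf/a.jpg"))
```

Inside the loop, `first_or_none(title.xpath(...))` would return `None` for items without an image, which can then be skipped.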
SIM
-1

This is the code to download an HTML file from the web:

import random
import urllib.request

def download(url):
    # Random number used as the file name
    name = random.randrange(1, 1000)
    full_name = str(name) + ".html"  # convert to str before adding the extension
    urllib.request.urlretrieve(url, full_name)  # performs the download

download("any url")  # placeholder: replace with a real URL

This code downloads any HTML file from the internet; you just have to pass the link to the function.
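
One caveat with the random name: `randrange(1, 1000)` can return the same number twice, so a later download may overwrite an earlier one. A minimal sketch of one alternative, assuming UUID-based names are acceptable (the helper name `unique_name` is hypothetical):

```python
import uuid


def unique_name(extension):
    # uuid4 is a random 128-bit identifier, so collisions are extremely unlikely
    return uuid.uuid4().hex + extension


print(unique_name(".html"))
```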

In your case, you said you have already retrieved the image links from the web page, so you can change the extension from ".html" to a compatible type. The problem is that the images can have different extensions, such as ".jpg" or ".png".

What you can do is match the ending of the link with if/else string matching and then assign the extension accordingly.

Here is an example for illustration:

import random
import urllib.request

def download(url, extension):
    # Random number used as the file name
    name = random.randrange(1, 1000)
    full_name = str(name) + extension  # e.g. "123.png"
    urllib.request.urlretrieve(url, full_name)  # performs the download

link = "any url"  # placeholder: replace with a real image link

# Match the ending of the link and pass the matching extension
if link.endswith(".png"):
    download(link, ".png")
elif link.endswith(".jpg"):
    download(link, ".jpg")

You can use multiple if/else branches for the various extension types. If it helps in your situation, give it a thumbs up, buddy.

babygame0ver
  • Use `==` not `is` to compare strings. See [Why does comparing strings in Python using either '==' or 'is' sometimes produce a different result?](https://stackoverflow.com/questions/1504717/why-does-comparing-strings-in-python-using-either-or-is-sometimes-produce) (I realise you say pseudo-code) – Peter Wood May 01 '17 at 06:05
  • There's a lot of repetition just to change the file extension. Put it in a function taking the extension as a parameter. – Peter Wood May 01 '17 at 06:09
  • You can get the extension from 'Content-Type' or with `url.split('.')[-1]` – t.m.adam May 01 '17 at 06:34
  • @PeterWood, sir, I also meant `==` and not `is`; it is just pseudo-code. The same goes for the repetition: that is what I wanted to convey, but it is just pseudo-code. Thank you for adding your response – babygame0ver May 01 '17 at 07:08
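
The 'Content-Type' suggestion in the comments could look roughly like the sketch below. It swaps `url.split('.')[-1]` for `os.path.splitext` on the parsed URL path so that a query string does not end up in the extension. The helper name `pick_extension` is hypothetical, and the caller is assumed to pass in the header value it got from the response:

```python
import mimetypes
import os
from urllib.parse import urlparse


def pick_extension(url, content_type=None):
    # Prefer the extension found in the URL path, ignoring any query string
    extension = os.path.splitext(urlparse(url).path)[1].lower()
    if extension:
        return extension
    # Otherwise fall back to the Content-Type header, if one was supplied
    if content_type:
        mime = content_type.split(";")[0].strip()
        return mimetypes.guess_extension(mime) or ""
    return ""


print(pick_extension("https://example.com/pics/photo.PNG?width=200"))
```

An empty string means no extension could be determined, in which case the caller has to pick a default.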