
Looking for a clean Python wget solution for downloading multiple files at once.

The URL will always be the same:

https://example.com/

So far I can do this:

import wget

print('Beginning file download with wget module')
url = 'https://example.com/new_folder/1.jpg'
wget.download(url)

But I also need to download -2.jpg, -3.jpg, -4.jpg and -5.jpg, and rename the NWZV1WB files to something like NEWCODE-1.jpg, NEWCODE-2.jpg, ...
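A minimal sketch of the numbered download-and-rename step. It assumes the images follow a `{n}.jpg` pattern under the same base URL as the snippet above; that template and the `NEWCODE` prefix are placeholders, since the real URL is not shown. It uses the standard library's `urllib.request.urlretrieve` (which python-wget's `download()` wraps) so it runs without extra packages. With the `wget` module the equivalent call would be `wget.download(url, out=local_name)`, because `out` also accepts a target file name. This means the file is saved under its new name directly and no separate `os.rename` is needed.

```python
import os
from urllib.request import urlretrieve

# Hypothetical URL template based on the question's example; adjust to the real pattern.
URL_TEMPLATE = "https://example.com/new_folder/{n}.jpg"


def plan_renamed_downloads(url_template, new_code, numbers):
    """Return (url, local_name) pairs, e.g. .../1.jpg -> NEWCODE-1.jpg."""
    return [
        (url_template.format(n=n), "{code}-{n}.jpg".format(code=new_code, n=n))
        for n in numbers
    ]


def download_renamed(plan, out_dir=".", fetch=urlretrieve):
    """Save each URL directly under its new name; `fetch(url, path)` is injectable."""
    saved = []
    for url, local_name in plan:
        path = os.path.join(out_dir, local_name)
        fetch(url, path)  # the file is written straight to the new name
        saved.append(path)
    return saved


if __name__ == "__main__":
    plan = plan_renamed_downloads(URL_TEMPLATE, "NEWCODE", range(1, 6))
    for path in download_renamed(plan):
        print("saved", path)
```

Passing `fetch` as a parameter is just a convenience so the loop can be tried with a fake downloader before hitting the real server.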


Also, I need to download all the .jpg files (22) inside a folder and rename the folder locally to something like NEWCODE, while keeping the original file names.

Here the URL is also always the same:

import wget

print('Beginning file download with wget module')
url = 'https://example.com/big/1.jpg'  # there are 18 .jpg files inside
wget.download(url)
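For this second case, "renaming" the folder amounts to choosing the local directory name when saving. A minimal sketch, assuming the images are numbered `1.jpg` to `18.jpg` under the `/big/` path from the snippet above (the template and count are assumptions; the question mentions both 22 and 18 files): create `NEWCODE` once and save each file under the last segment of its URL. As before it uses `urlretrieve`; `wget.download(url, out="NEWCODE")` should behave similarly, since python-wget keeps the URL's file name when `out` is an existing directory.

```python
import os
import posixpath
from urllib.parse import urlsplit
from urllib.request import urlretrieve

# Hypothetical template: numbered .jpg files under /big/, as described in the question.
URL_TEMPLATE = "https://example.com/big/{n}.jpg"


def filename_from_url(url):
    """Last path segment of the URL, e.g. .../big/7.jpg -> '7.jpg' (query string ignored)."""
    return posixpath.basename(urlsplit(url).path)


def download_into_folder(urls, folder, fetch=urlretrieve):
    """Save every URL into `folder` (created if missing) under its original file name."""
    os.makedirs(folder, exist_ok=True)
    saved = []
    for url in urls:
        path = os.path.join(folder, filename_from_url(url))
        fetch(url, path)
        saved.append(path)
    return saved


if __name__ == "__main__":
    # 18 images according to the question; change the range if the folder differs.
    urls = [URL_TEMPLATE.format(n=n) for n in range(1, 19)]
    download_into_folder(urls, "NEWCODE")
```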

What would be best: wget (I can't find many articles about it) or requests? Any help is appreciated.
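Either library can do the job. `wget.download` takes little more than a URL and an output path, whereas `requests` lets you set timeouts, stream large files in chunks, and reuse one connection through a `Session` when fetching many files from the same host. A minimal sketch with `requests`, assuming it is installed (`pip install requests`). The session is passed in so the function can be tried without network access, and the URL in the `__main__` part is the placeholder from the question.

```python
import os
import posixpath
from urllib.parse import urlsplit


def download_with_session(session, url, out_dir, chunk_size=64 * 1024):
    """Stream `url` into `out_dir` with a requests-style session; return the saved path."""
    path = os.path.join(out_dir, posixpath.basename(urlsplit(url).path))
    # stream=True reads the body in chunks instead of loading it into memory at once
    with session.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
        with open(path, "wb") as fh:
            for chunk in response.iter_content(chunk_size=chunk_size):
                fh.write(chunk)
    return path


if __name__ == "__main__":
    import requests  # third-party: pip install requests

    os.makedirs("NEWCODE", exist_ok=True)
    with requests.Session() as session:  # one connection pool reused for every file
        for n in range(1, 6):
            url = "https://example.com/new_folder/{n}.jpg".format(n=n)  # placeholder URL
            print("saved", download_with_session(session, url, "NEWCODE"))
```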

Andie31

1 Answer


For example:

import multiprocessing
import os

import wget


def run_process(url, output_path):
    # `out` may be a directory: wget then keeps the file name from the URL
    filename = wget.download(url, out=output_path)
    # TODO: you can write your rename logic here using os.rename(filename, new_name)
    return filename


if __name__ == '__main__':
    cpus = multiprocessing.cpu_count()
    max_pool_size = 4
    pool = multiprocessing.Pool(min(cpus, max_pool_size))
    base_dir = os.path.dirname(os.path.abspath(__file__))
    target = "NEWCODE"  # not used yet; meant for the rename step
    prefix_list = ["NWZV1WB", "AWU3JAD", "NW96MRD"]
    download_list = []
    # 1.jpg ... 22.jpg plus the fixed-name zoom images
    name_list = list(range(1, 23))
    name_list.extend(["zoom_side", "zoom_sole", "zoom_side-thumb", "zoom_sole-thumb"])
    for prefix in prefix_list:
        # each product prefix gets its own local folder, e.g. ./NWZV1WB
        path = os.path.join(base_dir, prefix)
        os.makedirs(path, exist_ok=True)  # raises if `path` exists but is not a directory
        for name in name_list:
            download_list.append(['https://img2.tennis-warehouse.com/360/{p}/{n}.jpg'.format(n=name, p=prefix), path])

    for url, path in download_list:  # change here to download other files
        print('Beginning file download with wget module {n}'.format(n=url))
        pool.apply_async(run_process, args=(url, path))
    # add your code here to download other files
    pool.close()
    pool.join()
    print("finish")
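Because downloads are I/O-bound, a thread pool is a lighter alternative to `multiprocessing.Pool` here; the sketch below swaps in `concurrent.futures.ThreadPoolExecutor`, which is not what the answer above uses. It also collects failed URLs (for example a `zoom_*` image that does not exist for some product), whereas the `apply_async` calls above never read their results, so a failed download goes unnoticed. `fetch` defaults to `urllib.request.urlretrieve`; `wget.download(url, out=path)` could be substituted. The folder layout mirrors the answer: one folder per prefix.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlretrieve

BASE = "https://img2.tennis-warehouse.com/360/{p}/{n}.jpg"
NAMES = list(range(1, 23)) + ["zoom_side", "zoom_sole", "zoom_side-thumb", "zoom_sole-thumb"]


def build_jobs(base_dir, prefixes, names=NAMES, template=BASE):
    """One (url, local_path) job per image; each prefix gets its own folder."""
    jobs = []
    for prefix in prefixes:
        folder = os.path.join(base_dir, prefix)
        for name in names:
            url = template.format(p=prefix, n=name)
            jobs.append((url, os.path.join(folder, "{n}.jpg".format(n=name))))
    return jobs


def download_all(jobs, fetch=urlretrieve, max_workers=4):
    """Run fetch(url, path) concurrently; return a list of (url, error) for failures."""
    for _, path in jobs:
        os.makedirs(os.path.dirname(path), exist_ok=True)
    failures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url, path): url for url, path in jobs}
        for future in as_completed(futures):
            try:
                future.result()
            except Exception as exc:  # e.g. urllib.error.HTTPError for a 404
                failures.append((futures[future], exc))
    return failures


if __name__ == "__main__":
    jobs = build_jobs(os.getcwd(), ["NWZV1WB", "AWU3JAD", "NW96MRD"])
    for url, exc in download_all(jobs):
        print("failed:", url, exc)
    print("finish")
```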
jps
Henry
  • Nice one! Quick question: why aren't the files going into the NEWCODE folder? Also, there are 4 other files inside that `/360/NWZV1WB/` folder, always named zoom_side.jpg, zoom_sole.jpg, zoom_side-thumb.jpg and zoom_sole-thumb.jpg, that also need to be downloaded :( And the last question: how do I deal with multiple products? Say I want to download NWZV1WB, AWU3JAD and NW96MRD? Much appreciated, Henry! – Andie31 Aug 24 '18 at 09:39
    I read the Python wget source [here](https://bitbucket.org/techtonik/python-wget/src/3001fd1b30aca7e5fe162c9193ccbb951dabb4ea/wget.py?at=default&fileviewer=file-view-default), and it seems I forgot to pass the output path (`out`). That answers the first question. – Henry Aug 24 '18 at 09:44
    As for the second question, this is just an example: you can simply change the logic of the `for` loop, or build a `list` containing the URLs you want and iterate over it. – Henry Aug 24 '18 at 09:48
    Just pass a different URL to `pool.apply_async(run_process, args=(url, path, ))` before calling `pool.close()`. – Henry Aug 24 '18 at 09:54
  • The first fix works perfectly, and it seems a bit faster. Nice! The second and third questions... I'm lost, to be honest. I'd appreciate it if you could update the code :( I'm a newbie – Andie31 Aug 24 '18 at 10:05
  • Almost there, I guess... I just got this error: `AttributeError: 'range' object has no attribute 'extend'` on line 25 :( – Andie31 Aug 24 '18 at 10:19
  • You're the man! Can we have each prefix in a separate folder instead of putting them all in that NEWCODE folder? For example, NWZV1WB would go into a folder called NWZV1WB, and so on. – Andie31 Aug 24 '18 at 10:26
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/178675/discussion-between-andie31-and-henry-yuan). – Andie31 Aug 24 '18 at 10:30
  • Thanks for the code, it's working :) @Henryyuan – Ramesh Jul 16 '21 at 16:19
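The `AttributeError` reported in the comments comes from Python 3, where `range()` returns a lazy, immutable sequence rather than a list, so it has no `.extend()` method. Wrapping it in `list()` first, as the answer's `name_list = list(range(1, 23))` does, fixes it. A small demonstration:

```python
error = None
names = range(1, 23)
try:
    names.extend(["zoom_side"])  # fails on Python 3: range objects have no extend()
except AttributeError as exc:
    error = str(exc)

names = list(range(1, 23))  # materialise the range as a real list first
names.extend(["zoom_side", "zoom_sole", "zoom_side-thumb", "zoom_sole-thumb"])
print(len(names), names[0], names[-1])  # prints: 26 1 zoom_sole-thumb
```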