I am trying to download a zip file from a URL with this:

import requests

resp = requests.get('https://nlp.stanford.edu/data/glove.6B.zip')

I know the file is colossal, and while it is downloading I can't tell whether everything is going well or not.

(1) Is there a way to make the download more verbose?

(2) How do I know where the data is saved, and is there a relative path for it that I can use in the rest of my script?

(3) How can I unzip it cleanly?

(4) How can I either choose/set a file name or get the file name of the downloaded file?

kiriloff

1 Answer

Is there a way to make the download more verbose?

If you want to download a file to disk and keep track of how many bytes have been downloaded so far, you can use urllib.request.urlretrieve from the built-in urllib.request module. It accepts an optional reporthook, which should be a function that takes 3 arguments. The hook is called once when the network connection is established and once after each block is read, with:

  • the number of blocks transferred so far
  • the block size in bytes
  • the total size of the file, or -1 if unknown

A simple example that prints progress to stdout as a fraction:

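from urllib.request import urlretrieve

def report(num, size, total):
    # num * size estimates the bytes received so far; total is the reported file size
    print(num * size, '/', total)

# The second argument is the file name to save under (relative to the working directory)
urlretrieve("http://www.example.com", "index.html", reporthook=report)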

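This downloads www.example.com to the current working directory as index.html and reports progress by printing. The second argument sets the file name, and a relative path such as this one is resolved against the current working directory. Note that the last block is usually only partly filled, so the fraction might be > 1 and should be treated as an estimate.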
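A slightly more defensive hook can be sketched as follows. It caps the estimate at the total and falls back to a plain byte count when the size is unknown. The helper name format_progress and the message wording are my own choices for illustration, not part of urllib.

```python
def format_progress(block_num, block_size, total_size):
    """Return a one-line progress message for a urlretrieve report hook."""
    downloaded = block_num * block_size
    if total_size <= 0:
        # The server did not report a size, so only the byte count is known.
        return f"{downloaded} bytes downloaded"
    # The last block is usually partial, so cap the estimate at the total.
    downloaded = min(downloaded, total_size)
    percent = 100 * downloaded / total_size
    return f"{downloaded} / {total_size} bytes ({percent:.1f}%)"


def report(block_num, block_size, total_size):
    # "\r" rewrites the same terminal line instead of scrolling.
    message = format_progress(block_num, block_size, total_size)
    print("\r" + message, end="", flush=True)
```

Passing reporthook=report to urlretrieve should then give a single progress line that updates in place. For a file as large as the GloVe archive the hook fires very often, so you may prefer to print only every few hundred blocks.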

EDIT: Once the zip file has finished downloading, if you just want to unpack the whole archive you can use shutil.unpack_archive from the built-in shutil module. If you need finer-grained control, use the built-in zipfile module. The PyMOTW3 entry for zipfile has examples such as listing the files inside a ZIP archive, reading a selected file from it, and reading the metadata of a file inside it.
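Both approaches can be tried offline with a minimal sketch like the one below. It builds a tiny stand-in archive first, so the file names and contents (demo.zip, vectors/a.txt, README.txt) are made up for illustration. With a real download you would point archive at the file you saved.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
archive = workdir / "demo.zip"

# Build a tiny stand-in archive (hypothetical contents) so the sketch runs offline.
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("vectors/a.txt", "hello 0.1 0.2\n")
    zf.writestr("README.txt", "demo archive\n")

# Option 1: unpack the whole archive in one call.
target = workdir / "unpacked"
shutil.unpack_archive(archive, target)
extracted = sorted(
    p.relative_to(target).as_posix() for p in target.rglob("*") if p.is_file()
)

# Option 2: inspect and read members selectively with zipfile.
with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()                                       # list members
    sizes = {info.filename: info.file_size for info in zf.infolist()}  # metadata
    first_line = zf.read("vectors/a.txt").decode("utf-8").splitlines()[0]

print(extracted)
print(names, sizes, first_line)
```

shutil.unpack_archive picks the format from the file extension here. The zipfile route avoids extracting everything, which may matter for an archive with several large members.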

Daweo