4

I am currently doing a machine learning course on Udacity. The course code is written in Python 2.7, but since I am using Python 3.5, I am getting an error. This is the code:

import urllib
url = "https://www.cs.cmu.edu/~./enron/enron_mail_20150507.tgz"
urllib.urlretrieve(url, filename="../enron_mail_20150507.tgz")
print ("download complete!") 

I tried urllib.request:

  import urllib
  url = "https://www.cs.cmu.edu/~./enron/enron_mail_20150507.tgz"
  urllib.request(url, filename="../enron_mail_20150507.tgz")
  print ("download complete!")

But it still gives me an error:

urllib.request(url, filename="../enron_mail_20150507.tgz")
TypeError: 'module' object is not callable

I am using PyCharm as my IDE.

CodeHead
  • 177
  • 1
  • 2
  • 12

3 Answers

9

You'd use urllib.request.urlretrieve. Note that this function "may become deprecated at some point in the future", so you might be better off using the less likely to be deprecated interface:
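For the direct fix first: in Python 3 the function moved to urllib.request.urlretrieve. A minimal sketch of the call shape (fetching a local file:// URL so it runs without network access; the temp-file names are only for illustration):

```python
import pathlib
import tempfile
import urllib.request

# Create a small local file to stand in for the remote download target.
src = tempfile.NamedTemporaryFile(delete=False, suffix=".txt")
src.write(b"hello")
src.close()

url = pathlib.Path(src.name).as_uri()  # e.g. file:///tmp/tmpabc123.txt
dest = src.name + ".copy"

# Python 3 spelling of the Python 2 urllib.urlretrieve call.
urllib.request.urlretrieve(url, filename=dest)
```

For the asker's case the call would be urllib.request.urlretrieve(url, filename="../enron_mail_20150507.tgz").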

import contextlib
import urllib.request

# Adapted from the source:
# https://hg.python.org/cpython/file/3.5/Lib/urllib/request.py#l170
url = "https://www.cs.cmu.edu/~./enron/enron_mail_20150507.tgz"
filename = "../enron_mail_20150507.tgz"

with open(filename, 'wb') as out_file:
    with contextlib.closing(urllib.request.urlopen(url)) as fp:
        block_size = 1024 * 8  # copy in 8 KiB chunks
        while True:
            block = fp.read(block_size)
            if not block:
                break
            out_file.write(block)

For small enough files, you could just read and write the whole thing and drop the loop entirely.
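For instance, a one-shot read looks like this (again demonstrated with a local file:// URL so the sketch stays runnable offline):

```python
import contextlib
import pathlib
import tempfile
import urllib.request

# Small stand-in file for the remote resource.
src = tempfile.NamedTemporaryFile(delete=False, suffix=".txt")
src.write(b"small payload")
src.close()

url = pathlib.Path(src.name).as_uri()
dest = src.name + ".copy"

# Whole-file variant: a single read() call, no chunking loop.
with contextlib.closing(urllib.request.urlopen(url)) as fp:
    data = fp.read()
with open(dest, "wb") as out_file:
    out_file.write(data)
```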

mgilson
  • 300,191
  • 65
  • 633
  • 696
4

You can use shutil.copyfileobj() to copy straight from the URL byte stream to the file.

import urllib.request
import shutil

url = "http://www.somewebsite.com/something.pdf"
output_file = "save_this_name.pdf"
with urllib.request.urlopen(url) as response, open(output_file, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)

Source: https://stackoverflow.com/a/48691447/1174102

Socowi
  • 25,550
  • 3
  • 32
  • 54
Michael Altfield
  • 2,083
  • 23
  • 39
2

I know this question has long been answered, but I'll contribute for any future viewer.

The proposed solution is good, but the main issue is that it can generate empty files if you use an invalid URL.

As a workaround, here is how I adapted the code:

import contextlib
from urllib.request import urlopen

def getfile(url, filename, timeout=45):
    with contextlib.closing(urlopen(url, timeout=timeout)) as fp:
        block_size = 1024 * 8
        # Read the first block before opening the output file, so an
        # empty response never leaves an empty file on disk.
        block = fp.read(block_size)
        if block:
            with open(filename, 'wb') as out_file:
                out_file.write(block)
                while True:
                    block = fp.read(block_size)
                    if not block:
                        break
                    out_file.write(block)
        else:
            raise Exception('nonexisting file or connection error')

I hope this helps.

Al rl
  • 314
  • 1
  • 11