2

I would like to download a large archive file with Python and save it, but urllib is not working for me. This is my code:

    import urllib
    urllib.request("http://www.petercollingridge.co.uk/sites/files/peter/particle_tutorial_7.txt")

Note that the link I used in this example is not a large archive. I am only using it as an example. It links directly to a .txt file, so it should work. I am getting this error:

    Traceback (most recent call last):
    File "<pyshell#11>", line 1, in <module>
    urllib.request("http://www.petercollingridge.co.uk/sites/files/peter/particle_tutorial_7.txt")
    AttributeError: 'module' object has no attribute 'request'

It seems to me like urllib is somehow broken and missing the "request" method. I am using Python 3.3. Should I be using another module or is it actually a Python problem?

user2218101
  • possible duplicate of [Download file from web in Python 3](http://stackoverflow.com/questions/7243750/download-file-from-web-in-python-3) – kenorb Jul 27 '15 at 22:40

3 Answers

14

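No, it is not broken. In Python 3, request is a submodule of the urllib package, not a function, so it has to be imported explicitly and used through its functions such as urlopen(). The urllib.request documentation is pretty clear on how this works: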

    import urllib.request
    req = urllib.request.urlopen('http://www.google.com')
    data = req.read()

Edit: If you need to write the file directly to disk rather than process the data, use urlretrieve.

    urllib.request.urlretrieve('http://example.com/big.zip', 'file/on/disk.zip')

ChrisP
  • Note: *"I would like to download a large archive file with python and save it"*. `req.read()` is not appropriate if you want to download a large file that might not fit in memory. You could use [`urlretrieve()` instead](http://stackoverflow.com/a/20716205/4279) – jfs Dec 21 '13 at 06:28
  • `urlretrieve` *might become deprecated at some point in the future* according to the manual. Is there a future-safe way of saving directly to a file? – steffen Dec 03 '15 at 02:02
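Regarding the two comments above (large files and a way that does not rely on `urlretrieve`), one possible approach is to treat the response from `urlopen()` as a file-like object and copy it to disk in chunks with `shutil.copyfileobj()`. This is only a minimal sketch: the function name `download_to_file`, the default chunk size, and the example URL and path are assumptions for illustration, not something from the answers here.

```python
import shutil
from urllib.request import urlopen


def download_to_file(url, dest_path, chunk_size=64 * 1024):
    """Stream the body of `url` into `dest_path` chunk by chunk.

    `download_to_file` and `chunk_size` are hypothetical names chosen
    for this sketch; only urlopen() and copyfileobj() are stdlib calls.
    """
    # The response behaves like a binary file, so it never has to be
    # held in memory as a whole.
    with urlopen(url) as response, open(dest_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file, chunk_size)
    return dest_path


# Example call (placeholder URL and path):
# download_to_file("http://example.com/big.zip", "big.zip")
```

Because only one chunk is in memory at a time, this should behave the same for a small .txt file and for a large archive.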
4

To download a URL to a file, you can use the urlretrieve() function:

    from urllib.request import urlretrieve
    url = "http://www.petercollingridge.co.uk/sites/files/peter/particle_tutorial_7.txt"
    urlretrieve(url, "result.txt")

jfs
  • `urlretrieve` is a legacy implementation and *might become deprecated* in future versions: https://docs.python.org/3/library/urllib.request.html#legacy-interface – ccpizza Nov 08 '20 at 13:40
  • @ccpizza: I've been seeing the warning for almost 10 years and the wording is the same: "might become deprecated", which is not the same thing as "the function is deprecated". `urlretrieve()` is just a convenience function, a wrapper around `urlopen()`. If it works for you, there is no reason to reinvent the wheel. – jfs Nov 09 '20 at 16:18
  • I've found that it doesn't work when specific headers need to be set since the api doesn't allow setting custom headers, e.g. `user-agent`. – ccpizza Nov 10 '20 at 14:48
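For the custom-header case mentioned in the last comment, a possible workaround is to build a `urllib.request.Request` with the headers you need and pass it to `urlopen()`, which accepts a `Request` object as well as a plain URL string. The sketch below is an illustration under assumptions: the helper names `build_request` and `fetch_with_headers` and the `my-downloader/0.1` user-agent string are made up for this example.

```python
import shutil
from urllib.request import Request, urlopen


def build_request(url, user_agent="my-downloader/0.1"):
    """Return a Request that carries a custom User-Agent header.

    The default user-agent value is a placeholder, not a recommended one.
    """
    return Request(url, headers={"User-Agent": user_agent})


def fetch_with_headers(url, dest_path, user_agent="my-downloader/0.1"):
    """Download `url` to `dest_path`, sending the custom header."""
    request = build_request(url, user_agent)
    # urlopen() takes the Request object in place of a URL string.
    with urlopen(request) as response, open(dest_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return dest_path


# Example call (placeholder URL and path):
# fetch_with_headers("http://example.com/big.zip", "big.zip")
```

Other headers can be added the same way through the `headers` dictionary.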
1

The urllib2 module has been split across several modules in Python 3, named urllib.request and urllib.error. The 2to3 tool will automatically adapt imports when converting your sources to Python 3.

    from urllib.request import urlopen

    # urlopen() returns a response object; read() returns its body as bytes.
    response = urlopen("http://www.petercollingridge.co.uk/sites/files/peter/particle_tutorial_7.txt")
    data = response.read()

    print(data)

Siva Cn