202

I've been going through the Q&A on this site, for an answer to my question. However, I'm a beginner and I find it difficult to understand some of the solutions. I need a very basic solution.

Could someone please explain a simple solution to 'Downloading a file through http' and 'Saving it to disk, in Windows', to me?

I'm not sure how to use shutil and os modules, either.

The file I want to download is under 500 MB and is a .gz archive file. If someone can also explain how to extract the archive and use the files in it, that would be great!

Here's a partial solution that I wrote from various answers combined:

import requests
import os
import shutil

global dump

def download_file():
    global dump
    url = "http://randomsite.com/file.gz"
    file = requests.get(url, stream=True)
    dump = file.raw

def save_file():
    global dump
    location = os.path.abspath("D:\folder\file.gz")
    with open("file.gz", 'wb') as location:
        shutil.copyfileobj(dump, location)
    del dump

Could someone point out errors (beginner level) and explain any easier methods to do this?
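(For the extraction half of the question, which none of the answers below address directly, a minimal standard-library sketch; the archive here is generated locally so the example is self-contained, but in practice it would be the .gz file you just downloaded:)

```python
import gzip
import shutil
import tempfile
from pathlib import Path

# Create a sample .gz file so the sketch runs without a download;
# in real use, `archive` would be the file fetched over HTTP.
workdir = Path(tempfile.mkdtemp())
archive = workdir / "file.gz"
with gzip.open(archive, "wb") as gz:
    gz.write(b"hello from inside the archive")

# Decompress: gzip.open yields a file object over the uncompressed
# data, and shutil.copyfileobj streams it to disk in chunks, so the
# whole (up to 500 MB) payload never has to fit in memory.
extracted = workdir / "file"
with gzip.open(archive, "rb") as src, open(extracted, "wb") as dst:
    shutil.copyfileobj(src, dst)

print(extracted.read_bytes())  # b'hello from inside the archive'
```

If the archive is actually a .tar.gz (multiple files inside), use the `tarfile` module instead of `gzip`.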

arvindch

10 Answers

229

A clean way to download a file is:

import urllib

testfile = urllib.URLopener()
testfile.retrieve("http://randomsite.com/file.gz", "file.gz")

This downloads a file from a website and names it file.gz. This is one of my favorite solutions, from Downloading a picture via urllib and python.

This example uses the urllib library, and it will directly retrieve the file from a source.

Blue Ice
  • 3
    Ok, thanks! But is there a way to get it working through requests? – arvindch Oct 26 '13 at 15:52
  • 5
    Any possibility to save in /myfolder/file.gz ? – John Lapoya Mar 16 '14 at 17:57
  • 18
    No better possibility than trying it yourself, maybe? :) I could successfully do `testfile.retrieve("http://example.com/example.rpm", "/tmp/test.rpm")`. – Dharmit Sep 26 '14 at 05:47
  • @Dharmit Is there a way to close that file? I mean, I want to download a file, do something to it, then delete it. However, when I try to delete it with os.remove(path/file) I get error: no such file or directory – Arash Saidi Oct 09 '14 at 14:57
  • @ArashSaidi A little bit late to the party, but when you open it you could try using (in this case) testfile.close() to close the file before deleting. When I tested it, though, I didn't get the same error. –  Nov 17 '16 at 06:56
  • 31
    This is deprecated since Python 3.3, and the urllib.request.urlretrieve solution (see answer below) is the 'modern' way – MichielB Feb 15 '17 at 09:14
  • 1
    What is the best way to add a username and password to this code? tks – Estefy Sep 17 '17 at 22:06
  • am looking for the same on how to add username and password ?how to authenticate? – carte blanche Oct 03 '18 at 22:30
  • get this error: `AttributeError: module 'urllib' has no attribute 'URLopener'` – Charlie Parker Aug 10 '21 at 17:33
  • how do I indicate which folder/path to save the contents of the url? – Charlie Parker Aug 10 '21 at 17:35
  • note that if you are downloading from PyCharm, it is not obvious what the "current folder" is – Charlie Parker Aug 10 '21 at 17:45
185

For Python 3+, URLopener is deprecated. When used, you will get an error like this:

url_opener = urllib.URLopener() AttributeError: module 'urllib' has no attribute 'URLopener'

So, try:

import urllib.request 
urllib.request.urlretrieve(url, filename)
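(To answer the folder question from the comments above: `urlretrieve` accepts a full destination path, so you can save anywhere you like. A minimal sketch; it uses a local file:// URL as a stand-in for the remote server so it runs offline, since the original URLs are hypothetical:)

```python
import tempfile
import urllib.request
from pathlib import Path

# Local stand-in for the remote file; replace `url` with your real
# http(s) URL in practice.
workdir = Path(tempfile.mkdtemp())
src = workdir / "source.gz"
src.write_bytes(b"example payload")
url = src.as_uri()

# The second argument is the destination path, folder included.
# Create the folder first: urlretrieve will not make it for you.
dest = workdir / "downloads" / "file.gz"
dest.parent.mkdir(parents=True, exist_ok=True)
path, headers = urllib.request.urlretrieve(url, str(dest))

print(path)  # the full path the file was saved to
```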
Om Sao
120

As mentioned here:

import urllib
urllib.urlretrieve("http://randomsite.com/file.gz", "file.gz")

EDIT: If you still want to use requests, take a look at this question or this one.

dparpyani
42

Four methods compared, using wget, urllib and requests (note: this benchmark script is Python 2; it relies on StringIO and urllib.URLopener).

#!/usr/bin/python
import requests
from StringIO import StringIO
from PIL import Image
import profile as profile
import urllib
import wget


url = 'https://tinypng.com/images/social/website.jpg'

def testRequest():
    image_name = 'test1.jpg'
    r = requests.get(url, stream=True)
    with open(image_name, 'wb') as f:
        for chunk in r.iter_content():
            f.write(chunk)

def testRequest2():
    image_name = 'test2.jpg'
    r = requests.get(url)
    i = Image.open(StringIO(r.content))
    i.save(image_name)

def testUrllib():
    image_name = 'test3.jpg'
    testfile = urllib.URLopener()
    testfile.retrieve(url, image_name)

def testwget():
    image_name = 'test4.jpg'
    wget.download(url, image_name)

if __name__ == '__main__':
    profile.run('testRequest()')
    profile.run('testRequest2()')
    profile.run('testUrllib()')
    profile.run('testwget()')

testRequest - 4469882 function calls (4469842 primitive calls) in 20.236 seconds

testRequest2 - 8580 function calls (8574 primitive calls) in 0.072 seconds

testUrllib - 3810 function calls (3775 primitive calls) in 0.036 seconds

testwget - 3489 function calls in 0.020 seconds
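(Since the benchmark above is Python 2, here is a sketch of the same streamed-download idea in Python 3 using only the standard library; a local file:// URL stands in for the remote image so the example runs offline:)

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path

# Local stand-in for the remote file; swap in the real https URL.
workdir = Path(tempfile.mkdtemp())
src = workdir / "website.jpg"
src.write_bytes(b"\xff\xd8fake-jpeg-bytes")
url = src.as_uri()

# Stream the response to disk in fixed-size chunks instead of
# reading it all into memory, mirroring the iter_content approach.
dest = workdir / "test1.jpg"
with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
    shutil.copyfileobj(response, out, length=64 * 1024)

print(dest.stat().st_size)  # same size as the source file
```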

Saurabh yadav
38

I use wget.

It is a simple and good library. Example:

import wget

file_url = 'http://johndoe.com/download.zip'

file_name = wget.download(file_url)

The wget module supports both Python 2 and Python 3.

Ali
6

Exotic Windows Solution

import subprocess

subprocess.run("powershell Invoke-WebRequest {} -OutFile {}".format(your_url, filename), shell=True)
Max
6
import urllib.request
urllib.request.urlretrieve("https://raw.githubusercontent.com/dnishimoto/python-deep-learning/master/list%20iterators%20and%20generators.ipynb", "test.ipynb")

Downloads a single raw Jupyter notebook to a file.

Golden Lion
3

For text files, you can use:

import requests

url = 'https://WEBSITE.com'
req = requests.get(url)
path = "C:\\YOUR\\FILE.html"

with open(path, 'wb') as f:
    f.write(req.content)
DaWe
  • Don't you have to `req.iter_content()`? Or use the `req.raw` file object? See [this](https://stackoverflow.com/questions/13137817/how-to-download-image-using-requests) – Michael Schnerring Sep 21 '20 at 08:19
  • No, it just works, haven't you tried? @MichaelSchnerring – DaWe Sep 21 '20 at 08:46
1

I started down this path because ESXi's wget is not compiled with SSL and I wanted to download an OVA from a vendor's website directly onto the ESXi host which is on the other side of the world.

I had to disable the firewall (lazy) / enable https out by editing the rules (proper).

Then I created this Python script:

import ssl
import shutil
import tempfile
import urllib.request
context = ssl._create_unverified_context()

dlurl = 'https://somesite/path/whatever'
with urllib.request.urlopen(dlurl, context=context) as response:
    with open("file.ova", 'wb') as tmp_file:
        shutil.copyfileobj(response, tmp_file)

ESXi's libraries are kind of pared down, but the open source weasel installer seemed to use urllib for https... so it inspired me to go down this path.

Jayme Snyder
-6

Another way to save the file (Python 2; the function is urllib.urlretrieve, not urllib.retrieve) is this:

import urllib

urllib.urlretrieve("your url goes here", "output.csv")
Ala