Download a file using Python

Question

I have to download a number of files. I tried the following code in Python.

import urllib2
ul = urllib2.urlopen('http://dds.cr.usgs.gov/emodis/Africa/historical/TERRA/2012/comp_056/AF_eMTH_NDVI.2012.047-056.QKM.COMPRES.005.2012059143841.zip.sum').read()
open("D:/Thesis/test_http_dl", "w").write(ul)

It throws this error:

IOError: [Errno 13] Permission denied: 'D:/Thesis/test_http_dl'

Do you have any idea why that is? Am I doing something wrong?
I have tried different folders and it didn't work. My folders are not read-only. The result of print(repr(ul[:60])) is '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">\n<htm'.
urllib.urlretrieve() just creates a 1 kb file in the folder, which obviously is not the downloaded file.

You got the formatting wrong. Hopefully I fixed it, but please check and make sure. — abarnert, Dec 20 '12 at 00:25
Have you tried just opening the file without writing to it? — TheDude, Dec 20 '12 at 00:25
As another note, it's highly recommended to use the `with` statement to open files to ensure they are closed correctly, even upon exceptions. — Gareth Latty, Dec 20 '12 at 00:26
http://docs.python.org/2/library/urllib.html#urllib.urlretrieve for downloading — Hamoudaq, Dec 20 '12 at 00:27
@EngHamoud: There's nothing wrong with his use of `urllib2.urlopen`. — abarnert, Dec 20 '12 at 00:29
@abarnert I don't think so, sir. I tried his code and got the same error, then I tried urlretrieve and it works fine. — Hamoudaq, Dec 20 '12 at 00:37
@f.ashouri Check it out, I've posted it as an answer; hopefully it will work fine. — Hamoudaq, Dec 20 '12 at 00:42
By the way: This isn't actually your problem, but you probably want mode `"wb"`, not `"w"`. On Windows, leaving off the `b` means it will convert every `0x0A` byte (decimal 10, line feed) into the pair `0x0D`, `0x0A` (decimal 13 and 10, carriage return plus line feed), which will probably corrupt a binary file. — abarnert, Dec 20 '12 at 01:00
"UNSOLVED AFTER 4 ANSWERS" is not very informative: have you tried to write to a different file/directory? Have you tried to manually create the file (with the same name) and some content? What is `print(repr(ul[:60]))`? Does it look like the expected content? If you've tried `urllib.urlretrieve()`; *describe the results in your question*. Etc. — jfs, Dec 20 '12 at 02:05
Does my code work for any of you, or do you have the same problem when you try it with your own folders? — f.ashouri, Dec 20 '12 at 10:34
I see 2 independent issues: 1. the site returns an html instead of a large binary file (To fix, try to set browser-like User-Agent header and/or enable cookies); 2. `open("somefile", "w").write("abc")` fails. You might have posted slightly different code from what you actually use. [@abarnert enumerates some other possible reasons](http://stackoverflow.com/a/13963476/4279). In any case use `with open("somefile", "wb") as file: file.write(b"somebytes")` to save content available in a bytestring or `with open("fn", "wb") as file: shutil.copyfileobj(urlopen(url), file)` if input is a file-like. — jfs, Dec 20 '12 at 11:08
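
The text-mode versus binary-mode point from the comments above can be illustrated with a small sketch. It is written for Python 3 (the thread itself uses Python 2), and `written_bytes` is a hypothetical helper name. Passing `newline="\r\n"` is an assumption used here to reproduce, on any platform, the translation that text mode applies by default on Windows.

```python
import os
import tempfile


def written_bytes(data, open_kwargs):
    """Write data to a temporary file and return the raw bytes that ended up on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, **open_kwargs) as f:
            f.write(data)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# Text mode with Windows-style newline translation: every "\n" becomes "\r\n".
text_mode = written_bytes("a\nb", {"mode": "w", "newline": "\r\n"})
# Binary mode: bytes are written exactly as given.
binary_mode = written_bytes(b"a\nb", {"mode": "wb"})

print(text_mode)    # b'a\r\nb'
print(binary_mode)  # b'a\nb'
```

An extra byte per line feed is harmless in a text file but changes the content of a zip archive, which is why `"wb"` is the safer choice for downloads.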

abarnert · Answer 1 · 2012-12-20T18:44:10.343

The error tells you exactly what went wrong. You don't have permission to write to path D:/Thesis/test_http_dl.

There are four possible reasons for that:

You already have a file with that name, which you don't have write access to.
You don't have access to create new files in D:\Thesis.
You don't have write access to the D: drive at all (e.g., because it's a CD-ROM).
Some other process has the file open for exclusive access.

You need to look at the ACLs for D:\Thesis\test_http_dl if it exists, or for D:\Thesis\ otherwise, and see if your user (the one you're running the script as) has write access, and also check whether that path or the D drive itself has the "read-only" flag on, and also check whether any other process has the file open. (I don't know of any built-in tool for that last one, but handle or Process Explorer from sysinternals can do it for you easily.)
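
Some of these checks can be roughly scripted. The following is a minimal sketch for Python 3, not a replacement for looking at the ACLs; `diagnose_write_target` is a hypothetical helper name. It covers whether the path exists, whether it is actually a directory, and whether a file can be created next to it. It cannot tell whether another process holds the file open.

```python
import os
import tempfile


def diagnose_write_target(path):
    """Collect basic facts about why opening `path` for writing might fail."""
    parent = os.path.dirname(os.path.abspath(path))
    report = {
        "exists": os.path.exists(path),
        "is_directory": os.path.isdir(path),
        "parent_exists": os.path.isdir(parent),
        "can_create_in_parent": False,
    }
    if report["parent_exists"]:
        try:
            # Creating a real file tests the effective permissions directly,
            # instead of trying to interpret flags or ACL entries.
            with tempfile.NamedTemporaryFile(dir=parent):
                pass
            report["can_create_in_parent"] = True
        except OSError:
            pass
    return report


print(diagnose_write_target("D:/Thesis/test_http_dl"))
```

If `is_directory` comes back true, the name refers to a folder rather than a file, and opening it for writing will fail no matter what the permissions are.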

Meanwhile, none of the stuff with urllib2 is at all relevant here. You can verify that by just doing this:

open("D:/Thesis/test_http_dl", "w")

You will get the exact same exception.

It's worth knowing how to figure that out the "hard" way, for cases where the exception doesn't tell you exactly what's wrong. You get an exception in a line like this:

open("D:/Thesis/test_http_dl", "w").write(ul)

Something is wrong, and if you don't have enough information to tell what it is, what do you do? Well, first, break it into pieces, so each line has exactly one operation:

f = open("D:/Thesis/test_http_dl", "w")
f.write(ul)

Now you know which one of those two gets an exception.

While you're at it, since the only thing this code depends on is ul, you can create a simpler program to test this:

ul = 'junk'
f = open("D:/Thesis/test_http_dl", "w")
f.write(ul)

Even if that doesn't help you directly, it means you don't need to wait for the download every time through the test loop, and you've got something simpler to post to SO (see SSCCE for more), and this is something you can just type into the interactive interpreter. Instead of trying to guess what might be useful to print out to see why the write is raising an exception, you can start with help(f) or dir(f) and play with it live. (In this case, I'm guessing it's actually the open that fails, not the write, but you shouldn't have to guess.)
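
The "one operation per line" idea can also be wrapped in a small function that reports which step failed. This is only a sketch for Python 3, where `IOError` is an alias of `OSError`; `open_then_write` is a hypothetical name, and it opens the file in binary mode on the assumption that the data is a downloaded byte string.

```python
import errno


def describe(error):
    """Format an OSError as 'errno=<number> (<symbolic name>)'."""
    return "errno=%s (%s)" % (error.errno, errno.errorcode.get(error.errno, "?"))


def open_then_write(path, data):
    """Run open() and write() as separate steps and report which one fails."""
    try:
        f = open(path, "wb")
    except OSError as e:
        return "open failed: " + describe(e)
    try:
        with f:
            f.write(data)
    except OSError as e:
        return "write failed: " + describe(e)
    return "ok"


print(open_then_write("D:/Thesis/test_http_dl", b"junk"))
```

On a machine where the path is writable this prints `ok`; otherwise the message says whether `open` or `write` raised, and with which error code.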

On to your second problem:

urllib.urlretrieve() just creates a 1 kb file in the folder, which obviously is not the downloaded file.

Actually, I think it is the downloaded file. You're not asking for AF_eMTH_NDVI.2012.047-056.QKM.COMPRES.005.2012059143841.zip, you're asking for AF_eMTH_NDVI.2012.047-056.QKM.COMPRES.005.2012059143841.zip.sum, which is probably a checksum file: a quasi-standard type of file containing metadata that helps you make sure the file you're downloading wasn't damaged in transit or tampered with by a hacker. A typical checksum file has one or more lines, each mapping a downloadable file to a checksum or cryptographic hash digest in some format. Sometimes there are three columns: the type of checksum/hash, the value of the checksum/hash in some stringified format, and the filename or full URL of the file. Sometimes the first column is omitted, and you have to know from elsewhere what type of checksum/hash is being used (often MD5 as a hex string). Sometimes the columns are in different orders. Sometimes they're separated by commas or tabs, or in fixed-width fields, or some other variation.

At any rate, you'd expect a .sum file to be around 80 bytes long. If you look at it in Explorer or the dir command, it'll usually be rounded up to the nearest 1K. So, you should see a 1K file if you download this successfully.
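
To show what such a file is for, here is a minimal sketch in Python 3 that checks a downloaded file against one checksum line. It assumes the simplest layout described above, two whitespace-separated columns with a hex digest followed by the filename, and it assumes MD5; neither is confirmed for this particular server. The function names are hypothetical.

```python
import hashlib


def parse_checksum_line(line):
    """Split '<hex digest>  <filename>' into (digest, filename)."""
    digest, filename = line.strip().split(None, 1)
    return digest.lower(), filename.strip()


def md5_of_file(path, chunk_size=64 * 1024):
    """Compute the MD5 hex digest of a file without loading it all into memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def matches_checksum(path, line):
    """Return True if the file at `path` has the digest given in `line`."""
    expected, _filename = parse_checksum_line(line)
    return md5_of_file(path) == expected
```

You would download both the `.zip` and the `.zip.sum`, read the line from the `.sum` file, and call `matches_checksum` on the `.zip`.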

Meanwhile:

print(repr(ul[:60])) is '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">\n<htm'

You should try printing out the rest of this, because it's probably some kind of document explaining, in human terms, what you're doing wrong. This could be because you need to pass a User-Agent, a preferred encoding, a referer, or some other header.
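
Here is a rough sketch of both ideas, sending extra headers and noticing when the server answers with an HTML page. It uses Python 3's `urllib.request` (the successor of `urllib2`). The header values are placeholders, the function names are hypothetical, and `looks_like_html` is only a heuristic; whether any particular header changes this server's response is not something I can confirm.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0 (compatible; example-downloader)"):
    """Create a request that carries browser-like headers (placeholder values)."""
    return urllib.request.Request(
        url, headers={"User-Agent": user_agent, "Accept": "*/*"}
    )


def looks_like_html(data):
    """Heuristic: does the start of the response body look like an HTML page?"""
    head = data[:200].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def fetch(url):
    """Download `url`; raise with the start of the page if HTML comes back instead."""
    with urllib.request.urlopen(build_request(url)) as response:
        body = response.read()
    if looks_like_html(body):
        preview = body[:500].decode("latin-1")
        raise ValueError("server returned an HTML page:\n" + preview)
    return body
```

Reading the preview in the exception message is the scripted version of "print out the rest of it".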

However, I tested the exact same line of code you used repeatedly, and ul is always:

1ba6437044bfa9259fa2d3da8f95aebd  AF_eMTH_NDVI.2012.047-056.QKM.COMPRES.005.2012059143841.zip

In other words, it's a perfectly valid checksum file, not an HTML page. So, I suspect what's really going on is that you aren't testing the same code you're showing us.

@KarlKnechtel: That should raise a different exception: `IOError: [Errno 21] Is a directory: 'D:/Thesis/test_http_dl'` — abarnert, Dec 20 '12 at 00:29
The folder is not "read-only" and none of the three cases you mentioned is my problem. It's a bit weird. As you said I tried just to open it and I got the same error — f.ashouri, Dec 20 '12 at 00:44
@f.ashouri: Are you sure none of those cases is your problem? What if you just try to create a new text file in the same folder, using Explorer? — abarnert, Dec 20 '12 at 00:47
No, I'm not kidding you. Try creating a new text file in the same folder using Explorer, and again using cmd (e.g., `copy somefile D:/Thesis/test_http_dl`), making sure you're running as the same user that you run the Python script as, and see if you get the same error. — abarnert, Dec 20 '12 at 00:56
@f.ashouri: By the way, do you know what an ACL is, or how to view them? — abarnert, Dec 20 '12 at 01:02
@abarnert: [this question](http://stackoverflow.com/questions/4736616/ioerror-errno-13-permission-denied) demonstrates that "Permission denied" can happen if a folder is used instead of a file. — jfs, Dec 20 '12 at 01:46
@f.ashouri: ACL stands for "access control list". It is one of the things to check if you have permission problems. — jfs, Dec 20 '12 at 01:48
@f.ashouri: If you don't know what ACLs are, and you didn't do the simpler check I suggested in the comments (which you seemed to think was a joke), you haven't checked cases 1 or 2, which are the most likely causes of your problem. However, I forgot about another possible cause, which you should check for; see the edited version. — abarnert, Dec 20 '12 at 02:37
@f.ashouri: Is this your whole program? Is there a chance you've `open`ed the same file earlier and not `close`d it? Because on Windows, that could have the same effect as case 4. Also, is there already a file with that name, or not? — abarnert, Dec 20 '12 at 18:25
@abarnert I found another solution that is working perfectly well. see my answer. — f.ashouri, Dec 21 '12 at 00:44

score 0 · Answer 2 · answered Dec 20 '12 at 00:34

I tried your code and got the same error.

So try this :D

import urllib
urllib.urlretrieve('http://dds.cr.usgs.gov/emodis/Africa/historical/TERRA/2012/comp_056/AF_eMTH_NDVI.2012.047-056.QKM.COMPRES.005.2012059143841.zip.sum','C:\\path_of_your_folder\\xx.zip.sum')

works fine with me !

answered Dec 20 '12 at 00:34

Hamoudaq

So you just changed it to write to `C:\Python27` instead of `D:\Thesis`? That's not really a "solution". – abarnert Dec 20 '12 at 00:35
Sorry, I forgot the folder; I was testing on my own machine :D – Hamoudaq Dec 20 '12 at 00:38
Did you check the file size? It is 94 MB on the web and 1 KB in the folder with your solution :D – f.ashouri Dec 20 '12 at 00:56
If you look at [the source to `urllib.urlretrieve`](http://hg.python.org/cpython/file/b227f8f7242d/Lib/urllib.py), it clearly does an `open(filename, 'wb')`, which means there's no way it could work if just doing `open(filename, 'wb')` directly doesn't work. – abarnert Dec 20 '12 at 00:59
I'll try hard to find a solution – Hamoudaq Dec 20 '12 at 01:09

f.ashouri · Accepted Answer · 2012-12-21T09:48:59.663

import urllib2

def download(url, file):
    # Copy the response to disk in 16 KB chunks instead of reading it all at once.
    dataset = urllib2.urlopen(url)
    CHUNK = 16 * 1024
    with open(file, 'wb') as dl:
        while True:
            piece = dataset.read(CHUNK)
            if not piece:
                break  # an empty read means the download is complete
            dl.write(piece)

download(r'http://dds.cr.usgs.gov/emodis/Africa/historical/TERRA/2012/comp_056/AF_eMTH_NDVI.2012.047-056.QKM.COMPRES.005.2012059143841.zip',r'AF_eMTH_NDVI.2012.047-056.QKM.COMPRES.005.2012059143841.zip')
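
For readers on Python 3, where `urllib2` no longer exists, roughly the same chunked copy can be sketched with `urllib.request` and `shutil.copyfileobj`, the approach suggested in the question comments. This is an adaptation under that assumption, not the code the answer was tested with.

```python
import shutil
import urllib.request


def download(url, destination, chunk_size=16 * 1024):
    """Stream `url` to `destination` without holding the whole body in memory."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        # copyfileobj runs the same read/write loop, chunk_size bytes at a time.
        shutil.copyfileobj(response, out, chunk_size)
    return destination
```

It is called the same way as above, with the URL first and the local file name second.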

edited Dec 21 '12 at 09:48

answered Dec 20 '12 at 13:55

f.ashouri

Your code shouldn't do anything different from the original; it's just a less readable and possibly less efficient version of the same thing. If there's a difference, it's probably that `file` is no longer a pathname you don't have access to, or that your old code left an earlier handle to the same file open and your new code doesn't, or something else completely irrelevant to what you changed. – abarnert Dec 21 '12 at 00:52
Also, if the URL is the same one as last time, I'm willing to bet you're getting a 96-byte text file, not a huge binary data file. – abarnert Dec 21 '12 at 00:53
The code is not readable because I'm not good at formatting here (sorry for that). But the code works PERFECTLY WELL and downloads a huge file of 1.5 GB. So, what are you betting on? – f.ashouri Dec 21 '12 at 01:11
For one thing, I'm betting that the code doesn't use the same value for `url` as the original code did, because the original code had a URL that pointed at a 96-byte text file. And likewise, either `file` is not the same path you used in the original code, or you've fixed whatever problem in the rest of your code and/or environment was causing the exception, because nothing you showed here has any effect on your ability to open the file. – abarnert Dec 21 '12 at 01:22
OK, so that's not the same URL as you originally used—which is why you get a binary file instead of a 96-byte text file—and not the same path as you originally used—which is why you no longer get a permissions error. Go back to your original code and give it the correct URL and path, and it will work too. Your other changes were meaningless obfuscation. You don't program by just changing random things until it works, or looking for a different source to copy and paste from. You have to understand what your code is doing. – abarnert Dec 21 '12 at 19:05
The original program wouldn't work anyway. I've tried it a lot. By the way, I don't claim I'm a programmer. I need to do something and I need the code! My limited understanding of programming tells me that the changes are MORE than necessary. – f.ashouri Dec 21 '12 at 19:17

score 0 · Answer 4 · answered Feb 10 '23 at 10:05

import requests

def download_file(url):
    # Name the local file after the last path segment of the URL.
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                # If you have chunk encoded response uncomment if
                # and set chunk_size parameter to None.
                #if chunk:
                f.write(chunk)
    return local_filename
