5

My file, named 'blueberry.jpg', begins downloading when I open the following URL manually, provided that I type the username and password when asked: http://example.com/blueberry/download

How can I make that happen using Python?

import urllib.request

url = 'http://example.com/blueberry/download'

data = urllib.request.urlopen(url).read()

fo = open('E:\\quail\\' + url.split('/')[1] + '.jpg', 'w')
print (data, file = fo)

fo.close()

However, the above program does not write the required file. How can I provide the required username and password?

  • 1
    What type of authorization scheme does it use: Basic, Kerberos, NTLM? If it is Basic, you can try using http://username:password@example.com/download – MinimalMaximizer Mar 19 '14 at 05:07
  • Not sure. How can I find out the authorization scheme? –  Mar 19 '14 at 05:09
  • 2
    You'll have to look at the headers returned; take a look at this article: http://en.wikipedia.org/wiki/List_of_HTTP_header_fields – MinimalMaximizer Mar 19 '14 at 05:12
  • 2
    If you are hell-bent on seeing a lot more of the nitty-gritty, check out http://www.voidspace.org.uk/python/articles/authentication.shtml#id1 – MinimalMaximizer Mar 19 '14 at 05:19
  • 1
    I'm not seeing why @burhan's solution doesn't work - it looks fine to me. When you navigate to http://example.com/blueberry/download are you asked to enter your credentials in a pop-up window, or on a form in the actual webpage? – MinimalMaximizer Mar 19 '14 at 05:42
  • In Python 2.7, @burhan's solution wrote a file, but not the required jpg file, because the only URL I know is 'example.com/blueberry/download' –  Mar 19 '14 at 05:44
  • I hope you are using the actual URL when you make the request, not the http://example URL from your code... – MinimalMaximizer Mar 19 '14 at 05:46
  • Yes. When I click the actual URL in the browser, the download starts once the username and password are entered manually. However, the real download link, which only shows up in the browser history, is randomly generated and completely different from the URL I click. The final file name is the one mentioned in my code above. –  Mar 19 '14 at 05:51
  • OK - by "manually provided" do you mean - "in a pop-up window, or on a form in the actual webpage?" – MinimalMaximizer Mar 19 '14 at 05:53
  • On a form in the actual webpage –  Mar 19 '14 at 05:54
  • It automatically redirects to a login page with a randomly generated URL –  Mar 19 '14 at 05:56
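
The comments above suggest looking at the returned headers to find the authentication scheme. A minimal sketch of that check, using only the standard library, could look like the following. The function names are hypothetical, and it assumes the server answers an unauthenticated request with status 401 and a WWW-Authenticate header, which is how HTTP authentication (Basic, NTLM, and so on) normally announces itself.

```python
import urllib.error
import urllib.request


def scheme_from_header(www_authenticate):
    """Return the scheme name (first token) of a WWW-Authenticate value."""
    if not www_authenticate:
        return None
    return www_authenticate.split(None, 1)[0]


def probe_auth_scheme(url):
    """Request url without credentials and report the scheme the server asks for.

    Returns None when the server does not answer with 401, which is what
    one would expect when the login is an HTML form rather than HTTP
    authentication.
    """
    try:
        with urllib.request.urlopen(url):
            return None
    except urllib.error.HTTPError as err:
        if err.code != 401:
            return None
        return scheme_from_header(err.headers.get('WWW-Authenticate'))


# Example call (performs a real network request):
# print(probe_auth_scheme('http://example.com/blueberry/download'))
```

If this returns None and the browser instead lands on a login page, the site is probably using a form-based login, which matches what the asker describes later in the thread.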

2 Answers

7

Use requests, which provides a friendlier interface to the various url libraries in Python:

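import os
import requests

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2

username = 'foo'
password = 'sekret'

url = 'http://example.com/blueberry/download/somefile.jpg'
# Use the last path component of the URL as the local file name
filename = os.path.basename(urlparse(url).path)

# auth=(username, password) sends HTTP Basic credentials
r = requests.get(url, auth=(username, password))

if r.status_code == 200:
    # Binary mode, because the response body is bytes
    with open(filename, 'wb') as out:
        for bits in r.iter_content(chunk_size=8192):
            out.write(bits)

UPDATE: On Python 3, urlparse lives in urllib.parse (from urllib.parse import urlparse); the import above falls back to the Python 2 urlparse module only when that is unavailable.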
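
When the URL does not end in a file name, as with http://example.com/blueberry/download, taking the basename of the URL path yields 'download' rather than the real name. One possible workaround, sketched below, is to prefer the name in the response's Content-Disposition header when there is one. Whether this particular server sends that header is an assumption the thread does not confirm, and `pick_filename` and its `default` argument are hypothetical names.

```python
import os
from email.message import Message
from urllib.parse import urlparse


def pick_filename(url, content_disposition=None, default='download.bin'):
    """Choose a local file name for a downloaded response."""
    if content_disposition:
        # Let the standard library parse the header parameters
        msg = Message()
        msg['Content-Disposition'] = content_disposition
        name = msg.get_filename()
        if name:
            # Drop any directory part the server may have included
            return os.path.basename(name)
    # Fall back to the last URL path component, then to the default
    return os.path.basename(urlparse(url).path) or default
```

With the requests response `r` from the snippet above, this could be called as `pick_filename(r.url, r.headers.get('Content-Disposition'))`.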

sniperd
Burhan Khalid
  • I got TypeError: 'str' does not support the buffer interface –  Mar 19 '14 at 05:30
  • For Python 3, I am using this code: `fo = open(filename, 'wb')`, then `for bits in r.iter_content(): print (bits, file = fo)`, then `fo.close()` –  Mar 19 '14 at 05:33
  • In Python 2.7 it wrote a file, but not the required jpg file, because the only URL I know is 'http://example.com/blueberry/download' –  Mar 19 '14 at 05:43
  • Simply replace `filename` with whatever you want the filename to be. – Burhan Khalid Mar 19 '14 at 06:07
  • 2
    For Python3 get urlparse with: `from urllib.parse import urlparse` – SurpriseDog Jan 22 '20 at 23:34
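
The asker reports elsewhere in the thread that the credentials are entered on a form in the webpage after a redirect to a login page. In that situation `auth=(username, password)` may have no effect, because the site is not using HTTP authentication. A rough sketch of the usual alternative is to post the login form with a `requests.Session`, so that the login cookie is reused for the download. The login URL and the form field names below are hypothetical placeholders; the real ones have to be read from the login page's HTML, and some sites also require a hidden token field.

```python
def download_after_form_login(session, login_url, file_url, form_fields, dest):
    """Log in by POSTing form_fields, then stream file_url into dest."""
    login = session.post(login_url, data=form_fields)
    login.raise_for_status()
    # The session keeps the cookies set by the login response
    response = session.get(file_url, stream=True)
    response.raise_for_status()
    with open(dest, 'wb') as out:
        for chunk in response.iter_content(chunk_size=8192):
            out.write(chunk)
    return dest


def main():
    import requests  # third-party package

    with requests.Session() as session:
        download_after_form_login(
            session,
            login_url='http://example.com/login',  # hypothetical
            file_url='http://example.com/blueberry/download',
            form_fields={'username': 'foo', 'password': 'sekret'},  # hypothetical field names
            dest='blueberry.jpg',
        )
```

Passing the session in as an argument keeps the helper independent of how the session is created, which also makes it easy to try against a stub.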
-1

I'm willing to bet you are using basic auth. So try doing the following:

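import urllib.parse
import urllib.request

# Note: per the comments below, urlopen may reject credentials embedded in the URL
url = 'http://username:pwd@example.com/blueberry/download'

data = urllib.request.urlopen(url).read()

# The first path segment ('blueberry') becomes the file name
name = urllib.parse.urlparse(url).path.split('/')[1]

# Write the raw bytes in binary mode
with open('E:\\quail\\' + name + '.jpg', 'wb') as fo:
    fo.write(data)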

Let me know if this works.

MinimalMaximizer
  • I got an error, perhaps because my password is a combination of special characters like $/_. The error message says: http.client.InvalidURL: nonnumeric port: –  Mar 19 '14 at 05:27
  • 1
    As much as the solution i provided is simple - @Burhan Khalid's is much better. It can handle potential error messages in a clean way. – MinimalMaximizer Mar 19 '14 at 05:28
  • 1
    `urlopen` does *not* appear to parse any username and password in front of the hostname like `http://user:pwd@foo.com` - it sees the colon between `user:pwd` and attempts to parse out a port. – thom_nic Dec 19 '19 at 16:12
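
Since `urlopen` does not accept credentials embedded in the URL, a standard-library alternative for Basic authentication is to install an `HTTPBasicAuthHandler`. The sketch below assumes the server really uses HTTP Basic auth, which the thread leaves open; the helper names are hypothetical. Because the password is passed as a separate argument, special characters such as `$/_` do not interfere with URL parsing.

```python
import urllib.request


def build_basic_auth_opener(top_level_url, username, password):
    """Return an opener that answers Basic auth challenges under top_level_url."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means: use these credentials for any realm at this URL
    password_mgr.add_password(None, top_level_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


def download(url, username, password, dest):
    """Fetch url with Basic auth and write the body to dest in binary mode."""
    opener = build_basic_auth_opener(url, username, password)
    with opener.open(url) as response, open(dest, 'wb') as out:
        out.write(response.read())


# Example call (performs a real network request):
# download('http://example.com/blueberry/download', 'foo', 'sekret', 'blueberry.jpg')
```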