1

I needed to download a file from within a Python program, and someone told me to do this:

source = urllib2.urlopen("http://someUrl.com/somePage.html").read()
open("/path/to/someFile", "wb").write(source)

It works very well, but I would like to understand the code.

When you have something like

patatoe = 1

Isn't that a variable?

And when you have something like:

blabla()

isn't that defining a function?

Please, I would LOVE to understand the code correctly.

Marcelo Cantos
user1044824
    I recognize that code (http://stackoverflow.com/questions/8116623/how-to-download-a-file-in-python) =) – chown Nov 18 '11 at 21:41

4 Answers

2

"source" is a variable. When you call urllib2's urlopen function and pass it a URL, it opens that URL and returns a file-like response object. Calling read() on that object reads the web page (i.e. downloads it). In your example, both steps are combined into one line, so source ends up holding the page contents. See http://docs.python.org/library/urllib2.html
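To make the two steps visible, here is a minimal sketch that splits the call apart. It assumes Python 3, where urllib2's urlopen lives in urllib.request, and it uses a data: URL (an assumption, chosen so the example runs without a network connection). The names url and response are only illustrative.

```python
# Python 3 sketch: urllib2.urlopen became urllib.request.urlopen.
from urllib.request import urlopen

# Assumption: a data: URL stands in for a real web address so no network is needed.
url = "data:text/plain,hello%20world"

# Step 1: open the URL. This returns a file-like response object.
response = urlopen(url)

# Step 2: read the whole body. In Python 3 the result is a bytes object.
source = response.read()
response.close()

print(source)
```

With a real http:// address the two steps are the same; only the URL changes.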

The second piece opens a file. The first argument is the path to the file. The "wb" part means that it will write in binary mode. If the file already exists, it will be overwritten. Normally, I would write it like this:

f = open("/path/to/someFile", "wb")
f.write(source)
f.close()

The way you're doing it is a shortcut. When that code is run and the script ends, the file is closed automatically. See also http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files
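A common alternative to calling close() yourself is a with block, which closes the file when the block ends, even if an error occurs along the way. This is a minimal sketch assuming Python 3; the temporary directory, the file name someFile and the bytes stored in source are placeholders, not real downloaded data.

```python
import os
import tempfile

# Placeholder for the downloaded data (bytes, because the file is opened in binary mode).
source = b"<html>example page</html>"

# Write into a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "someFile")

    # "wb" = write, binary. The file is closed when the with block ends.
    with open(path, "wb") as f:
        f.write(source)

    closed_after_block = f.closed  # True: no explicit f.close() was needed

    # Read the file back to confirm what was written.
    with open(path, "rb") as f_in:
        written = f_in.read()

print(closed_after_block, written == source)
```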

Mike Driscoll
1

You define a function using the def keyword:

def f():
    ...

Without it, you are simply calling the function. open(...) returns a file object, which you then use to write the data out. It's practically the same as this:

f = open(...)
f.write(source)

It isn't quite the same, though, since the variable f holds onto the file object until it goes out of scope, whereas calling open(...).write(source) creates a temporary reference to the file object that disappears immediately after write() returns. The consequence (in CPython, which frees an object as soon as the last reference to it goes away) is that the single-line form will immediately flush and close the file, while the two-line form will keep the file open, with possibly some or all of the output still buffered, until f goes out of scope.

You can observe this behaviour in the interactive shell:

>>> f = open('xxx', 'w')
>>> f.write('hello')
>>> open('yyy', 'w').write('world')

Now, without exiting the interactive shell, open another terminal window and check the contents of xxx and yyy. They'll both exist, but only yyy will have anything in it. Also, if you go back to Python and invoke f = None or del f, you'll find that xxx has now been written to.
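The same buffering effect can be checked from inside a single script instead of a second terminal. This is a sketch assuming Python 3 on CPython with default buffering; the file name xxx is reused from above, and the temporary directory is an assumption added to keep the example tidy.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "xxx")

    f = open(path, "w")
    f.write("hello")

    # The file exists, but the text is still sitting in Python's buffer.
    size_before = os.path.getsize(path)

    # flush() pushes the buffered text out to the operating system.
    f.flush()
    size_after_flush = os.path.getsize(path)

    # close() also flushes; in CPython, "del f" triggers it implicitly
    # once no other reference to the file object remains.
    f.close()

print(size_before, size_after_flush)
```

Comparing the two sizes shows that a short write only reaches the file once the buffer is flushed.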

Marcelo Cantos
1

The first line is assigning the result of downloading the file to the variable source. source is then written to disk.

To answer your broader points:

  • You're right that variables are assigned with an equals sign (=). What we're doing in that first line is assigning whatever we receive from the URL to the variable source.
  • Parentheses (()) are used to call functions which have been defined by def. To call a function means to ask the function to act. The things inside of the parentheses are called arguments.
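A small sketch of these ideas side by side, reusing the names patatoe and blabla from the question. The function body and the argument "world" are made up for illustration.

```python
# Assignment: the name on the left now refers to the value on the right.
patatoe = 1

# Definition: "def" creates the function; nothing inside it runs yet.
def blabla(name):
    # "name" is a parameter; it receives the argument passed by the caller.
    return "hello " + name

# Call: the parentheses run the function; "world" is the argument.
greeting = blabla("world")

print(patatoe, greeting)
```

So blabla() on its own is a call, and it only works if blabla was defined (or imported) somewhere earlier.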

You should start with Learn Python the Hard Way to get an understanding of what is happening.

Tim McNamara
0

Here's a (hopefully understandable) explanation of the code I showed you the other day (How to download a file in python - feel free to comment here or on that question if you need any more details / explanation):

# urllib2 is part of the Python 2 standard library
import urllib2

# Open a local file called "someFile.html" for writing (like opening notepad.exe but not entering text yet)
out_file = open("/path/to/someFile.html", "wb")

# Connect to the server at someUrl.com and ask for "somePage.html" - the socket sends the "GET /somePage.html HTTP/1.1" request.
#  This is like typing the url in your browser window and (if there were an option for it) only getting the headers but not the page content yet.
conn = urllib2.urlopen("http://someUrl.com/somePage.html")

# Read the contents of the remote file "somePage.html".  
#  This is what actually gets data from the web server and 
#  saves the data into the 'pageSource' variable.
pageSource = conn.read()

# Write the data we got from the web page to our local file that we opened earlier: 'someFile.html'
out_file.write(pageSource)

# Close the file so that everything is flushed to disk
out_file.close()
chown