
Can anyone help me with an issue when downloading multiple files? After a while, the script stops with an IOError telling me the connection attempt failed. I tried using the time.sleep function to pause for a random number of seconds, but it doesn't help. When I re-run the code, it starts downloading files again. Any solutions?


import urllib
import time
import random

index_list=["index#1","index#2",..."index#n"]

for n in index_list:
    u=urllib.urlopen("url_address"+str(n)+".jpg")
    data=u.read()   
    f=open("tm"+str(n)+".jpg","wb")
    f.write(data)
    t=random.uniform(0,1)*10
    print "system sleep time is ", t, " seconds"
    time.sleep(t)
moiaussi06

2 Answers


Maybe you are not closing the connections properly, so the server sees too many open connections. Try calling u.close() after reading the data in the loop.
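A minimal sketch of the close-after-read pattern this answer describes, using contextlib.closing so the connection is released even if read() raises. An io.BytesIO object stands in here for the response returned by urllib.urlopen, so the example runs without a network.

```python
import contextlib
import io

# Placeholder for u = urllib.urlopen("url_address" + n + ".jpg")
response = io.BytesIO(b"jpeg bytes")

# contextlib.closing calls response.close() when the block exits,
# whether it exits normally or via an exception.
with contextlib.closing(response) as u:
    data = u.read()

print(data)            # b'jpeg bytes'
print(response.closed)  # True -- the "connection" was closed automatically
```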

Balint Domokos
  • The problem went away after I rebooted my laptop. I guess I just kept my laptop running for too long for it to operate properly. – moiaussi06 Mar 09 '15 at 05:16

It is very likely that the error is caused by not closing the connection properly (should I call close() after urllib.urlopen()?). It is also better practice to close f, which you can do conveniently with Python's with statement.

import urllib
import time
import random

index_list = ["index#1", "index#2", ... "index#n"]

for n in index_list:
    # The str() function call isn't necessary, since it's a list of strings
    u = urllib.urlopen("url_address" + n + ".jpg")
    data = u.read()
    u.close()
    with open("tm" + n + ".jpg", "wb") as f:
        f.write(data)
    t = random.uniform(0, 1) * 10
    print "system sleep time is ", t, " seconds"
    time.sleep(t)

If the problem still occurs and you can't provide further information, you may try urllib.urlretrieve instead.
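A sketch of the urlretrieve approach. Note that in Python 3 the function lives in urllib.request (in Python 2 it is urllib.urlretrieve, matching the code above). A file:// URL pointing at a temporary file stands in for the real image URL so the example is self-contained and needs no network.

```python
import tempfile
import urllib.request

# Create a local "source" file to act as the remote resource.
src = tempfile.NamedTemporaryFile(suffix=".jpg", delete=False)
src.write(b"fake image bytes")
src.close()

dest = src.name + ".copy"

# urlretrieve downloads the URL straight to the given filename and
# returns a (filename, headers) tuple -- no manual read()/close() needed.
filename, headers = urllib.request.urlretrieve("file://" + src.name, dest)

with open(dest, "rb") as f:
    print(f.read())  # b'fake image bytes'
```

Because urlretrieve manages the connection and the output file itself, it sidesteps the unclosed-connection problem entirely.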

Liblor
  • Can I follow up with a question about the difference between urllib.urlopen() and urllib.urlretrieve() in this case? Are they interchangeable? – moiaussi06 Mar 09 '15 at 05:19
  • They are pretty similar. `urlopen(...)` returns a read-only file-like object, while `urlretrieve(...)` returns a tuple `(filename, headers)`. They can be used interchangeably as far as I know, but I'd say that `urlretrieve(...)` is rather for downloading and writing to disk, and `urlopen(...)` for reading content. – Liblor Mar 09 '15 at 13:30