I could only think of one way of solving this problem, but it has the limitations listed below. Can somebody suggest another way of solving it?
We are given a text file with 999,999 URLs. We have to write a Python program that reads this file and saves all the webpages in a folder called 'saved_page'.
I have tried to solve this problem like this:
import os
import urllib

save_path = 'C:/test/home_page/'
Name = os.path.join(save_path, "test.txt")

# all the URLs are in the soop.txt file
file = open('soop.txt', 'r')
for line in file:
    data = urllib.urlopen(line)
    # append this page's contents to the output file
    f = open(Name, "a")
    for page_line in data:
        f.write(page_line)
    f.close()
file.close()
Here are some limitations of this code:
1) If the network goes down, this code has to restart from the beginning.
2) If it comes across a bad URL - i.e. the server doesn't respond - this code gets stuck.
3) I am currently downloading in sequence - this will be quite slow for such a large number of URLs.
So can somebody suggest a solution that would address these problems as well?
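To make the question more concrete, this is roughly the direction I imagine an answer might take, though I'm not sure it's the right approach. It is only a minimal sketch, assuming Python 2 as in my snippet above; the 'saved_page' path, the 10-second timeout, the pool size of 20, and the per-URL file naming are just placeholders I picked. The idea is to save each page to its own file (so a restart can skip pages that are already downloaded), put a timeout around each request (so one dead server can't hang the run), and fetch several URLs at once with a thread pool.

import os
import urllib2
from multiprocessing.dummy import Pool  # thread pool, same API as multiprocessing.Pool

save_path = 'C:/test/saved_page/'  # placeholder path for the 'saved_page' folder
if not os.path.isdir(save_path):
    os.makedirs(save_path)

def fetch(args):
    index, url = args
    url = url.strip()
    out_name = os.path.join(save_path, 'page_%06d.html' % index)
    # skip pages already on disk, so a restart resumes instead of starting over
    if os.path.exists(out_name):
        return
    try:
        # the timeout keeps a non-responding server from blocking forever
        data = urllib2.urlopen(url, timeout=10).read()
    except Exception as e:
        print 'failed: %s (%s)' % (url, e)
        return
    with open(out_name, 'wb') as f:
        f.write(data)

with open('soop.txt', 'r') as url_file:
    urls = list(enumerate(url_file))

pool = Pool(20)  # download up to 20 URLs at the same time
pool.map(fetch, urls)
pool.close()
pool.join()

Is something along these lines reasonable, or is there a better way (e.g. an async approach) for this many URLs?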