I'm writing a crawler that visits pages on a website and collects links, which I then write to a file. I can think of the two options below. I'm using the first method right now. I know it's not efficient, since the file is opened and closed on every loop iteration, but it's safe in the sense that the data is written as I go, so if the code crashes for some reason I'll still have it.
I'm not sure about the second method. What if it crashes and the file isn't closed properly, will the data still have been written to the file?
Is there another, more efficient way to achieve this?
I'm only showing pseudocode.
Method 1: collect all URLs on a page, write them to the file, close the file, and repeat
def crawl(max_pages):
    # do stuff
    while page <= max_pages:
        # do stuff
        with open(FILE_NAME, 'a') as f:  # opened and closed on every iteration
            f.write(profile_url + '\n')
        # no explicit f.close() needed: the with block closes the file
crawl(300)

Method 2: keep the file open, collect URLs from all the pages, and close it at the very end
def crawl(max_pages):
    # do stuff
    with open(FILE_NAME, 'a') as f:  # the file stays open for the whole crawl
        while page <= max_pages:
            # do stuff
            f.write(profile_url + '\n')
    # the with block closes the file here, after all pages are done

crawl(300)
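For reference, here is a minimal runnable sketch of Method 2 with explicit flushing, which relates to the crash concern above: f.flush() pushes Python's write buffer to the operating system after each line, and os.fsync() can optionally force it to disk, so the lines written so far survive even if the process dies before the file is closed. The file name and the profile_url value are hypothetical placeholders standing in for the real crawling code.

import os

FILE_NAME = 'urls.txt'  # assumed output file name

def crawl(max_pages):
    page = 1
    with open(FILE_NAME, 'a') as f:
        while page <= max_pages:
            # hypothetical placeholder for fetching a page and extracting a link
            profile_url = 'https://example.com/profile/%d' % page
            f.write(profile_url + '\n')
            f.flush()              # hand the buffered line to the OS
            os.fsync(f.fileno())   # optional: ask the OS to commit it to disk (slower)
            page += 1

crawl(300)

Flushing alone is usually enough if only the Python process can crash, since the data already sits in the OS cache; os.fsync() additionally protects against the whole machine going down, at the cost of noticeably slower writes.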