2

I'm writing some tests and I'm using the Firefox webdriver with a FirefoxProfile to download a file from an external URL, but I need to read that file as soon as it finishes downloading to retrieve some specific data.

I set my profile and driver like this:

from selenium import webdriver

fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", '/some/path/')
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/plain, application/vnd.ms-excel, text/csv, text/comma-separated-values, application/octet-stream")

ff = webdriver.Firefox(firefox_profile=fp)

Is there some way to know when the file finishes downloading, so that I know when to call the reader function without having to poll the download directory, waiting with time.sleep or using any Firefox add-on?

Thanks for any help :)

Gerard
  • I assume this is Linux? You could use inotify to watch the directory and handle the events. But then it would be in a different thread or process. I have example code for that if you want me to post it. – aychedee Jan 07 '13 at 17:07
  • Yes, it's Linux. Could you please post it or leave a link to a gist? Whatever you want. Maybe I can figure something with it :) Thanks – Gerard Jan 07 '13 at 17:20

3 Answers

1

You could open the file as it downloads and treat it like a stream buffer, polling it to read the data you need as it arrives. You would then detect completion yourself, either by waiting until the file reaches the expected size or by assuming it is complete once no new data has been added for a certain amount of time.
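A minimal sketch of the second heuristic, treating the download as finished once the file size has stopped changing for a while. The function name `wait_until_stable` and its parameters are hypothetical, not part of Selenium or Firefox, and this remains a heuristic: a stalled download looks the same as a finished one.

```python
import os
import time


def wait_until_stable(path, quiet_period=2.0, poll_interval=0.5, timeout=60.0):
    """Return True once `path` exists and its size has not changed for
    `quiet_period` seconds; return False if `timeout` elapses first."""
    deadline = time.monotonic() + timeout
    last_size = -1                      # -1 means "file not seen yet"
    last_change = time.monotonic()
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1                   # file does not exist (yet)
        now = time.monotonic()
        if size != last_size:
            # Size changed (or the file just appeared): restart the quiet timer.
            last_size = size
            last_change = now
        elif size >= 0 and now - last_change >= quiet_period:
            return True
        time.sleep(poll_interval)
    return False
```

Called as `wait_until_stable('/some/path/report.csv')` (a hypothetical file name) right after triggering the download, it returns once the file looks complete or the timeout expires.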

Edit:

You could try to look at the download tracking db in the profile folder as referenced here. Looks like you can wait for your file to have status 1.

Silas Ray
  • Sorry but I can't afford to assume things, I need to be certain that the file is completely correct. Anyways, thanks for your help :) – Gerard Jan 08 '13 at 17:45
0

I like to use inotify to watch for these kinds of events. Some example code:

import os

from pyinotify import (
    EventsCodes,
    ProcessEvent,
    Notifier,
    WatchManager,
)

class EventManager(ProcessEvent):

    def process_IN_CLOSE_WRITE(self, event):
        file_path = os.path.join(event.path, event.name)
        # do something to file, you might want to wait a second here and 
        # also test for existence because ff might be making temp files 

wm = WatchManager()
notifier = Notifier(wm, EventManager())
wdd = wm.add_watch('/some/path', EventsCodes.ALL_FLAGS['IN_CLOSE_WRITE'], rec=True)

while True:
    try:
        notifier.process_events()
        if notifier.check_events():
            notifier.read_events()
    except:
        notifier.stop()
        raise

The notifier decides which method to call on the event manager based on the name of the event: an IN_CLOSE_WRITE event is dispatched to process_IN_CLOSE_WRITE. In this case the watch mask only includes IN_CLOSE_WRITE, so that is the only event we receive.

aychedee
0

It's far from ideal, but with Firefox you could check the target folder for a .part file, which exists while the download is still in progress (other browsers offer something similar). A while loop then blocks everything until the download completes:

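import os
import time

def test_for_partfile():
    # Firefox keeps a ".part" file in the download folder while a
    # download is still in progress.
    download_dir = "C:\\Downloads"
    for filename in os.listdir(download_dir):
        if filename.endswith('.part'):
            return True
    return False

# Block until no .part file is left in the download folder.
while test_for_partfile():
    time.sleep(15)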
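Because the loop above blocks forever if a download stalls, a bounded variant may be safer. The sketch below is only an illustration: `wait_for_downloads`, `download_dir`, and the timeout values are hypothetical, and it relies on the same `.part` convention described above.

```python
import os
import time


def wait_for_downloads(download_dir, timeout=300.0, poll_interval=1.0):
    """Return True once no '.part' file is left in `download_dir`,
    or False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(download_dir)
        if not any(name.endswith('.part') for name in names):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

Note that this returns True immediately if the download has not started yet, so it would need to be called only after the `.part` file (or the target file) has appeared.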
nancy