
I'm building a desktop application for Windows in Python 2.7. Its primary function is to watch a folder for new files. Whenever a new file appears in this folder, the app uploads it to a remote server. A process on the remote server creates a database record for the file and stores the remote file path in that record.

Currently I'm using watchdog to monitor the directory and httplib for the file upload.

  1. What approach should I take to ensure that a new file will be uploaded reliably regardless of network conditions or loss of the internet connection?

    Update: What I mean by a reliable upload is that the app will finish uploading the file even if the app restarts, like Dropbox does. Some files are quite big (> 100 MB), so simple solutions like wrapping the code in try/except and starting the upload all over are not very efficient. I know Dropbox uses librsync, but it might be overkill in this case.
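    For illustration, the kind of resumable scheme I have in mind might look roughly like this; `get_remote_size` and `send_chunk` are hypothetical callables standing in for whatever protocol the server actually speaks:

    ```python
    import os

    CHUNK_SIZE = 1024 * 1024  # 1 MB per request

    def upload_resumable(path, get_remote_size, send_chunk):
        """Resume an interrupted upload: ask the server how many bytes it
        already has, then send only the remainder in chunks."""
        offset = get_remote_size(path)       # bytes already on the server
        total = os.path.getsize(path)
        with open(path, 'rb') as f:
            f.seek(offset)                   # skip what was already sent
            while offset < total:
                chunk = f.read(CHUNK_SIZE)
                send_chunk(path, offset, chunk)  # server appends at `offset`
                offset += len(chunk)
        return offset
    ```

    After a restart the loop simply picks up at whatever offset the server reports, so a 100 MB file interrupted at 90 MB only costs 10 MB of re-transfer.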

  2. What if the source file changes during the upload? Should I stop the upload and start over?

Warwick
  • For your first question, I think you can catch any exception and retry, using a `try except` statement inside a `while True` loop. – Zeinab Abbasimazar Apr 19 '14 at 10:05
  • @ZeinabAbbasi I think I didn't make my point clear. What I mean by "reliably" is more than just retrying in a `while True` loop. I need the app to upload those files even if the program was restarted, or even if the PC was restarted. I'm sure Dropbox does it somehow. I don't know what the best practice is here, but I think it is some sort of queue of files to upload, or using `librsync` after each failure to avoid re-uploading the same file over and over again (a file could be big). I'll update the question. – Warwick Apr 19 '14 at 21:27

1 Answer


You could maintain a file or database of file names, timestamps, and upload status. Based on that data you will know which files have already been sent and what remains to upload after any restart of the application or computer.

Comparing timestamps tells you whether a file has been modified since its last upload, in which case the upload process should start over.
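A minimal sketch of such a state store using `sqlite3` (table and column names are just illustrative), which survives application and machine restarts:

```python
import os
import sqlite3

def init_db(db_path):
    # One row per watched file: its path, the modification time we last
    # recorded, and whether the upload finished.
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS uploads (
                        path     TEXT PRIMARY KEY,
                        mtime    REAL NOT NULL,
                        uploaded INTEGER NOT NULL DEFAULT 0)""")
    conn.commit()
    return conn

def mark_pending(conn, path):
    # Record (or refresh) the file with its current modification time.
    conn.execute("INSERT OR REPLACE INTO uploads (path, mtime, uploaded) "
                 "VALUES (?, ?, 0)", (path, os.path.getmtime(path)))
    conn.commit()

def mark_uploaded(conn, path):
    conn.execute("UPDATE uploads SET uploaded = 1 WHERE path = ?", (path,))
    conn.commit()

def files_to_upload(conn):
    """Files never finished, plus files modified since their last upload."""
    pending = []
    for path, mtime, uploaded in conn.execute(
            "SELECT path, mtime, uploaded FROM uploads"):
        if not os.path.exists(path):
            continue  # file was deleted while we were offline
        if not uploaded or os.path.getmtime(path) > mtime:
            pending.append(path)
    return pending
```

On startup the app calls `files_to_upload` and re-queues whatever it returns, which is exactly the "resume after restart" behaviour asked about.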

  • Good idea! I think that's what Dropbox is doing: keeping the file list in a database. How would you architect this app in terms of multitasking? Right now I'm doing everything in one process (or thread as I'm using `watchdog`). – Warwick Apr 26 '14 at 12:05
  • @Warwick You should definitely use a thread pool to control simultaneous uploads ([look here](http://stackoverflow.com/questions/3033952/python-thread-pool-similar-to-the-multiprocessing-pool)). I would do it something like this: 1. Add data about file upload requests to the database. 2. Add those requests to the thread pool. 3. On each upload success, update the appropriate database row. 4. On each upload failure, add the upload request to the thread pool again. 5. I would have another thread periodically check the database for unsuccessful requests and add them to the thread pool again. Post another question maybe :) – Rafał Spryszyński Apr 28 '14 at 09:32
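The retry-through-a-thread-pool idea in the comment above can be sketched roughly like this; `upload` is a stand-in for the real transfer code, and for brevity the retries are looped inside the worker rather than re-queued to the pool:

```python
from multiprocessing.pool import ThreadPool

def run_uploads(paths, upload, workers=4, max_retries=3):
    """Upload files concurrently; retry each failed upload up to
    `max_retries` times before giving up on it."""

    def attempt(path):
        for _ in range(max_retries):
            try:
                upload(path)        # the real transfer code goes here
                return (path, True)
            except Exception:
                pass                # transient failure: try again
        return (path, False)        # exhausted retries

    pool = ThreadPool(workers)
    try:
        # Map every pending path onto the pool and collect the outcomes.
        results = dict(pool.map(attempt, paths))
    finally:
        pool.close()
        pool.join()
    return results
```

The paths fed to `run_uploads` would come from the database of pending files described in the answer, and permanently failed paths stay marked as not uploaded so the periodic checker picks them up again.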