I've done some research on progress bars in Python, and a lot of the solutions seem to be based on work being divided into known, discrete chunks. I.e., iterating a known number of times and updating the progress bar with stdout
every time a percentage point of the progress toward the end of the iterations is made.
My problem is a little less discrete. It involves walking a user directory that contains hundreds of sub-directories, gathering MP3 information, and entering it into a database. I could probably count the number of MP3 files in the directory before iteration and use that as a guideline for discrete chunks, but many of the mp3s may already be in the database, some of the files will take longer to read than others, errors will occur and have to be handled in some cases, etc. Besides, I'd like to know how to pull this off with non-discrete chunks for future reference. Here is the code for my directory-walk/database-update, if you're interested:
import mutagen
import sys
import os
import sqlite3 as lite
for root, dirs, files in os.walk(startDir):
for file in files:
if isMP3(file):
fullPath = os.path.join(root, file)
# Check if path already exists in DB, skip iteration if so
if unicode(fullPath, errors="replace") in pathDict:
continue
try:
audio = MP3(fullPath)
except mutagen.mp3.HeaderNotFoundError: # Invalid file/ID3 info
#TODO: log for user to look up what files were not visitable
continue
# Do database operations and error handling therein.
Is threading the best way to approach something like this? And if so, are there any good examples on how threading achieves this? I don't want a module for this because (a) it seems like something I should know how to do and (b) I'm developing for a dependency-lite situation.