Here's an idea for how you could use multiprocessing for this.
Building the list of files with os.walk is likely to be very fast. It's the processing of those files that's going to take time, and with multiprocessing you can do a lot of that work in parallel.
Each worker process opens the given file, processes it and creates a dataframe. When all of the parallel processing has been carried out, you then concatenate the returned dataframes. This last part will be CPU intensive, and there's no way (that I can think of) to spread that load.
from pandas import DataFrame, concat
from os import walk
from os.path import join, expanduser
from multiprocessing import Pool

HOME = expanduser('~')


def process(filename):
    try:
        with open(filename) as data:
            df = DataFrame()
            # analyse your data and populate the dataframe here
            return df
    except Exception:
        return DataFrame()


def main():
    with Pool() as pool:
        # build the full list of file paths under ~/Desktop
        filenames = []
        for root, _, files in walk(join(HOME, 'Desktop')):
            for file in files:
                filenames.append(join(root, file))
        # farm the per-file work out to the pool, then gather the results
        ar = pool.map_async(process, filenames)
        master = concat(ar.get())
        print(master)


if __name__ == '__main__':
    main()
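
The DataFrame construction inside process is only a placeholder. As one possible sketch, if your files happened to be CSVs the body might look like the following (read_csv, the 'source_file' column and the assumption that every file is parseable CSV are mine, not part of your question):

from pandas import DataFrame, read_csv


def process(filename):
    try:
        # Assumes each file is a CSV that pandas can parse directly.
        df = read_csv(filename)
        # Hypothetical extra column so rows in the concatenated frame
        # can be traced back to the file they came from.
        df['source_file'] = filename
        return df
    except Exception:
        # Unreadable or malformed files just contribute an empty frame.
        return DataFrame()

Whatever the real per-file work is, the pattern stays the same: return one dataframe per file and let the parent process concatenate them.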