I am reading several tab-delimited files from a directory using Python's multiprocessing Pool, based on the answer provided here. The answer was working for me until now, but it suddenly stopped working. I know this sounds stupid, but I have searched for every possible solution and still couldn't figure it out. My code is below:
import csv
import os
import multiprocessing as mp
import pandas as pd

def reading(path):
    # read one tab-delimited file; keep column 13 as a string
    return pd.read_csv(path, sep='\t', header=None, quoting=csv.QUOTE_NONE,
                       encoding='utf-8', converters={13: str})

def main():
    file_list = []
    # set up your pool
    pool = mp.Pool(processes=8)  # or whatever your hardware can support
    # get a list of file names
    for root, dirs, files in os.walk('c:/Users/kdalal/contentengine/IngestionTrain/Raw Logs'):
        for file in files:
            if file.startswith('2') and os.stat(os.path.join(root, file)).st_size != 0:
                print(os.path.join(root, file))
                file_list.append(os.path.join(root, file))

    # have your pool map the file names to dataframes
    df_list = pool.map(reading, file_list)
    print("Pooling Done")

    # reduce the list of dataframes to a single dataframe
    df = pd.concat(df_list, ignore_index=True)
    return df

if __name__ == '__main__':
    df = main()
I tested reading on its own by passing it a file path directly, and it works. I am really frustrated and would be very grateful for help. Also, let me know if you need any other details that would help you assist me better. Thanks again.
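For reference, this is roughly how I checked the worker function outside the pool (the file name below is just a placeholder, not one of my actual files):

    # quick standalone check of reading(), bypassing multiprocessing
    test_path = 'c:/Users/kdalal/contentengine/IngestionTrain/Raw Logs/SOME_FILE.tsv'  # placeholder path
    df_test = reading(test_path)
    print(df_test.shape)   # confirms the file parses into a dataframe
    print(df_test.head())

This prints the expected rows, so the parsing itself seems fine; the problem only shows up when the files go through the pool.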