I am reading several tab-delimited files from a directory using Python's multiprocessing process pool, based on the answer provided here. That approach worked for me until now, but it has suddenly stopped working. I have searched for every solution I could think of and still can't figure it out. My code is below:

import csv
import multiprocessing as mp
import os

import pandas as pd


def reading(path):
    # read one tab-delimited file with no header row; keep column 13 as a string
    return pd.read_csv(path, sep='\t', header=None, quoting=csv.QUOTE_NONE,
                       encoding='utf-8', converters={13: str})


def main():
    file_list = []

    # get a list of non-empty files whose names start with '2'
    for root, dirs, files in os.walk('c:/Users/kdalal/contentengine/IngestionTrain/Raw Logs'):
        for file in files:
            full_path = os.path.join(root, file)
            if file.startswith('2') and os.stat(full_path).st_size != 0:
                print(full_path)
                file_list.append(full_path)

    # set up your pool (8 processes, or whatever your hardware can support);
    # the with-block shuts the pool down once the map has finished
    with mp.Pool(processes=8) as pool:
        # have your pool map the file names to dataframes
        df_list = pool.map(reading, file_list)
    print("Pooling Done")

    # reduce the list of dataframes to a single dataframe
    df = pd.concat(df_list, ignore_index=True)
    return df


if __name__ == '__main__':
    df = main()

I tested reading on its own by passing it a file path, and it works. I would be very grateful for any help. Let me know if any other details would help you assist me. Thanks again.
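Since reading works on a single path, one way to narrow this down might be to wrap the reader so that a failure in a worker comes back as text along with the file name, and to consume results as they finish so it is visible which files complete. The following is only a sketch: guarded and read_all are hypothetical helper names that are not part of the code above, and the sketch assumes the problem is tied to particular files, which may not be the case.

```python
import multiprocessing as mp
import traceback
from functools import partial


def guarded(reader, path):
    """Call reader(path); return (path, result, error_text) instead of raising."""
    try:
        return path, reader(path), None
    except Exception:
        # Return the traceback as text so it survives the trip back from the worker.
        return path, None, traceback.format_exc()


def read_all(reader, paths, processes=2):
    """Run reader over paths in a pool, reporting each file as it completes."""
    results, failures = [], []
    with mp.Pool(processes=processes) as pool:
        # imap_unordered yields each result as soon as a worker finishes it,
        # so the last "ok" line hints at where the run stalls or fails.
        for path, result, error in pool.imap_unordered(partial(guarded, reader), paths):
            if error is None:
                print("ok:", path)
                results.append(result)
            else:
                print("FAILED:", path)
                failures.append((path, error))
    return results, failures
```

With the code above, this would be called from main() as read_all(reading, file_list, processes=8), still under the if __name__ == '__main__' guard. The reader has to be a top-level function so it can be pickled and sent to the workers. Results come back in completion order rather than in file_list order.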

Krishnang Dalal
