I am reading several tab-delimited files from a directory using Python's multiprocessing Pool, based on the answer provided here. The answer was working for me until now, but it suddenly stopped working. I know this sounds stupid, but I have searched for every possible solution and still couldn't figure it out. My code is below:
import csv
import os
import multiprocessing as mp
import pandas as pd

def reading(path):
    # read one tab-delimited file; keep column 13 as a string
    return pd.read_csv(path, sep='\t', header=None, quoting=csv.QUOTE_NONE,
                       encoding='utf-8', converters={13: str})

def main():
    file_list = []
    # set up your pool
    pool = mp.Pool(processes=8)  # or whatever your hardware can support
    # get a list of file names
    for root, dirs, files in os.walk('c:/Users/kdalal/contentengine/IngestionTrain/Raw Logs'):
        for file in files:
            if file.startswith('2') and os.stat(os.path.join(root, file)).st_size != 0:
                print(os.path.join(root, file))
                file_list.append(os.path.join(root, file))

    # have your pool map the file names to dataframes
    df_list = pool.map(reading, file_list)
    print("Pooling Done")

    # reduce the list of dataframes to a single dataframe
    df = pd.concat(df_list, ignore_index=True)
    return df

if __name__ == '__main__':
    df = main()
I tested reading on its own by passing it a file path directly, and it works. I am really frustrated and would be very grateful for help. Also, let me know if you need any other details that would help you assist me better. Thanks again.
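For reference, this is roughly how I checked the worker function outside the pool (the file name below is just a placeholder, not one of my actual files):

    # quick standalone check of reading(), bypassing multiprocessing
    test_path = 'c:/Users/kdalal/contentengine/IngestionTrain/Raw Logs/SOME_FILE.tsv'  # placeholder path
    df_test = reading(test_path)
    print(df_test.shape)   # confirms the file parses into a dataframe
    print(df_test.head())

This prints the expected rows, so the parsing itself seems fine; the problem only shows up when the files go through the pool.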