
I have many CSV files in one directory, and I want to read each of them with pandas.read_csv and then merge all of the returned DataFrames with pandas.concat.

However, I don't think I am using asyncio properly, because the run time did not get any shorter.

import asyncio
import glob2
import os
import time

import pandas as pd

async def read_csv(filename):
    # pd.read_csv is synchronous and this coroutine never awaits,
    # so nothing here actually runs concurrently.
    df = pd.read_csv(filename, header=None)
    return df

t = time.time()
path = r'C:\LRM_STGY_REPO\IB_IN'

tasks = [asyncio.ensure_future(read_csv(f))
         for f in glob2.iglob(os.path.join(path, "*.txt"))]

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))

df = pd.concat([task.result() for task in tasks], ignore_index=True)
# print(df)
print('%.4f' % (time.time() - t))

t = time.time()

def read_csv2(filename):
    return pd.read_csv(filename, header=None)

# Plain sequential version for comparison.
df = pd.concat(map(read_csv2, glob2.iglob(os.path.join(path, "*.txt"))),
               ignore_index=True)
# print(df)
print('%.4f' % (time.time() - t))

read_csv and read_csv2 take almost exactly the same amount of time.
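
My guess is that pd.read_csv is a blocking call, so wrapping it in a coroutine does not overlap any work at all. Would something like loop.run_in_executor be the proper pattern here? A minimal sketch of what I have in mind (read_all is my own helper name, and I am not sure threads help much, since pandas may hold the GIL for most of the parse):

import asyncio
import glob2
import os
from functools import partial

import pandas as pd

async def read_all(path):
    loop = asyncio.get_event_loop()
    files = glob2.glob(os.path.join(path, "*.txt"))
    # Offload each blocking read to the loop's default thread pool
    # so the calls can actually overlap.
    futures = [loop.run_in_executor(None, partial(pd.read_csv, f, header=None))
               for f in files]
    frames = await asyncio.gather(*futures)
    return pd.concat(frames, ignore_index=True)

path = r'C:\LRM_STGY_REPO\IB_IN'
loop = asyncio.get_event_loop()
df = loop.run_until_complete(read_all(path))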

Or are there other ways to reduce the concat time?
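
For instance, would a process pool be a reasonable way to sidestep the GIL entirely? A sketch with concurrent.futures.ProcessPoolExecutor (read_one is my own name, untested; the __main__ guard is required for multiprocessing on Windows):

import glob2
import os
from concurrent.futures import ProcessPoolExecutor

import pandas as pd

def read_one(filename):
    # Plain blocking read; the parallelism comes from the process pool.
    return pd.read_csv(filename, header=None)

if __name__ == '__main__':
    path = r'C:\LRM_STGY_REPO\IB_IN'
    files = glob2.glob(os.path.join(path, "*.txt"))
    with ProcessPoolExecutor() as pool:
        df = pd.concat(pool.map(read_one, files), ignore_index=True)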

