I want to write code that reads several pandas DataFrames asynchronously, for example from CSV files (or from a database).
I wrote the following code, expecting it to import the two data frames faster; however, it actually seems to run slower:
import timeit
import pandas as pd
import asyncio
train_to_save = pd.DataFrame(data={'feature1': [1, 2, 3], 'period': [1, 1, 1]})
test_to_save = pd.DataFrame(data={'feature1': [1, 4, 12], 'period': [2, 2, 2]})
train_to_save.to_csv('train.csv')
test_to_save.to_csv('test.csv')
async def run_async_train():
    return pd.read_csv('train.csv')

async def run_async_test():
    return pd.read_csv('test.csv')

async def run_train_test_async():
    # schedule both coroutines and wait for the results
    df = await asyncio.gather(run_async_train(), run_async_test())
    return df
start_async = timeit.default_timer()
async_train, async_test = asyncio.run(run_train_test_async())
finish_async = timeit.default_timer()
time_to_run_async = finish_async - start_async
start = timeit.default_timer()
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
finish = timeit.default_timer()
time_to_run_without_async = finish - start

print(time_to_run_async < time_to_run_without_async)
Why does the non-async version read the two data frames faster?
Just to make it clear, in production I'm actually going to read the data from BigQuery,
so I'm really interested in speeding up both requests (train & test) using the code above.
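For reference, this is the kind of alternative I've been considering: pushing each blocking read into a worker thread with asyncio.to_thread so that asyncio.gather can actually overlap the work. It's only a sketch under my own assumptions (the load_frames_concurrently name is mine, and asyncio.to_thread needs Python 3.9+); I haven't confirmed it's the right approach:

import asyncio
import pandas as pd

async def load_frames_concurrently(paths):
    # pd.read_csv blocks the event loop, so each call is handed to a worker thread;
    # gather then waits for all of the threads to finish
    return await asyncio.gather(
        *(asyncio.to_thread(pd.read_csv, path) for path in paths)
    )

train, test = asyncio.run(load_frames_concurrently(['train.csv', 'test.csv']))

I assume the same pattern would apply to a blocking BigQuery read (e.g. wrapping something like pandas_gbq.read_gbq in asyncio.to_thread), but I haven't tested that part.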
Thanks in advance!