As the title says: is it possible to write an asyncio event loop that slices a DataFrame by the unique values in a certain column and saves each slice to my drive? And, maybe more importantly, is it faster?

What I've tried is something like this:

async def a_split(dist, df):
    temp_df = df[df.district == dist]
    await temp_df.to_csv('{}.csv'.format(dist))

async def m_lp(df):
    for dist in df.district.unique().tolist():
        await a_split(dist, df)

loop = asyncio.get_event_loop()

loop.run_until_complete(m_lp(dfTotal))  
loop.close() 

But I'm getting the following error:

TypeError: object NoneType can't be used in 'await' expression

If it's not obvious from my attempt, I'm very new to asyncio and I'm not sure how it works. Apologies if this is a stupid question.

If asyncio is not a good tool for the job - is there a better one?

Edit:

Full traceback below:

    ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-22-2bc2373d2920> in <module>()
      2 loop = asyncio.get_event_loop()
      3 
----> 4 loop.run_until_complete(m_lp(dfTotal))
      5 loop.close()

C:\Users\5157213\AppData\Local\Continuum\Anaconda3\envs\python36\lib\asyncio\base_events.py in run_until_complete(self, future)
    464             raise RuntimeError('Event loop stopped before Future completed.')
    465 
--> 466         return future.result()
    467 
    468     def stop(self):

<ipython-input-20-9e91c0b1b06f> in m_lp(df)
      1 async def m_lp(df):
      2     for dist in df.district.unique().tolist():
----> 3         await a_split(dist,df)

<ipython-input-18-200b08417159> in a_split(dist, df)
      1 async def a_split(dist,df):
      2     temp = df[df.district == dist]
----> 3     await temp.to_csv('C:/Users/5157213/Desktop/Portfolio/{}.csv'.format(dist))

TypeError: object NoneType can't be used in 'await' expression
Greg
  • Please edit the question to include the full traceback. As it stands, we can't tell which `await` that refers to. – dirn Jul 17 '17 at 22:23
  • Edited. It looks like it refers to the await on the df.to_csv line, but neither of the awaited calls returns anything. – Greg Jul 17 '17 at 22:42

1 Answer

As far as I know, there is no asyncio support as such in pandas. I don't think a single-threaded, event-based architecture is the best tool for this when there are dozens of other options for working with heavy loads or large data; for a large dataset, take a look at dask.

The error occurs because you tried to await DataFrame.to_csv, which does not return a Future (or any other awaitable object); it returns None.
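If you still want to drive the writes from an event loop, one option is to hand the blocking call to a thread pool with `loop.run_in_executor`, which does return an awaitable. The following is only a minimal sketch under stated assumptions: `write_csv_blocking` is a hypothetical stand-in for `DataFrame.to_csv` built on the standard `csv` module so the example runs without pandas, and the `(district, value)` records are made-up sample data. With a real DataFrame, the equivalent call would be along the lines of `await loop.run_in_executor(None, temp_df.to_csv, path)`.

```python
import asyncio
import csv
import os
import tempfile
from collections import defaultdict


def write_csv_blocking(path, rows):
    """Hypothetical stand-in for DataFrame.to_csv: blocks, then returns None."""
    with open(path, "w", newline="") as fh:
        csv.writer(fh).writerows(rows)


async def save_split(out_dir, district, rows):
    loop = asyncio.get_event_loop()  # inside a coroutine this is the running loop
    path = os.path.join(out_dir, "{}.csv".format(district))
    # run_in_executor wraps the blocking call in a Future that can be awaited;
    # passing None selects the loop's default thread pool.
    await loop.run_in_executor(None, write_csv_blocking, path, rows)
    return path


async def split_and_save(records, out_dir):
    # Group the made-up (district, value) records by district.
    groups = defaultdict(list)
    for district, value in records:
        groups[district].append((district, value))
    # gather starts every write before waiting, instead of one await per loop turn.
    return await asyncio.gather(
        *(save_split(out_dir, d, rows) for d, rows in groups.items())
    )


def run_demo():
    records = [("north", 1), ("south", 2), ("north", 3)]
    out_dir = tempfile.mkdtemp()
    loop = asyncio.new_event_loop()
    try:
        paths = loop.run_until_complete(split_and_save(records, out_dir))
    finally:
        loop.close()
    return sorted(os.path.basename(p) for p in paths)
```

Note that this only moves the blocking work onto threads; the slicing and the writes themselves are no more asynchronous than before, so whether it ends up faster than a plain loop depends on the data and the disk and is worth measuring rather than assuming.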

kwarunek