saving dask dataframe in bcolz format

Question

The dask documentation states: "BColz is an on-disk, chunked, compressed, column-store. These attributes make it very attractive for dask.dataframe which can operate particularly well on it. There is a special from_bcolz function."

However, I could not find an example of how to save a dask dataframe to bcolz. What is the recommended way to do this?

A possible solution might be to convert the dask dataframe to a dask array, as described here (http://stackoverflow.com/q/37444943/5082048), and then save it as bcolz, as described here (http://dask.pydata.org/en/latest/array-creation.html#store-dask-arrays). – Arco Bast, Jul 12 '16 at 20:09

score 1 · Accepted Answer · answered Jul 18 '16 at 12:54

I created a pull request to implement this. Until it is merged into the master branch you can find it here:

https://github.com/dask/dask/pull/1386

If you don't want to patch your own Dask installation, you can just copy the to_bcolz function from the pull request into your own code.

Simon Kamronn
