I am trying to read a single large Parquet file (size > GPU memory) using dask_cudf/dask, but it is currently being read into a single partition, which I am guessing is the expected behavior, inferring from the docstring:
```
dask.dataframe.read_parquet(path, columns=None, filters=None, categories=None, index=None, storage_options=None, engine='auto', gather_statistics=None, **kwargs)

Read a Parquet file into a Dask DataFrame

This reads a directory of Parquet data into a Dask.dataframe, one file per partition.
It selects the index among the sorted columns if any exist.
```
Is there a workaround I can use to read it into multiple partitions?
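
For reference, a minimal sketch of what I am running now (the file path is a placeholder):

```python
import dask_cudf

# Single large Parquet file, larger than GPU memory (path is hypothetical)
ddf = dask_cudf.read_parquet("large_file.parquet")

# The whole file ends up in one partition
print(ddf.npartitions)  # -> 1
```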