
I can write a massive dask data frame to disk like so:

raw_data.to_csv(r'C:\Bla\SubFolder\*.csv')

This produces chunked data of the original (massaged) dataset in the subfolder:

C:\Bla\SubFolder\

Just wondering, can I force dask to write the data as one file?

jpp
cs0815
  • Possible duplicate of [Writing Dask partitions into single file](https://stackoverflow.com/questions/39566809/writing-dask-partitions-into-single-file) – MRocklin Aug 09 '18 at 14:00
  • @MRocklin thanks but is this really a solution? write everything in chunks and then put it all together again? – cs0815 Aug 09 '18 at 15:05
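
The approach raised in the comments above (write the chunks, then join them) can be sketched with the standard library alone. This is a minimal sketch, not dask's own mechanism: `merge_csv_parts` is a hypothetical helper name, and it assumes every chunk starts with the same one-line header and ends with a newline.

```python
import glob
import os
import shutil


def merge_csv_parts(pattern, target):
    """Concatenate the CSV chunks matching `pattern` into `target`.

    Hypothetical helper. Assumes each chunk begins with the same
    single-line header and ends with a newline.
    """
    target_abs = os.path.abspath(target)
    # Skip the target itself in case it already exists and matches the glob.
    parts = [p for p in glob.glob(pattern) if os.path.abspath(p) != target_abs]
    # Sort by (length, name) so 2.csv precedes 10.csv, with or without zero padding.
    parts.sort(key=lambda p: (len(p), p))

    with open(target, "wb") as out:
        for i, part in enumerate(parts):
            with open(part, "rb") as src:
                header = src.readline()
                if i == 0:
                    out.write(header)  # keep the header from the first chunk only
                shutil.copyfileobj(src, out)  # stream the remaining rows
    return parts
```

Called as `merge_csv_parts(r'C:\Bla\SubFolder\*.csv', r'C:\Bla\merged.csv')` (the second path is a placeholder), it streams each chunk in turn, so the full dataset never has to fit in memory.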

1 Answer


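To save everything into a single file, pass single_file=True and give to_csv one concrete file name instead of a glob pattern (output.csv below is a placeholder name). In this mode each partition is appended to the end of that file:

df.to_csv(r'C:\Bla\SubFolder\output.csv', single_file=True)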
Gonçalo Peres