
I want to convert a large CSV file into HDF5 format. I am working with the vaex library, and it only accepts the HDF5 extension when loading a dataset. I also need a solution to the same problem in R.



In Python, with pandas, you can simply do:

import pandas as pd
pd.read_csv('data.csv').to_hdf('data.h5', key='data')  # key is required; to_hdf needs PyTables

This loads the entire CSV file into memory at once, so you should have at least 20GB of RAM.

Are you sure vaex doesn't support CSV files? Its documentation covers text-based formats:

https://vaex.io/docs/example_io.html#Text-based-file-formats

Alternatively, you can convert the file in chunks so it never has to fit in memory all at once. Can you try this code:

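import pandas as pd
import vaex

# Read the CSV with pandas in chunks of 100,000 rows (vaex.read_csv fails with chunksize)
for i, chunk in enumerate(pd.read_csv('/path/to/data/BigData.csv', chunksize=100_000)):
    df_chunk = vaex.from_pandas(chunk, copy_index=False)
    export_path = f'/path/to/data/part_{i}.hdf5'
    df_chunk.export_hdf5(export_path)

# Open all the parts as one DataFrame and write them into a single HDF5 file
df = vaex.open('/path/to/data/part_*.hdf5')
df.export_hdf5('/path/to/data/Final.hdf5')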
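The loop above keeps memory bounded because only one chunk of rows exists at a time. As a rough, stdlib-only illustration of that batching idea (a sketch, not how pandas implements chunksize internally; the helper name iter_csv_chunks is hypothetical):

```python
import csv
import io
from itertools import islice
from typing import Iterator, List, TextIO


def iter_csv_chunks(f: TextIO, chunk_size: int) -> Iterator[List[dict]]:
    """Hypothetical helper: yield lists of at most chunk_size rows from a CSV stream."""
    reader = csv.DictReader(f)
    while True:
        # islice pulls at most chunk_size rows; only this batch is held in memory
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        yield chunk


# Usage: 5 data rows split into chunks of 2 rows each
data = io.StringIO("a,b\n" + "".join(f"{i},{i * i}\n" for i in range(5)))
sizes = [len(c) for c in iter_csv_chunks(data, chunk_size=2)]
print(sizes)  # [2, 2, 1]
```

In the real conversion, each batch would be turned into a DataFrame and exported before the next one is read, which is exactly what the pandas/vaex loop does with chunksize=100_000.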

Source: https://www.programmersought.com/article/95165112668/

Corralien
  • The above code gives the following error: 'TextFileReader' object has no attribute 'columns'. It is caused by vaex.read_csv(). When I replaced vaex.read_csv() with pd.read_csv(), it worked. – Rajesh Ahir Aug 02 '21 at 02:55