I want to convert a large CSV file to HDF5 format. I am working with the vaex library, which only accepts files with the hdf5 extension when loading a dataset. I also need a solution to this problem in R.
1 Answer
In Python you can simply do:
import pandas as pd
pd.read_csv('data.csv').to_hdf('data.h5', key='data')  # to_hdf requires a key naming the dataset
This loads the whole CSV file into memory at once, so you should have at least 20GB of RAM.
Doesn't Vaex support CSV files?
https://vaex.io/docs/example_io.html#Text-based-file-formats
Can you try this code:
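import pandas as pd
import vaex

# Read the CSV in chunks with pandas (vaex.read_csv with chunksize fails here, see the comment below).
for i, chunk in enumerate(pd.read_csv('/path/to/data/BigData.csv', chunksize=100_000)):
    df_chunk = vaex.from_pandas(chunk, copy_index=False)
    export_path = f'/path/to/data/part_{i}.hdf5'
    df_chunk.export_hdf5(export_path)

# Open all the parts as one DataFrame and write a single HDF5 file.
df = vaex.open('/path/to/data/part*')
df.export_hdf5('/path/to/data/Final.hdf5')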
Source: https://www.programmersought.com/article/95165112668/
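Alternatively, the Vaex documentation linked above describes converting a CSV file directly with vaex.from_csv. Here is a minimal sketch, assuming a vaex version whose from_csv accepts the convert and chunk_size arguments; the function name csv_to_hdf5 and the default chunk size are my own choices for illustration, not something from the docs:

```python
def csv_to_hdf5(csv_path, chunk_size=1_000_000):
    """Convert a CSV file to HDF5 with vaex and return the opened DataFrame.

    Hypothetical helper: the name and the default chunk_size are
    illustrative assumptions.
    """
    # Imported inside the function so this snippet loads even where
    # vaex is not installed.
    import vaex

    # convert=True asks vaex to write an HDF5 file alongside the CSV;
    # chunk_size is the number of rows read from the CSV at a time,
    # so the whole file never has to fit in memory.
    return vaex.from_csv(csv_path, convert=True, chunk_size=chunk_size)
```

You would call it as df = csv_to_hdf5('/path/to/data/BigData.csv'). I have not checked how this behaves across vaex versions, so treat it as a starting point.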

Corralien
With vaex.read_csv(), this code gives the following error: 'TextFileReader' object has no attribute 'columns'. Replacing vaex.read_csv() with pd.read_csv() makes it work. – Rajesh Ahir Aug 02 '21 at 02:55