When I create a pandas DataFrame with a column that contains a string and save it as HDF5, the resulting file seems extremely large. The following code produces a file with a size of 1’063’224 Bytes.
from pathlib import Path
import pandas as pd
data_frame = pd.DataFrame({'foo': ['bar']})
data_frame.to_hdf(Path.home() / 'Desktop' / 'file.hdf5', key='baz', mode='w')
When I replace the 'bar' with a 1, the file size shrinks down to 7’032 Bytes, which seems (more) reasonable.
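A minimal sketch to compare the two cases side by side, assuming the optional PyTables package is installed (pandas needs it for to_hdf). The helper name hdf5_size and the temporary file names are made up for illustration, and the exact sizes will likely vary with the pandas, PyTables, and HDF5 versions in use:

```python
import tempfile
from pathlib import Path

import pandas as pd  # to_hdf additionally requires PyTables ("tables")


def hdf5_size(data_frame: pd.DataFrame, path: Path) -> int:
    """Write data_frame to path as HDF5 and return the file size in bytes.

    hdf5_size is a hypothetical helper, not part of pandas.
    """
    data_frame.to_hdf(path, key='baz', mode='w')
    return path.stat().st_size


# Write both variants into a throwaway directory and record their sizes.
with tempfile.TemporaryDirectory() as tmp:
    sizes = {
        'string': hdf5_size(pd.DataFrame({'foo': ['bar']}), Path(tmp) / 'string.hdf5'),
        'integer': hdf5_size(pd.DataFrame({'foo': [1]}), Path(tmp) / 'integer.hdf5'),
    }

for label, size in sizes.items():
    print(f'{label} column: {size} bytes')
```

Running this prints one line per variant, which makes it easy to check whether the gap also shows up in other environments.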
Does anyone know where that megabyte of data comes from?