I am trying to write (append) a pandas DataFrame to an HDF5 file. I use the h5py
library. I drop the duplicates first to reduce the size.
TestFrame = TestFrame.drop_duplicates()
print(TestFrame.shape)
print(TestFrame.info())
TestFrame.to_hdf("data.h5", key="dataset_01", mode="a")
TestFrame.shape and TestFrame.info() give the following information:
(202496, 21) #shape of the frame
<class 'pandas.core.frame.DataFrame'>
Int64Index: 202496 entries, 0 to 367949
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 0 202496 non-null object
1 1 202496 non-null object
2 2 202496 non-null object
3 3 202496 non-null object
4 4 202496 non-null object
dtypes: object(5)
memory usage: 17.8+ MB
None
I get the following error:
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\generic.py", line 2490, in to_hdf
    pytables.to_hdf(
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\pytables.py", line 282, in to_hdf
    f(store)
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\pytables.py", line 265, in <lambda>
    f = lambda store: store.put(
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\pytables.py", line 1030, in put
    self._write_to_group(
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\pytables.py", line 1697, in _write_to_group
    s.write(
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\pytables.py", line 3101, in write
    self.write_array(f"block{i}_values", blk.values, items=blk_items)
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\pytables.py", line 2958, in write_array
    vlarr.append(value)
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38-32\lib\site-packages\tables\vlarray.py", line 525, in append
    sequence = atom.toarray(sequence)
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38-32\lib\site-packages\tables\atom.py", line 1083, in toarray
    buffer_ = self._tobuffer(object_)
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38-32\lib\site-packages\tables\atom.py", line 1216, in _tobuffer
    return pickle.dumps(object_, pickle.HIGHEST_PROTOCOL)
MemoryError
I tried using to_csv instead, and it does not raise any error. However, I want to use the HDF5 file format.
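For reference, here is a minimal sketch of one workaround I am considering. It rests on two observations from the traceback: the last frame is pickle.dumps inside PyTables (the tables package, which pandas calls from to_hdf), which suggests the whole object-dtype block is being pickled in one piece, and the interpreter path (Python38-32) points to a 32-bit Python with a limited address space. The sketch assumes every value can be stored as text. The helper names object_columns_to_str, longest_utf8_length and append_in_chunks and the chunk size are hypothetical, not part of pandas, and I have not confirmed that this avoids the MemoryError on my data.

```python
import pandas as pd


def object_columns_to_str(frame):
    """Return a copy with string column names and str values in object columns."""
    out = frame.copy()
    # Assumption: string column names are the safest choice for the table format.
    out.columns = out.columns.astype(str)
    for col in out.columns:
        if out[col].dtype == object:
            # Note: non-string values (numbers, missing values) become their text form.
            out[col] = out[col].astype(str)
    return out


def longest_utf8_length(frame):
    """Largest UTF-8 byte length of any value in the frame."""
    return max(
        (len(str(v).encode("utf-8")) for col in frame.columns for v in frame[col]),
        default=1,
    )


def append_in_chunks(frame, path, key, chunk_rows=50_000):
    """Append `frame` to the HDF5 file in row chunks using the table format."""
    # The string width is fixed by the first chunk written, so reserve enough
    # room up front for the longest value in the whole frame.
    itemsize = longest_utf8_length(frame)
    with pd.HDFStore(path, mode="a") as store:
        for start in range(0, len(frame), chunk_rows):
            chunk = frame.iloc[start:start + chunk_rows]
            store.append(key, chunk, min_itemsize=itemsize)
```

The intended call would be append_in_chunks(object_columns_to_str(TestFrame), "data.h5", "dataset_01"). HDFStore.append writes in table format, which stores fixed-width strings instead of pickled Python objects, and chunking keeps each write small. The cost is that any non-string values lose their original type.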