I am reading data from image files and I want to append this data into a single HDF file. Here is my code:

datafile = pd.HDFStore(os.path.join(path, 'imageData.h5'))
for file in fileList:
    data = {'X Position': pd.Series(xpos, index=index1),
            'Y Position': pd.Series(ypos, index=index1),
            'Major Axis Length': pd.Series(major, index=index1),
            'Minor Axis Length': pd.Series(minor, index=index1),
            'X Velocity': pd.Series(xVelocity, index=index1),
            'Y Velocity': pd.Series(yVelocity, index=index1)}
    df = pd.DataFrame(data)
    datafile['df'] = df
datafile.close()

This is obviously incorrect as it overwrites each set of data with the new one each time the loop runs.

If instead of datafile['df'] = df, I use

datafile.append('df',df)    

OR

df.to_hdf(os.path.join(path,'imageData.h5'), 'df', append=True, format = 'table')

I get the error:

ValueError: Can only append to Tables

I have referred to the documentation and other SO questions, to no avail.

So, I am hoping someone can explain why this isn't working and how I can successfully append all the data to one file. I am willing to use a different method (perhaps pyTables) if necessary.

Any help would be greatly appreciated.

salamander
  • The second way (`df.to_hdf(..., format="table", append=True)`) is actually the right one. Have you tried using that (without all of the `HDFStore` stuff) with a fresh file? – filmor Feb 26 '14 at 08:15
  • @filmor You mean remove the line where I create the empty HDF file? Tried that, same error. Maybe the problem is that the data is in a DataFrame and not a table? – salamander Feb 26 '14 at 08:18
  • No, the error message is referring to the internal HDF5 table format that is used. IIRC in older versions of pandas (btw, which one are you using?) `HDFStore` used the `fixed` format by default which doesn't allow appending. The `table` format is the one used by PyTables. – filmor Feb 26 '14 at 08:34
  • Version 0.11.0 - I tried it on a fresh file and it worked, but without the for loop. – salamander Feb 26 '14 at 09:14
  • @filmor So according to what you said, should I use PyTables instead? Or should the `to_hdf` function allow appending? – salamander Feb 26 '14 at 09:24
  • If you use `format="table"`, `to_hdf` should allow appending by using PyTables internally, no need to do that yourself. You might want to update pandas, though. What does "and it worked" mean? Would you update the question? – filmor Feb 26 '14 at 10:04
  • in 0.11 to_hdf didn't pass thru keywords so that will not work. best to upgrade - current version is 0.13.1 – Jeff Feb 26 '14 at 12:02
  • @Jeff I upgraded to 0.13.0. The `to_hdf` method also works now. – salamander Feb 27 '14 at 02:52

1 Answer

This will work in 0.11. Once you create a group (i.e. the label under which you are storing data, 'df' here), how you can write to it depends on its format. If you store a fixed format, writing again will overwrite it (and trying to append will give you the error message above); if you write a table format, you can append. Note that in 0.11, to_hdf does not correctly pass keywords through to the underlying function, so you can use it ONLY to write a fixed format.

import os
import pandas as pd

# mode='w' starts from a fresh file, so no fixed-format 'df' is left over
datafile = pd.HDFStore(os.path.join(path, 'imageData.h5'), mode='w')
for file in fileList:
    data = {'X Position': pd.Series(xpos, index=index1),
            'Y Position': pd.Series(ypos, index=index1),
            'Major Axis Length': pd.Series(major, index=index1),
            'Minor Axis Length': pd.Series(minor, index=index1),
            'X Velocity': pd.Series(xVelocity, index=index1),
            'Y Velocity': pd.Series(yVelocity, index=index1)}
    df = pd.DataFrame(data)
    # append() writes the table format, so each iteration adds rows
    datafile.append('df', df)
datafile.close()
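
For anyone who wants to try the append behaviour without the image data, here is a minimal sketch. It assumes a reasonably recent pandas with PyTables installed. The names `fake_measurements` and `append_all` are hypothetical helpers made up for this illustration, and the random numbers only stand in for the real per-image measurements.

```python
import os
import tempfile

import numpy as np
import pandas as pd

COLUMNS = ["X Position", "Y Position", "Major Axis Length",
           "Minor Axis Length", "X Velocity", "Y Velocity"]


def fake_measurements(n_rows, seed):
    """Hypothetical stand-in for the data extracted from one image file."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(rng.random((n_rows, len(COLUMNS))), columns=COLUMNS)


def append_all(h5_path, frames, key="df"):
    """Append every frame under one key and return the stored row count."""
    # mode="w" creates a fresh file, so there is no old fixed-format node.
    with pd.HDFStore(h5_path, mode="w") as store:
        for frame in frames:
            # HDFStore.append writes the table format, which supports appending.
            store.append(key, frame)
    # Read the whole table back to count the rows that were written.
    return len(pd.read_hdf(h5_path, key))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "imageData.h5")
        # Pretend there are four image files with three detections each.
        frames = [fake_measurements(3, seed=i) for i in range(4)]
        print(append_all(path, frames))
```

Each frame here keeps its own 0-based index, so the combined table has repeated index values; if that matters, one option is to build a unique index before appending.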
Jeff