
When running

pd.read_hdf('myfile.h5')

I get the following traceback error:

[[...some longer traceback]]

~/.local/lib/python3.6/site-packages/pandas/io/pytables.py in read_array(self, key, start, stop)
   2487
   2488         if isinstance(node, tables.VLArray):
-> 2489             ret = node[0][start:stop]
   2490         else:
   2491             dtype = getattr(attrs, 'value_type', None)

~/.local/lib/python3.6/site-packages/tables/vlarray.py in __getitem__(self, key)

~/.local/lib/python3.6/site-packages/tables/vlarray.py in read(self, start, stop, step)

tables/hdf5extension.pyx in tables.hdf5extension.VLArray._read_array()

ValueError: cannot set WRITEABLE flag to True of this array

I have no clue what's going on. I've tried reinstalling `tables`, `pandas`, basically everything, but it still won't read the file.

jpobst
Landmaster
  • Can you open `myfile.h5` with **HDFView** and see the data? Or, have you tried reading the file with the `h5dump` tool? It's a command line utility from the HDF Group. It's another way to see what you have. You can also try the `pytables` command line tool `ptdump`. These may help pinpoint the problem. – kcw78 Jan 16 '19 at 15:00
  • When I open it with `h5py`, I get the key ['pd']. This has 4 keys: `` – Landmaster Jan 17 '19 at 10:41
  • `ptdump` dumps the group structure and dataset formats to the command window (or redirect it to a text file). If there are no errors, that's a good start. Did you see some `VLArrays` listed with data? `h5py` is a different method of accessing HDF5 data. It uses a dictionary-like interface where the keys are your top-level node names and the values are the objects (either a group or a dataset). The goal here is to verify that the `HDF5` file is valid, so you can focus on your code. – kcw78 Jan 17 '19 at 13:50

3 Answers


Are you using numpy 1.16? It is incompatible with the latest release of PyTables (see https://github.com/PyTables/PyTables/blob/v3.4.4/tables/hdf5extension.pyx#L2155), and the PyTables team has not yet released a fixed version: https://github.com/PyTables/PyTables/issues/719

The only fix I have found is to downgrade numpy.
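A sketch of the downgrade (the exact pin to use depends on your environment; the point is simply to stay below 1.16):

```shell
# Pin numpy below 1.16 so it stays compatible with PyTables <= 3.4.4
pip install --user 'numpy<1.16'

# Or, in a conda environment:
# conda install 'numpy<1.16'
```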

Eddie Bell
  • The issue persists for me with numpy `1.15` and Python 3.6; it goes away with Python 3.5 or 3.7. I am using an Anaconda environment with the `conda-forge` channel. – Fauzan Jan 22 '19 at 09:42
  • This error disappeared with numpy 1.15, but a new error appeared: `No module named 'numpy.core._multiarray_umath'` – Spencer Jan 23 '19 at 08:03
  • @YueDeng this is likely because the HDF file was saved with numpy 1.16, in which case it won't work even if you downgrade. Try downgrading numpy and re-saving the HDF file with 1.15; then it will work. – Pekka Jan 26 '19 at 12:08
  • And downgrade to Python 3.5 – Afe Feb 25 '19 at 22:29
  • I am using `Python 3.6` and I upgraded `pytables` to `3.5.1` to get things working; didn't have to re-write the HDF archive – jeschwar May 16 '19 at 21:12

Upgrading PyTables to version 3.5.1 or later should solve this.

pip install --upgrade tables
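If you want to verify you are past the incompatible combination before retrying `pd.read_hdf`, a minimal sketch (the helper functions here are my own, not part of any library; they just compare version strings against the bad pairing discussed above):

```python
# Check whether an installed (numpy, tables) pair hits the known-bad
# combination: numpy >= 1.16 together with PyTables <= 3.4.4.
def parse(version):
    # Turn a version string like "1.16.1" into a tuple like (1, 16, 1)
    # so plain tuple comparison works.
    return tuple(int(part) for part in version.split(".")[:3])

def is_incompatible(numpy_version, tables_version):
    return parse(numpy_version) >= (1, 16) and parse(tables_version) <= (3, 4, 4)

print(is_incompatible("1.16.0", "3.4.4"))  # True  -> upgrade tables
print(is_incompatible("1.16.0", "3.5.1"))  # False -> should be fine
```

In practice you would feed it `numpy.__version__` and `tables.__version__`.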
rkellerm

It seems that date-time strings were causing the problem: when I converted them from text to datetime with `pd.to_datetime()` and stored the table again, the problem went away. So perhaps it has something to do with text data?
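A minimal sketch of that conversion (the column name is hypothetical, and the re-store call is left commented because it requires PyTables to be installed):

```python
import pandas as pd

# A column stored as text (object dtype) -- the kind of data that
# can end up in the VLArray path shown in the traceback above.
df = pd.DataFrame({"timestamp": ["2019-01-16 15:00", "2019-01-17 10:41"]})

# Convert the text column to datetime64[ns] before storing.
df["timestamp"] = pd.to_datetime(df["timestamp"])
print(df["timestamp"].dtype)  # datetime64[ns]

# df.to_hdf("myfile.h5", key="pd")  # re-store after converting (needs PyTables)
```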

double-beep