I have an hdf5 database with 3 keys (features, image_ids, index). The image_ids and index each have 1000 entries.

The problem is that, while I can get the image id at index 10 via:

>>> dbhdf5["image_ids"][10]
u'image001.jpg'

I want to do the reverse, i.e. find the index by passing the image name. Something like:

dbhdf5 ["image_ids"="image001.jpg"]
or 
dbhdf5 ["image_ids"]["image001.jpg"]
or
dbhdf5 ['index']['image001.jpg']

I've tried every variation I can think of, but can't find a way to retrieve the index of an image given its id. I get errors like 'Field name only allowed for compound types'.

hpaulj
Ben Nguyen
  • Please provide more detail about how the HDF5 table is stored, and what packages you are using to access it. Answers to this question [http://stackoverflow.com/questions/1686869/searching-a-hdf5-dataset] suggest that HDF5 is not directly searchable. Maybe you would be better off using SQLite for storage and retrieval? – Neapolitan Nov 18 '16 at 05:43

1 Answer

What you are trying to do is not possible. HDF5 stores arrays, which are accessed via numerical indices, not by the values they contain.

Supposing that you also manage the creation of the file, you can store your data in separate named arrays:

\index
   \-- image001.jpg
   \-- image002.jpg
   ...
\features
   \-- image001.jpg
   \-- image002.jpg
   ...

So you can access them via names:

dbhdf5['features']['image001.jpg']
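
For illustration, here is a minimal sketch of how such a file could be written with h5py, assuming you control its creation. The file name sketch.h5 and the feature values are made up, and the in-memory driver is used only so that the example leaves nothing on disk.

```python
import h5py
import numpy as np

# Hypothetical feature vectors keyed by image name (made-up data).
features = {
    "image001.jpg": np.array([0.1, 0.2, 0.3]),
    "image002.jpg": np.array([0.4, 0.5, 0.6]),
}

# driver="core" with backing_store=False keeps the file in memory,
# so this sketch writes nothing to disk.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as db:
    group = db.create_group("features")
    for name, vector in features.items():
        # One dataset per image, named after the image.
        group.create_dataset(name, data=vector)

    # Datasets are now addressable by image name.
    loaded = db["features"]["image001.jpg"][:]
    names = sorted(db["features"].keys())
```

Note that a "/" in an image name would be interpreted as a path separator, so names containing one would need to be escaped first.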

If the file is generated by someone else, you have to build the mapping from name to position yourself, for instance with a dict:

lookup = {}
for i, key in enumerate(dbhdf5['image_ids'][:]):
    lookup[key] = i

and access them via this indirection:

dbhdf5['index'][lookup['image001.jpg']]
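
A slightly more defensive variant of the same idea is sketched below. The helper name build_lookup is hypothetical, and a plain list stands in for dbhdf5['image_ids'][:] so that the example runs on its own. The bytes handling is an assumption about how the strings come back: depending on the h5py version and how the strings were stored, they may be read as bytes rather than str.

```python
def build_lookup(image_ids):
    """Map each image name to its position in image_ids."""
    lookup = {}
    for i, key in enumerate(image_ids):
        # Strings read from HDF5 may come back as bytes; decode them
        # so that plain str names can be used for lookups.
        if isinstance(key, bytes):
            key = key.decode("utf-8")
        lookup[key] = i
    return lookup


# Stand-in for dbhdf5['image_ids'][:] (made-up values).
image_ids = [b"image000.jpg", b"image001.jpg", "image002.jpg"]
lookup = build_lookup(image_ids)
position = lookup["image001.jpg"]
```

With a real file this would be used as lookup = build_lookup(dbhdf5['image_ids'][:]). Building the dict reads all ids once; after that each lookup is a constant-time dict access. If a name appears more than once, the last position wins.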
Pierre de Buyl