Questions tagged [hdf]

Hierarchical Data Format (HDF, HDF4, or HDF5) is a set of file formats and libraries designed to store and organize large amounts of numerical data.

Originally developed at the National Center for Supercomputing Applications, it is supported by the non-profit HDF Group, whose mission is to ensure continued development of HDF5 technologies, and the continued accessibility of data stored in HDF.

In keeping with this goal, the HDF format, libraries and associated tools are available under a liberal, BSD-like license for general use. HDF is supported by many commercial and non-commercial software platforms, including Java, MATLAB/Scilab, Octave, IDL, Python, and R. The freely available HDF distribution consists of the library, command-line utilities, test suite source, Java interface, and the Java-based HDF Viewer (HDFView).

There are two major versions of HDF: HDF4 and HDF5, which differ significantly in design and API.

Wikipedia: http://en.wikipedia.org/wiki/Hierarchical_Data_Format
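
To illustrate the hierarchical model the format is named for (groups containing datasets and attributes), here is a minimal sketch using the h5py Python bindings; the file, group and dataset names are arbitrary examples.

```python
import h5py
import numpy as np

# Write a file with one group, one dataset and an attribute (names are examples).
with h5py.File("example.h5", "w") as f:
    grp = f.create_group("measurements")
    dset = grp.create_dataset("temperature", data=np.random.rand(100, 3))
    dset.attrs["units"] = "K"

# Read it back: an HDF5 file behaves like a nested dictionary of groups/datasets.
with h5py.File("example.h5", "r") as f:
    temps = f["measurements/temperature"][:]          # load into a NumPy array
    print(temps.shape, f["measurements/temperature"].attrs["units"])
```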

344 questions
26 votes • 3 answers

pd.read_hdf throws 'cannot set WRITABLE flag to True of this array'

When running pd.read_hdf('myfile.h5') I get the following traceback error: [[...some longer traceback]] ~/.local/lib/python3.6/site-packages/pandas/io/pytables.py in read_array(self, key, start, stop) 2487 2488 if …
Landmaster • 1,043
18 votes • 5 answers

Reading hdf files into R and converting them to geoTIFF rasters

I'm trying to read MODIS 17 data files into R, manipulate them (cropping etc.) and then save them as geoTIFFs. The data files come in .hdf format and there doesn't seem to be an easy way to read them into R. Compared to other topics there isn't a…
James • 1,164
17 votes • 2 answers

OverflowError while saving large Pandas df to hdf

I have a large Pandas dataframe (~15GB, 83m rows) that I am interested in saving as an h5 (or feather) file. One column contains long ID strings of numbers, which should have string/object type. But even when I ensure that pandas parses all columns…
Josh Friedlander • 10,870
12 votes • 2 answers

Converting HDF5 to Parquet without loading into memory

I have a large dataset (~600 GB) stored as HDF5 format. As this is too large to fit in memory, I would like to convert this to Parquet format and use pySpark to perform some basic data preprocessing (normalization, finding correlation matrices,…
Eweler • 407
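
A common pattern for conversions like the one asked about above is to stream the HDF5 data in row chunks and append each chunk to a Parquet file with pyarrow. The sketch below assumes a single 2-D dataset named "data" and hypothetical column names; it is an illustration, not the questioner's setup.

```python
import h5py
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

CHUNK = 1_000_000  # rows per batch; tune to available memory

with h5py.File("input.h5", "r") as f:           # assumed input layout
    dset = f["data"]                             # hypothetical 2-D dataset
    n_rows, n_cols = dset.shape
    columns = [f"col{i}" for i in range(n_cols)]

    writer = None
    try:
        for start in range(0, n_rows, CHUNK):
            batch = pd.DataFrame(dset[start:start + CHUNK], columns=columns)
            table = pa.Table.from_pandas(batch, preserve_index=False)
            if writer is None:                   # create the writer once the schema is known
                writer = pq.ParquetWriter("output.parquet", table.schema)
            writer.write_table(table)
    finally:
        if writer is not None:
            writer.close()
```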
10 votes • 1 answer

return a list of all datasets in a hdf file with pandas

This may be a stupid question, but I have yet to find an answer in the pandas docs or elsewhere. The same question has been asked before here. But the only answer was to look at the pandas docs, which as I stated don't provide an answer to this…
Grr • 15,553
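
For reference, pandas' HDFStore can enumerate every object it recognises in a file; a minimal sketch (the file name is an example):

```python
import pandas as pd

with pd.HDFStore("myfile.h5", mode="r") as store:
    print(store.keys())   # e.g. ['/df1', '/df2']
    print(store.info())   # keys plus group/shape details
```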
8 votes • 1 answer

HDF5 possible data corruption or loss?

On wikipedia one can read the following criticism about HDF5: Criticism of HDF5 follows from its monolithic design and lengthy specification. Though a 150-page open standard, there is only a single C implementation of HDF5, meaning all bindings…
daniel451 • 10,626
7 votes • 1 answer

How can I check that a file is a valid HDF5 file?

Based on the example given here, I have a file image loaded into memory as a string with a valid handler. This was done using H5LTopen_file_image(). How can I check that a file is a valid HDF5 file? I found only a program called H5check, which has…
The Quantum Physicist • 24,987
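
For the common on-disk case (as opposed to the in-memory file image in the question above), h5py exposes a simple signature check that wraps the C-level H5Fis_hdf5 routine; the file name below is an example.

```python
import h5py

# True only if the path exists and carries a valid HDF5 signature.
print(h5py.is_hdf5("myfile.h5"))
```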
7 votes • 3 answers

How to read HDF5 files that have only datasets (no groups) using h5py?

I have HDF5 files that I would like to open using the Python module h5py (in Python 2.7). This is easy when I have a file with groups and datasets: import h5py as hdf with hdf.File(relative_path_to_file, 'r') as f: my_data =…
Joshua Zollweg • 174
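
When every dataset sits at the root of the file, the root group itself can be iterated like a dictionary; a minimal h5py sketch, with an example file name:

```python
import h5py

with h5py.File("flat_file.h5", "r") as f:   # example file name
    for name in f.keys():                    # datasets living at the root group
        dset = f[name]
        print(name, dset.shape, dset.dtype)
        data = dset[()]                      # read the whole dataset into memory
```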
6 votes • 1 answer

Read HDF5 in streaming in java

I want to read an HDF5 stream of several gigabytes. I also want to stay in native Java for portability. I have tried the Java HDF Object Package and the Java HDF5 Interface (JHI5), but these are JNI solutions (that I might reconsider if I don't find…
bloub • 510
6 votes • 1 answer

Pandas read_hdf very slow for non-numeric data

When reading a large hdf file with pandas.read_hdf() I get extremely slow read time. My hdf has 50 million rows, 3 columns with integers and 2 with strings. Writing this using to_hdf() with table format and indexing took almost 10 minutes. While…
kayoz • 1,104
6 votes • 2 answers

Write pandas DataFrame to HDF in memory buffer

I want to get a dataframe as hdf in memory. The code below results in "AttributeError: '_io.BytesIO' object has no attribute 'put'". I am using python 3.5 and pandas 0.17 import pandas as pd import numpy as np import io df =…
user2133814 • 2,431
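
One approach often suggested for this kind of problem is PyTables' in-memory CORE driver, whose keyword arguments pandas forwards to tables.open_file(). The sketch below assumes that pass-through behaviour and uses the private _handle attribute to grab the file image, so treat it as a sketch rather than a guaranteed API.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 3), columns=list("abc"))

# H5FD_CORE keeps the file in RAM; backing_store=0 means nothing is written to disk.
# The path is only a label; extra kwargs are passed through to tables.open_file().
with pd.HDFStore(
    "in_memory.h5",
    mode="w",
    driver="H5FD_CORE",
    driver_core_backing_store=0,
) as store:
    store.put("df", df)
    image = store._handle.get_file_image()   # raw bytes of the HDF5 file (private attribute)

# `image` is a bytes object that can go into io.BytesIO, a socket, an object store, etc.
```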
6 votes • 1 answer

Link external raw file to hdf5 file with h5py

I work a lot with binary flat files and they need to remain in their current format to work with legacy codes; however, I would also like to be able to use some of the features of HDF5 files, such as attributes and groups. I see in the HDF5…
Craig • 901
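
Recent h5py versions (2.9+) expose HDF5 external storage through the external keyword of create_dataset, which lets an HDF5 file describe data that stays in a flat binary file; the file names, shape and dtype below are assumptions for illustration.

```python
import h5py

ROWS, COLS, ITEMSIZE = 1000, 1000, 4   # assumed layout of the legacy float32 file

# The HDF5 file stores only metadata; the bytes stay in the external flat file.
with h5py.File("wrapper.h5", "w") as f:
    dset = f.create_dataset(
        "raw",
        shape=(ROWS, COLS),
        dtype="float32",
        external=[("raw_data.bin", 0, ROWS * COLS * ITEMSIZE)],  # (name, offset, nbytes)
    )
    dset.attrs["source"] = "legacy flat file"

# Reads go through the normal HDF5 API but pull bytes from raw_data.bin.
with h5py.File("wrapper.h5", "r") as f:
    block = f["raw"][:10, :10]
```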
5 votes • 0 answers

Saving pandas DataFrame with nullable integer data type to HDF file (format='table')

How can one save pandas DataFrame with nullable integer data type to an HDF file in the 'table' format? # input data import pandas as pd, numpy as np df = pd.DataFrame(index=list(range(2)), data={'x':[np.uint8(1)]*2},…
S.V • 2,149
5 votes • 1 answer

reading HDF4 file with python - more than one dataset with same name

I have an HDF4 file I need to read with Python. For this I use pyhdf. In most cases I am quite happy to use the SD class to open the file: import pyhdf.SD as SD hdf = SD.SD(hdfFile) and then continue with v1 = hdf.select('Data set 1') v2 =…
red_tiger • 1,402
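
With pyhdf, datasets that share a name can still be distinguished by their numeric index, which SD.select() accepts in place of a name; a minimal sketch with an example file name and an arbitrary index:

```python
from pyhdf.SD import SD, SDC

hdf = SD("myfile.hdf", SDC.READ)   # example file name

# datasets() maps names to (dims, shape, type, index); duplicate names collapse
# in this dict, but every dataset still has a unique index.
for name, info in hdf.datasets().items():
    print(name, info)

ds = hdf.select(3)                 # select by index instead of an ambiguous name
data = ds.get()
hdf.end()
```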
5 votes • 1 answer

Combined hdf5 files into single dataset

I have many HDF5 files, each containing a single dataset. I want to combine them into one dataset where the data is all in the same volume (each file is an image; I want one large timelapse image). I wrote a Python script to extract the data as a…