Questions tagged [hdf5]

The Hierarchical Data Format (HDF5) is a binary file format designed to store large amounts of numerical data.

HDF5 refers to:

  • A binary file format designed to efficiently store large amounts of numerical data
  • Libraries of functions to create and manipulate these files

Main features

  • Free
  • Completely portable
  • Very mature
  • No limit on the number and size of the datasets
  • Flexible in the kind and structure of the data and meta-data
  • Complete, well-documented libraries in C and Fortran
  • Many wrappers and tools are available (Python, MATLAB, Java, …)
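
As a rough illustration of the features above, here is a minimal sketch using h5py, one of the Python wrappers. It assumes h5py and NumPy are installed; the file name, group name, dataset name, and attribute are hypothetical placeholders chosen for the example. It shows a group (the hierarchy), a dataset (the numerical data), and an attribute (the metadata), and then reads back a single slice.

```python
import numpy as np
import h5py


def build_demo_file():
    """Create a small in-memory HDF5 file with a group, a dataset, and an attribute."""
    # driver="core" with backing_store=False keeps the file in memory only,
    # so nothing is written to disk. "demo.h5" is a hypothetical name.
    f = h5py.File("demo.h5", "w", driver="core", backing_store=False)
    grp = f.create_group("experiment")              # groups behave like folders
    dset = grp.create_dataset(
        "temperature", data=np.arange(12.0).reshape(3, 4)
    )                                               # datasets behave like arrays
    dset.attrs["units"] = "kelvin"                  # attributes hold metadata
    return f


f = build_demo_file()
dset = f["experiment/temperature"]   # path-style access into the hierarchy
row = dset[1, :]                     # reads only the requested slice
units = dset.attrs["units"]
names = list(f["experiment"].keys())
shape = dset.shape
f.close()
```

To work with a file on disk instead, the `driver` and `backing_store` arguments can be dropped so that `h5py.File` opens a regular file.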


2598 questions
1178 votes, 16 answers

"Large data" workflows using pandas

I have tried to puzzle out an answer to this question for many months while learning pandas. I use SAS for my day-to-day work and it is great for its out-of-core support. However, SAS is horrible as a piece of software for numerous other…
Zelazny7
132 votes, 12 answers

How to read HDF5 files in Python

I am trying to read data from an HDF5 file in Python. I can read the HDF5 file using h5py, but I cannot figure out how to access data within the file. My code: import h5py; import numpy as np; f1 = h5py.File(file_name, 'r+'). This works and the…
Sameer Damir
114 votes, 1 answer

Is there an analysis speed or memory usage advantage to using HDF5 for large array storage (instead of flat binary files)?

I am processing large 3D arrays, which I often need to slice in various ways to do a variety of data analysis. A typical "cube" can be ~100GB (and will likely get larger in the future). It seems that the typical recommended file format for large…
Caleb
74 votes, 7 answers

Opinions on NetCDF vs HDF5 for storing scientific data?

Anyone out there have enough experience w/ NetCDF and HDF5 to give some pluses / minuses about them as a way of storing scientific data? I've used HDF5 and would like to read/write via Java but the interface is essentially a wrapper around the C…
Jason S
73 votes, 2 answers

HDF5 - concurrency, compression & I/O performance

I have the following questions about HDF5 performance and concurrency: Does HDF5 support concurrent write access? Concurrency considerations aside, how is HDF5 performance in terms of I/O performance (do compression rates affect the…
Amelio Vazquez-Reina
69 votes, 9 answers

How is HDF5 different from a folder with files?

I'm working on an open source project dealing with adding metadata to folders. The provided (Python) API lets you browse and access metadata like it was just another folder. Because it is just another folder. \folder\.meta\folder\somedata.json Then…
Marcus Ottosson
67 votes, 2 answers

How to append data to one specific dataset in a hdf5 file with h5py

I am looking for a way to append data to an existing dataset inside a .h5 file using Python (h5py). A short intro to my project: I am trying to train a CNN using medical image data. Because of the huge amount of data and heavy memory usage during…
Midas.Inc
64 votes, 5 answers

How to deal with hdf5 files in R?

I have a file in hdf5 format. I know that it is supposed to be a matrix, but I want to read that matrix in R so that I can study it. I see that there is a h5r package that is supposed to help with this, but I do not see any simple to…
Sam
53 votes, 7 answers

ImportError HDFStore requires PyTables No module named tables

The code import pandas as pd; dfs = pd.HDFStore('xxxxx.h5') throws this error: "ImportError: HDFStore requires PyTables, "No module named tables" problem importing". I tried to install PyTables, which requires Cython. I have Cython 0.21 installed, but it is…
nikhil sahai
52 votes, 1 answer

which is faster for load: pickle or hdf5 in python

Given a 1.5 Gb list of pandas dataframes, which format is fastest for loading compressed data: pickle (via cPickle), hdf5, or something else in Python? I only care about the fastest speed to load the data into memory; I don't care about dumping the…
Jesper - jtk.eth
47 votes, 11 answers

Pandas ParserError EOF character when reading multiple csv files to HDF5

Using Python 3 and pandas 0.12, I'm trying to write multiple CSV files (total size is 7.9 GB) to an HDF5 store to process later. The CSV files contain around a million rows each and 15 columns, and the data types are mostly strings, but some floats.…
Matthijs
46 votes, 2 answers

Experience with using h5py to do analytical work on big data in Python?

I do a lot of statistical work and use Python as my main language. Some of the data sets I work with though can take 20GB of memory, which makes operating on them using in-memory functions in numpy, scipy, and PyIMSL nearly impossible. The…
Josh Hemann
40 votes, 2 answers

Evaluating HDF5: What limitations/features does HDF5 provide for modelling data?

We are evaluating technologies that we'll use to store data that we gather during the analysis of C/C++ code. In the case of C++, the amount of data can be relatively large, ~20Mb per TU. After reading the following SO answer it made me consider…
Richard Corden
39 votes, 1 answer

HDF5 taking more space than CSV?

Consider the following example. Prepare the data: import string; import random; import pandas as pd; matrix = np.random.random((100, 3000)); my_cols = [random.choice(string.ascii_uppercase) for x in range(matrix.shape[1])]; mydf = pd.DataFrame(matrix,…
Amelio Vazquez-Reina
38 votes, 8 answers

Missing optional dependency 'tables'. In pandas to_hdf

The following code gives me an error: import pandas as pd; df = pd.DataFrame({'a' : [1,2,3]}); df.to_hdf('temp.h5', key='df', mode='w'). The error is: Missing optional dependency 'tables'. Use pip or conda to install tables. I already…
Poojan