Questions tagged [pytables]

A Python library for working with extremely large hierarchical (HDF5) datasets.

PyTables is a package for managing hierarchical (HDF5) datasets, designed to cope efficiently and easily with extremely large amounts of data. PyTables is available as a free download.

PyTables is built on top of the HDF5 library, using the Python language and the NumPy package. It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code (generated using Cython), makes it a fast yet extremely easy-to-use tool for interactively browsing, processing and searching very large amounts of data.
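A minimal sketch of what that interface looks like in practice (the file name, table name and columns below are made up for illustration): define a table description, append a few rows, then run an in-kernel query that is evaluated without loading the whole table into memory.

    import tables as tb

    class Particle(tb.IsDescription):
        name = tb.StringCol(16)    # fixed-width string column
        energy = tb.Float64Col()   # double-precision float column

    with tb.open_file("demo.h5", mode="w") as h5:
        table = h5.create_table("/", "particles", Particle, "example table")
        row = table.row
        for i in range(10):
            row["name"] = f"particle-{i}"
            row["energy"] = float(i) ** 2
            row.append()
        table.flush()
        # in-kernel query: the condition string is evaluated by numexpr inside PyTables
        hot = [r["name"] for r in table.where("energy > 25")]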

Links to get started:
- Documentation
- Tutorials
- Library Reference
- Downloads

617 questions
114
votes
1 answer

Is there an analysis speed or memory usage advantage to using HDF5 for large array storage (instead of flat binary files)?

I am processing large 3D arrays, which I often need to slice in various ways to do a variety of data analysis. A typical "cube" can be ~100GB (and will likely get larger in the future). It seems that the typical recommended file format for large…
Caleb
  • 3,839
  • 7
  • 26
  • 35
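A sketch of the chunked-storage side of that trade-off (the shapes, chunk sizes and file name below are assumptions): store the cube as a chunked, compressed CArray, so a slice along any axis only reads the chunks it touches instead of the whole file.

    import numpy as np
    import tables as tb

    shape = (400, 2000, 2000)   # stand-in for a much larger cube
    with tb.open_file("cube.h5", mode="w") as h5:
        cube = h5.create_carray("/", "cube", tb.Float32Atom(), shape=shape,
                                chunkshape=(1, 200, 200),
                                filters=tb.Filters(complevel=5, complib="blosc"))
        cube[0, :, :] = np.random.random(shape[1:]).astype(np.float32)

    with tb.open_file("cube.h5", mode="r") as h5:
        plane = h5.root.cube[0, :, :]       # touches only one plane of chunks
        column = h5.root.cube[:, 100, 100]  # different access pattern, still partial I/O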
39
votes
1 answer

HDF5 taking more space than CSV?

Consider the following example: Prepare the data: import string import random import numpy as np import pandas as pd matrix = np.random.random((100, 3000)) my_cols = [random.choice(string.ascii_uppercase) for x in range(matrix.shape[1])] mydf = pd.DataFrame(matrix,…
Amelio Vazquez-Reina
  • 91,494
  • 132
  • 359
  • 564
38
votes
8 answers

Missing optional dependency 'tables' in pandas to_hdf

The following code is giving me an error. import pandas as pd df = pd.DataFrame({'a' : [1,2,3]}) df.to_hdf('temp.h5', key='df', mode='w') The error is: Missing optional dependency 'tables'. Use pip or conda to install tables. I already…
Poojan
  • 3,366
  • 2
  • 17
  • 33
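This error usually just means PyTables is not installed in the environment pandas is running from; a sketch of the fix, using the standard package names:

    # pip install tables          (the PyPI name of PyTables)
    # conda install pytables      (the conda name)
    import pandas as pd

    df = pd.DataFrame({'a': [1, 2, 3]})
    df.to_hdf('temp.h5', key='df', mode='w')   # works once 'tables' is importable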
34
votes
2 answers

Improve pandas (PyTables?) HDF5 table write performance

I've been using pandas for research now for about two months to great effect. With large numbers of medium-sized trace event datasets, pandas + PyTables (the HDF5 interface) does a tremendous job of allowing me to process heterogeneous data using all…
Peter Gaultney
  • 3,269
  • 4
  • 16
  • 20
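For write-performance questions like the one above, a sketch of the usual knobs (the store path, key and column names here are assumptions, not the asker's code): write in table format, declare only the columns you need to query as data_columns, skip per-append indexing, and build the index once at the end.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.standard_normal((1_000_000, 4)),
                      columns=list("ABCD"))

    with pd.HDFStore("events.h5", mode="w", complib="blosc", complevel=5) as store:
        store.append("events", df, index=False,       # don't index on every append
                     expectedrows=5_000_000,          # hint for on-disk chunk sizing
                     data_columns=["A"])              # only A is queryable on disk
        store.create_table_index("events", columns=["A"], optlevel=6, kind="medium")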
31
votes
3 answers

Convert large csv to hdf5

I have a 100M line csv file (actually many separate csv files) totaling 84GB. I need to convert it to an HDF5 file with a single float dataset. I used h5py in testing without any problems, but now I can't do the final dataset without running out of…
jmilloy
  • 7,875
  • 11
  • 53
  • 86
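A sketch of the out-of-core conversion being asked about (file names and the column count are assumptions): stream the CSV with pandas in chunks and append each chunk to an extendable PyTables EArray, so nothing larger than one chunk is ever held in memory.

    import pandas as pd
    import tables as tb

    N_COLS = 10  # assumed number of float columns in the CSV

    with tb.open_file("out.h5", mode="w") as h5:
        earray = h5.create_earray("/", "data", tb.Float64Atom(),
                                  shape=(0, N_COLS),   # first axis is extendable
                                  filters=tb.Filters(complevel=5, complib="blosc"))
        for chunk in pd.read_csv("big.csv", chunksize=1_000_000):
            earray.append(chunk.to_numpy(dtype="float64"))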
30
votes
4 answers

Install HDF5 and PyTables in Ubuntu

I am trying to install the tables package in Ubuntu 14.04, but it seems to be complaining. I am trying to install it using PyCharm and its package installer, but it appears to complain about the HDF5 package. However, it seems I cannot find any…
codeKiller
  • 5,493
  • 17
  • 60
  • 115
26
votes
3 answers

pd.read_hdf throws 'cannot set WRITABLE flag to True of this array'

When running pd.read_hdf('myfile.h5') I get the following traceback error: [[...some longer traceback]] ~/.local/lib/python3.6/site-packages/pandas/io/pytables.py in read_array(self, key, start, stop) 2487 2488 if …
Landmaster
  • 1,043
  • 2
  • 13
  • 21
26
votes
4 answers

Python, PyTables, Java - tying all together

Question in nutshell What is the best way to get Python and Java to play nice with each other? More detailed explanation I have a somewhat complicated situation. I'll try my best to explain both in pictures and words. Here's the current system…
I82Much
  • 26,901
  • 13
  • 88
  • 119
24
votes
2 answers

Iteratively writing to HDF5 Stores in Pandas

Pandas has the following examples for how to store Series, DataFrames and Panels in HDF5 files: Prepare some data: In [1142]: store = HDFStore('store.h5') In [1143]: index = date_range('1/1/2000', periods=8) In [1144]: s = Series(randn(5),…
Amelio Vazquez-Reina
  • 91,494
  • 132
  • 359
  • 564
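The excerpt above uses the old interactive-session style from the pandas docs; a sketch of the same iterative-write idea with the current API (the store path and key are made up): open the store once and append table-format pieces as they are produced.

    import numpy as np
    import pandas as pd

    with pd.HDFStore("store.h5", mode="w") as store:
        for day in pd.date_range("2000-01-01", periods=8):
            piece = pd.DataFrame({"value": np.random.standard_normal(5)},
                                 index=pd.date_range(day, periods=5, freq="min"))
            store.append("series", piece)    # 'table' format grows with each call
        print(store["series"].shape)         # (40, 1)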
22
votes
1 answer

Pandas "Group By" Query on Large Data in HDFStore?

I have about 7 million rows in an HDFStore with more than 60 columns. The data is more than I can fit into memory. I'm looking to aggregate the data into groups based on the value of a column "A". The documentation for pandas…
technomalogical
  • 2,982
  • 2
  • 26
  • 43
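A sketch of the chunked aggregation pattern that question is usually after (the store path, key and column names "A"/"B" are assumptions, and the dataset must be stored in table format): read the store in pieces, keep partial sums and counts, and combine them at the end.

    import pandas as pd

    partials = []
    for chunk in pd.read_hdf("store.h5", "data", chunksize=500_000):
        partials.append(chunk.groupby("A")["B"].agg(["sum", "count"]))

    combined = pd.concat(partials).groupby(level=0).sum()
    mean_B_by_A = combined["sum"] / combined["count"]   # exact per-group mean of B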
21
votes
3 answers

Storing numpy sparse matrix in HDF5 (PyTables)

I am having trouble storing a numpy csr_matrix with PyTables. I'm getting this error: TypeError: objects of type ``csr_matrix`` are not supported in this context, sorry; supported objects are: NumPy array, record or scalar; homogeneous list or…
pnsilva
  • 655
  • 1
  • 9
  • 20
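The error in that question is PyTables refusing the csr_matrix object itself; a common workaround, sketched here with assumed file and group names, is to store the three component arrays of the CSR matrix plus its shape, and rebuild the matrix on load.

    import numpy as np
    import scipy.sparse as sp
    import tables as tb

    m = sp.random(1000, 1000, density=0.01, format="csr")

    with tb.open_file("sparse.h5", mode="w") as h5:
        grp = h5.create_group("/", "csr")
        for part in ("data", "indices", "indptr"):
            h5.create_array(grp, part, getattr(m, part))
        h5.create_array(grp, "shape", np.array(m.shape))

    with tb.open_file("sparse.h5", mode="r") as h5:
        g = h5.root.csr
        m2 = sp.csr_matrix((g.data[:], g.indices[:], g.indptr[:]),
                           shape=tuple(g.shape[:]))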
19
votes
3 answers

'/' in names in HDF5 files confusion

I am experiencing some really weird interactions between h5py, PyTables (via Pandas), and C++ generated HDF5 files. It seems that h5check and h5py can cope with type names containing '/', but pandas/PyTables cannot. Clearly, there is a gap in my…
Sardathrion - against SE abuse
  • 17,269
  • 27
  • 101
  • 156
18
votes
1 answer

Python: how to store a numpy multidimensional array in PyTables?

How can I put a numpy multidimensional array in an HDF5 file using PyTables? From what I can tell I can't put an array field in a pytables table. I also need to store some info about this array and be able to do mathematical computations on it. Any…
scripts
  • 1,452
  • 1
  • 19
  • 24
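Two sketches for the question above (node and column names are invented): a multidimensional array can be written as its own array node with attributes for metadata, or it can live inside a table by giving a column a multidimensional shape.

    import numpy as np
    import tables as tb

    arr = np.random.random((5, 20, 20))

    with tb.open_file("arrays.h5", mode="w") as h5:
        # Option 1: a standalone array node, with metadata stored as HDF5 attributes.
        node = h5.create_array("/", "volume", arr)
        node.attrs.description = "raw detector frames"

        # Option 2: a table whose 'frame' column holds one 20x20 array per row.
        desc = {"label": tb.StringCol(8), "frame": tb.Float64Col(shape=(20, 20))}
        table = h5.create_table("/", "frames", desc)
        row = table.row
        for i in range(arr.shape[0]):
            row["label"] = f"f{i}"
            row["frame"] = arr[i]
            row.append()
        table.flush()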
17
votes
5 answers

Python particles simulator: out-of-core processing

Problem description I'm writing a Monte Carlo particle simulator (Brownian motion and photon emission) in python/numpy. I need to save the simulation output (>>10GB) to a file and process the data in a second step. Compatibility with both Windows and…
user2304916
  • 7,882
  • 5
  • 39
  • 53
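For the out-of-core output described above, a sketch of the appendable-array approach (the particle count, step count and names are placeholders): append one timestep at a time to an extendable, compressed array, so the file can grow well past RAM and remains portable across Windows and Linux.

    import numpy as np
    import tables as tb

    N_PARTICLES = 1000

    with tb.open_file("trajectories.h5", mode="w") as h5:
        pos = h5.create_earray("/", "positions", tb.Float32Atom(),
                               shape=(0, N_PARTICLES, 3),  # extendable time axis
                               filters=tb.Filters(complevel=5, complib="blosc"))
        for step in range(10_000):
            frame = np.random.standard_normal((1, N_PARTICLES, 3)).astype(np.float32)
            pos.append(frame)   # written incrementally; nothing accumulates in memory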
17
votes
1 answer

How to get faster code than numpy.dot for matrix multiplication?

Here Matrix multiplication using hdf5 I use hdf5 (pytables) for big matrix multiplication, but I was surprised because using hdf5 it works even faster than using plain numpy.dot and storing the matrices in RAM. What is the reason for this behavior? And…
mrgloom
  • 20,061
  • 36
  • 171
  • 301