Questions tagged [bcolz]

bcolz is a Python package providing high-performance, NumPy-style access to data stored in arrays or tables backed by either memory or disk.

Its efficiency comes from storing the data in columns and compressing it: modern CPUs spend much of their time waiting for data, so memory access is the main bottleneck, and moving fewer, compressed bytes reduces that wait.
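
To illustrate why a column layout and compression go together, here is a minimal sketch using only the Python standard library. It is not bcolz code: zlib stands in for the Blosc compressor that bcolz actually uses, and the two-column table (`categories`, `values`) is a made-up example. It packs the same data row by row and column by column, then compares the compressed sizes.

```python
import random
import struct
import zlib

N = 20_000
rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical two-column table: a low-cardinality int32 column
# and a noisy float64 column.
categories = [i % 4 for i in range(N)]
values = [rng.random() for _ in range(N)]

# Row-wise layout: the fields of each record are interleaved (12 bytes per row).
row_bytes = b"".join(struct.pack("<id", c, v) for c, v in zip(categories, values))

# Column-wise layout: each column is one contiguous buffer.
cat_bytes = struct.pack(f"<{N}i", *categories)
val_bytes = struct.pack(f"<{N}d", *values)

# zlib is only a stand-in compressor here; bcolz itself relies on Blosc.
row_size = len(zlib.compress(row_bytes))
col_size = len(zlib.compress(cat_bytes)) + len(zlib.compress(val_bytes))

print("uncompressed bytes:          ", len(row_bytes))
print("row-wise compressed bytes:   ", row_size)
print("column-wise compressed bytes:", col_size)
```

Values within one column tend to resemble each other, so a compressor finds more redundancy when each column is stored contiguously. In this sketch the repetitive integer column shrinks to almost nothing once it is no longer interleaved with the noisy floats.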

Documentation website: http://bcolz.blosc.org/

15 questions
4
votes
2 answers

Why does the dask.dataframe compute() result give an IndexError in specific cases? How to find the reason for an async error?

Because of the large size of my data, I used the current version of dask ('0.7.5', github: [a1]) and was able to perform partitioned calculations by means of the dask.dataframe API. But for a large DataFrame that was stored as a record in bcolz ('0.12.1', github:…
RA Prism
  • 59
  • 6
2
votes
3 answers

zipline installation error: failed building wheel for bcolz

I'm trying to install zipline in a virtual environment on macOS. Python version = 3.6, with numpy and cython pre-installed. When I try pip install zipline in the virtual environment, I get the following error. (There are a lot of warnings printed out on…
2
votes
0 answers

Deadlock issue with joblib

I would like to parallelize converting a numpy array to a bcolz carray with joblib, but a deadlock occurs when writing to the carray. The simplified code is here. import cv2 import pandas as pd import numpy as np from tqdm import tqdm import bcolz from…
横尾修平
  • 371
  • 1
  • 4
  • 9
1
vote
0 answers

Is there a way to free memory of a data container (ctable bcolz) in Python?

I am using SLURM to run a deep learning framework. I am trying to integrate different data containers into this framework (hdf5, bcolz (ctable) and zarr). When running the framework using ctable as a data structure I got an error "slurmstepd: error:…
1
vote
0 answers

How to fix clang 'invalid deployment target' error when installing bcolz

I want to install zipline on my macOS Mojave (10.14.2 (18C54)) machine using Python 3.5.4, but the installation fails every time the compiler tries to install bcolz through pip install. I've already tried to re-install Xcode to the latest version (10.1)…
civy
  • 393
  • 2
  • 17
1
vote
1 answer

Keras - with regards to performance - is bcolz better than using datagenerator?

I am struggling with the following points: When should bcolz be used instead of keras' data generator? Looks like the keras' model has apis to accept an array with batch or define the data generator as well. Is there a performance improvement when…
neowulf33
  • 635
  • 2
  • 7
  • 19
1
vote
1 answer

Writing larger than memory data into bcolz

I have a big tick data file (one day is 60 GB uncompressed) that I want to put into bcolz. I planned to read this file chunk by chunk and append the chunks to bcolz. As far as I know, bcolz only supports appending columns, not rows. However, tick data is…
qichao_he
  • 4,204
  • 4
  • 15
  • 24
1
vote
1 answer

saving dask dataframe in bcolz format

The dask documentation states: "BColz is an on-disk, chunked, compressed, column-store. These attributes make it very attractive for dask.dataframe which can operate particularly well on it. There is a special from_bcolz function." However, I could…
Arco Bast
  • 3,595
  • 2
  • 26
  • 53
1
vote
1 answer

data size blows out when storing in bcolz

I have a dataset with ~7M rows and 3 columns, 2 numeric and 1 consisting of ~20M distinct string uuids. The data takes around 3G as a csv file and castra can store it in about 2G. I would like to test out bcolz with this data. I…
Daniel Mahler
  • 7,653
  • 5
  • 51
  • 90
0
votes
1 answer

Package fails in docker container. Reinstall and it works. Why?

I am running a docker container that works perfectly on multiple different hosts. However when I run on AWS cr1.8xlarge one of the packages (bcolz) fails with "invalid instruction" error. I exec into the container and run bcolz.test() which fails.…
simon
  • 2,561
  • 16
  • 26
0
votes
1 answer

Link Errors Installing Python Blosc

I installed C-Blosc library no problem, but when I try to install python-blosc and get to python setup.py build_ext --inplace --blosc="C:\\Program Files (x86)\\blosc" I get link errors. What's going on? I have Windows 7 and am trying to install…
adam.hendry
  • 4,458
  • 5
  • 24
  • 51
0
votes
1 answer

Loading larger than memory data into bcolz from Redshift

I would like to save the output of a Redshift query locally. I have tried using blaze/odo, but with default settings that tries to load all the data into memory before writing, and trying to stream the data throws other errors, described another…
Daniel Mahler
  • 7,653
  • 5
  • 51
  • 90
0
votes
1 answer

Pandas / odo / bcolz selective loading of rows from a large CSV file

Say we have a large CSV file (e.g. 200 GB) where only a small fraction of rows (e.g. 0.1% or less) contain data of interest. Say we define such a condition as having one specific column contain a value from a pre-defined list (e.g. 10K values of…
Amelio Vazquez-Reina
  • 91,494
  • 132
  • 359
  • 564
0
votes
1 answer

performance of appending data into a bcolz table

I'm just getting started using the bcolz package and running through the tutorial on ctables. Creating a table using the fromiter function, i.e: N = 100*1000 ct = bcolz.fromiter(((i,i*i) for i in range(N)), dtype="i4,f8", count=N, rootdir='mydir',…
Sam Mason
  • 15,216
  • 1
  • 41
  • 60
0
votes
1 answer

Convert multi-node PyTable to bcolz

I'm looking to experiment a bit with bcolz and see if it is compatible with what I need to do. I have a dataset consisting of about 11 million rows and about 120 columns. This data is currently stored in PyTables "table" format in an HDF5 file. …
BrenBarn
  • 242,874
  • 37
  • 412
  • 384