Questions tagged [bcolz]

bcolz is a Python package providing high-performance, NumPy-style access to data stored in arrays or tables backed by either memory or disk. Its efficiency comes from storing data in columns and compressing it: modern CPUs spend much of their time waiting for data, so memory access, not computation, is the main bottleneck.

Documentation website: http://bcolz.blosc.org/
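
To make the description above concrete, here is a toy sketch using only the Python standard library. It is not bcolz's implementation (bcolz compresses with Blosc, not zlib), and the table, the chunk length of 1024 and the helpers `compress_chunks` and `read_value` are hypothetical. Each column is stored as its own homogeneous array split into fixed-size compressed chunks, so reading one value only decompresses one chunk of one column.

```python
import array
import zlib

CHUNK_LEN = 1024  # hypothetical chunk length, in elements

# Hypothetical toy table of (id, price) rows.
rows = [(i, float(i % 10)) for i in range(10_000)]


def compress_chunks(values, typecode):
    """Split one column into fixed-length chunks and zlib-compress each chunk."""
    chunks = []
    for start in range(0, len(values), CHUNK_LEN):
        buf = array.array(typecode, values[start:start + CHUNK_LEN]).tobytes()
        chunks.append(zlib.compress(buf))
    return chunks


def read_value(chunks, typecode, index):
    """Decompress only the chunk that holds `index` and return that element."""
    chunk = array.array(typecode, zlib.decompress(chunks[index // CHUNK_LEN]))
    return chunk[index % CHUNK_LEN]


# Column-oriented layout: each column is compressed independently.
id_chunks = compress_chunks([r[0] for r in rows], "q")     # int64 ids
price_chunks = compress_chunks([r[1] for r in rows], "d")  # float64 prices

raw_bytes = len(rows) * (8 + 8)
compressed_bytes = sum(map(len, id_chunks)) + sum(map(len, price_chunks))
print(f"raw: {raw_bytes} bytes, compressed: {compressed_bytes} bytes")
print(read_value(price_chunks, "d", 4321))  # 1.0
```

Because every value in a column has the same type, and neighbouring values are often similar, each column compresses better than interleaved rows would, which is the effect the tag wiki relies on.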

15 questions

2 answers

Why does the dask.dataframe compute() result give an IndexError in specific cases? How can I find the reason for an async error?

Because of the large size of my data, I used the current version of dask ('0.7.5') and was able to perform partitioned calculations by means of the dask.dataframe API. But for a large DataFrame that was stored as a record in bcolz ('0.12.1', github:…

pandas dask bcolz

asked Dec 23 '15 at 00:17

RA Prism

3 answers

zipline installation error : failed building wheel for bcolz

I'm trying to install zipline in a virtual environment on macOS (Python 3.6, with numpy and cython pre-installed). When I run pip install zipline in the virtual environment, I get the following error. (There are a lot of warnings printed out on…

python pip zipline bcolz

asked Dec 29 '18 at 11:16

HyeongGyu Froilan Choi

0 answers

Deadlock issue with joblib

I would like to use joblib to parallelize converting a numpy array to a bcolz carray, but a deadlock occurred when writing to the carray. Here is the simplified code: import cv2 import pandas as pd import numpy as np from tqdm import tqdm import bcolz from…

python deadlock joblib bcolz

asked Feb 11 '18 at 08:09

横尾修平
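
A common way to sidestep this class of problem, offered here as an assumption rather than a diagnosis of the asker's code, is to let the workers only compute and have a single thread do all the writes. The sketch uses the standard library's `concurrent.futures` instead of joblib, a plain list stands in for the bcolz carray, and `preprocess` and `build_container` are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def preprocess(item):
    """Hypothetical per-item work; stands in for e.g. loading an image with cv2."""
    return [item * 2, item * 2 + 1]


def build_container(items, max_workers=4):
    """Compute in parallel, but append to the container from one thread only."""
    container = []  # stand-in for a bcolz carray opened for appending
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so the appends stay deterministic
        for result in pool.map(preprocess, items):
            container.extend(result)  # single writer: no concurrent writes
    return container


data = build_container(range(5))
print(data)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

For CPU-bound work, `ProcessPoolExecutor` can be swapped in, provided the worker function is defined at module level so it can be pickled; the single-writer loop stays the same.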

0 answers

Is there a way to free memory of a data container (ctable bcolz) in Python?

I am using SLURM to run a deep learning framework. I am trying to integrate different data containers into this framework (hdf5, bcolz (ctable) and zarr). When running the framework using ctable as a data structure I got an error "slurmstepd: error:…

memory slurm bcolz

asked Aug 26 '19 at 07:31

Fatma RAHMANI

0 answers

How to fix clang 'invalid deployment target' error when installing bcolz

I want to install zipline on macOS Mojave (10.14.2 (18C54)) using Python 3.5.4, but the installation fails every time the compiler tries to build bcolz during pip install. I've already tried to re-install Xcode to the latest version (10.1)…

python python-3.x zipline bcolz

asked Jan 08 '19 at 06:51

civy

1 answer

Keras - with regards to performance - is bcolz better than using datagenerator?

I am struggling with the following points: When should bcolz be used instead of Keras' data generator? It looks like Keras models have APIs that accept either an array with a batch size or a data generator. Is there a performance improvement when…

keras dask bcolz

asked Dec 15 '17 at 23:40

neowulf33

1 answer

Writing larger than memory data into bcolz

So I have this big tick data file (60 GB uncompressed for one day) that I want to put into bcolz. I planned to read the file chunk by chunk and append each chunk to bcolz. As far as I know, bcolz only supports appending columns, not rows. However, tick data is…

python bcolz

asked Dec 05 '16 at 06:28

qichao_he (reputation 4,204)
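
The chunk-by-chunk plan can be sketched with the standard library alone. This is not bcolz code: each column is appended as raw binary to its own file (bcolz would also compress it), and the schema, `append_csv_in_chunks` and `load_column` are hypothetical. Only `chunk_rows` rows are held in memory at a time, and each chunk of rows is split into columns before being appended, so a row-oriented source can feed a column store.

```python
import array
import csv
import io
import itertools
import os
import tempfile

# Hypothetical schema: column name -> array typecode ("d" = float64, "q" = int64).
SCHEMA = {"price": "d", "size": "q"}
CASTS = {"d": float, "q": int}


def append_csv_in_chunks(text_stream, out_dir, chunk_rows=1000):
    """Read CSV rows in chunks and append each column to its own binary file."""
    reader = csv.reader(text_stream)
    header = next(reader)
    paths = {name: os.path.join(out_dir, name + ".bin") for name in header}
    while True:
        chunk = list(itertools.islice(reader, chunk_rows))
        if not chunk:
            break
        for col, name in enumerate(header):
            code = SCHEMA[name]
            values = array.array(code, (CASTS[code](row[col]) for row in chunk))
            with open(paths[name], "ab") as f:  # append mode keeps earlier chunks
                values.tofile(f)
    return paths


def load_column(path, typecode):
    """Read a whole column file back (only sensible for small test data)."""
    out = array.array(typecode)
    with open(path, "rb") as f:
        out.frombytes(f.read())
    return out


# Usage with a small in-memory CSV standing in for the large file.
csv_text = "price,size\n" + "".join(f"{i * 0.5},{i}\n" for i in range(2500))
with tempfile.TemporaryDirectory() as tmp:
    paths = append_csv_in_chunks(io.StringIO(csv_text), tmp, chunk_rows=1000)
    sizes = load_column(paths["size"], "q")
    prices = load_column(paths["price"], "d")
    print(len(sizes), sizes[-1], prices[3])  # 2500 2499 1.5
```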

1 answer

saving dask dataframe in bcolz format

The dask documentation states: "BColz is an on-disk, chunked, compressed, column-store. These attributes make it very attractive for dask.dataframe which can operate particularly well on it. There is a special from_bcolz function." However, I could…

python dask bcolz

asked Jul 11 '16 at 20:19

Arco Bast (reputation 3,595)

1 answer

data size blows out when storing in bcolz

I have a dataset with ~7M rows and 3 columns, 2 numeric and 1 consisting of ~20M distinct string uuids. The data takes around 3 GB as a CSV file and castra can store it in about 2 GB. I would like to test out bcolz with this data. I…

python blaze bcolz

asked Feb 26 '16 at 12:34

Daniel Mahler (reputation 7,653)

1 answer

Package fails in docker container. Reinstall and it works. Why?

I am running a docker container that works perfectly on multiple different hosts. However when I run on AWS cr1.8xlarge one of the packages (bcolz) fails with "invalid instruction" error. I exec into the container and run bcolz.test() which fails.…

python docker bcolz

asked Jan 16 '18 at 17:48

simon (reputation 2,561)

1 answer

Link Errors Installing Python Blosc

I installed the C-Blosc library without problems, but when I try to install python-blosc and get to python setup.py build_ext --inplace --blosc="C:\\Program Files (x86)\\blosc" I get link errors. What's going on? I have Windows 7 and am trying to install…

python linker-errors bcolz

asked Dec 21 '17 at 01:00

adam.hendry (reputation 4,458)

1 answer

Loading larger than memory data into bcolz from Redshift

I would like to save the output of a Redshift query locally. I have tried using blaze/odo, but with default settings it tries to load all the data into memory before writing, and trying to stream the data throws other errors, described another…

python sqlalchemy psycopg2 amazon-redshift bcolz

asked Feb 11 '16 at 20:41

Daniel Mahler (reputation 7,653)

1 answer

Pandas / odo / bcolz selective loading of rows from a large CSV file

Say we have a large CSV file (e.g. 200 GB) where only a small fraction of rows (e.g. 0.1% or less) contain data of interest. Say we define such a condition as having one specific column contain a value from a predefined list (e.g. 10K values of…

python-3.x pandas blaze bcolz odo

asked Feb 04 '16 at 15:40

Amelio Vazquez-Reina (reputation 91,494)
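
One way to approach this kind of selective loading, offered as an assumption and not as what pandas, odo or bcolz do internally, is to stream the file and test each row's key against a set, which keeps membership checks cheap even for ~10K values and holds only one row in memory at a time. The column name `ticker`, the sample data and `select_rows` are hypothetical.

```python
import csv
import io


def select_rows(text_stream, column, wanted):
    """Stream a CSV and yield only rows whose `column` value is in `wanted`.

    `wanted` is converted to a set, so each membership test is O(1) on average.
    """
    wanted = set(wanted)
    for row in csv.DictReader(text_stream):
        if row[column] in wanted:
            yield row


# Hypothetical data: a "ticker" column and a small list of values of interest.
sample = "ticker,price\nAAA,1\nBBB,2\nCCC,3\nAAA,4\n"
hits = list(select_rows(io.StringIO(sample), "ticker", ["AAA", "CCC"]))
print([(r["ticker"], r["price"]) for r in hits])
# [('AAA', '1'), ('CCC', '3'), ('AAA', '4')]
```

With pandas, the same idea can be applied per chunk by iterating over `pandas.read_csv(..., chunksize=...)` and filtering each chunk with `Series.isin`.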

1 answer

performance of appending data into a bcolz table

I'm just getting started using the bcolz package and running through the tutorial on ctables. Creating a table using the fromiter function, i.e: N = 100*1000 ct = bcolz.fromiter(((i,i*i) for i in range(N)), dtype="i4,f8", count=N, rootdir='mydir',…

python macos numpy python-3.5 bcolz

asked Oct 21 '15 at 08:56

Sam Mason (reputation 15,216)
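
The question contrasts building a table in one call with appending to it afterwards. A general pattern for append-heavy workloads, sketched here without bcolz, is to buffer rows and hand them to the storage layer in large batches, so any per-call overhead is paid once per batch instead of once per row. `BatchedAppender` is a hypothetical helper and a plain list of batches stands in for the table; whether this helps a particular bcolz workload is an assumption to measure, not a guarantee.

```python
class BatchedAppender:
    """Buffer rows in memory and pass them to `sink` in fixed-size batches.

    `sink` is any callable that accepts a list of rows (hypothetical here).
    """

    def __init__(self, sink, batch_size=10_000):
        self.sink = sink
        self.batch_size = batch_size
        self.buffer = []

    def append(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(self.buffer)
            self.buffer = []  # start a fresh list; the sink keeps the old one

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.flush()  # write any remaining partial batch
        return False


batches = []  # stand-in storage: records each batch the sink receives
with BatchedAppender(batches.append, batch_size=40_000) as out:
    for i in range(100 * 1000):
        out.append((i, float(i * i)))

print([len(b) for b in batches])  # [40000, 40000, 20000]
```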

1 answer

Convert multi-node PyTable to bcolz

I'm looking to experiment a bit with bcolz and see if it is compatible with what I need to do. I have a dataset consisting of about 11 million rows and about 120 columns. This data is currently stored in PyTables "table" format in an HDF5 file. …

python hdf5 pytables bcolz

asked Oct 16 '15 at 21:33

BrenBarn (reputation 242,874)