Castra is a Python library that provides an efficient on-disk column store for tabular data.
Questions tagged [castra]
5 questions
8 votes, 1 answer
dask computation not executing in parallel
I have a directory of JSON files that I am trying to convert to a dask DataFrame and save to castra.
There are 200 files containing O(10**7) JSON records between them.
The code is very simple, largely following tutorial examples.
import…

Daniel Mahler
- 7,653
- 5
- 51
- 90
5 votes, 2 answers
Dask DataFrame: Resample over groupby object with multiple rows
I have the following dask dataframe created from Castra:
import dask.dataframe as dd
df = dd.from_castra('data.castra', columns=['user_id','ts','text'])
Yielding:
user_id / ts / text
ts
2015-08-08 01:10:00 …

zanbri
- 5,958
- 2
- 31
- 41
0 votes, 1 answer
ImportError: cannot import name 'msgpack'
I'm following a tutorial that uses castra and dask to read in reddit comments.
I have installed the latest versions of dask and pandas using anaconda and castra using pip. My pandas version is '0.22.0', and I have installed msgpack using pip…

Parseltongue
- 11,157
- 30
- 95
- 160
0 votes, 1 answer
How to pass an array into Hoplon from a Castra backend
If I am trying to pass an array into the index.cljs.hl page, how do I go about using the array in ClojureScript? I found that I can use:
(loop-tpl :bindings [single-data rpc/test-vector]
  (h2 single-data))
In the hLisp part but if I want to use the…

phlie
- 1,335
- 3
- 10
- 19
0 votes, 2 answers
Not able to load castra files with from_castra() function of dask
I am trying to replicate the example from this page about castra, dask and Reddit comments, and I get the above error when I run:
dd.from_castra(data,columns)
My castra file took some hours to be created but it is clean and exactly as the…

oikonang
- 51
- 11