Questions tagged [castra]

Castra is a Python library that provides an efficient column store for data.

5 questions
8
votes
1 answer

dask computation not executing in parallel

I have a directory of JSON files that I am trying to convert to a dask DataFrame and save to castra. There are 200 files containing O(10**7) JSON records between them. The code is very simple, largely following tutorial examples. import…
Daniel Mahler
5
votes
2 answers

Dask DataFrame: Resample over groupby object with multiple rows

I have the following dask dataframe created from Castra: import dask.dataframe as dd df = dd.from_castra('data.castra', columns=['user_id','ts','text']) Yielding: user_id / ts / text ts 2015-08-08 01:10:00 …
zanbri
0
votes
1 answer

ImportError: cannot import name 'msgpack'

I'm following a tutorial that uses castra and dask to read in reddit comments. I have installed the latest versions of dask and pandas using anaconda and castra using pip. My pandas version is '0.22.0', and I have installed msgpack using pip…
Parseltongue
0
votes
1 answer

How to pass an array into Hoplon from a Castra backend

If I am trying to pass an array into the index.cljs.hl page, how do I go about using the array in ClojureScript? I found that I can use: (loop-tpl :bindings [single-data rpc/test-vector] (h2 single-data)) in the hLisp part, but if I want to use the…
phlie
0
votes
2 answers

Not able to load castra files with from_castra() function of dask

I am trying to replicate the example on this page about castra, dask and reddit comments, and I get the above error when I run dd.from_castra(data,columns). My castra file took some hours to be created, but it is clean and exactly as the…
oikonang