Highest Voted 'blaze' Questions

164 votes, 8 answers

How to read a Parquet file into Pandas DataFrame?

How to read a modestly sized Parquet data-set into an in-memory Pandas DataFrame without setting up a cluster computing infrastructure such as Hadoop or Spark? This is only a moderate amount of data that I would like to read in-memory with a simple…

asked Nov 19 '15 at 20:30

Daniel Mahler

17 votes, 5 answers

Python particles simulator: out-of-core processing

Problem description: I am writing a Monte Carlo particle simulator (Brownian motion and photon emission) in Python/NumPy. I need to save the simulation output (>>10GB) to a file and process the data in a second step. Compatibility with both Windows and…

numpy pandas pytables h5py blaze

asked Jan 05 '14 at 23:55

user2304916

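A minimal sketch of the two-step pattern this question describes (write the simulation output to disk, then reprocess it), using only the standard library and assuming the output is a flat stream of float64 samples. The file name, chunk size and the mean reduction are illustrative choices, not the asker's code; PyTables and h5py (from the question's tags) build compression and random access on top of the same idea.

```python
import array
import os
import random
import tempfile

CHUNK = 100_000  # samples per chunk: an illustrative value, tune to available RAM


def simulate_to_file(path, n_chunks, chunk=CHUNK, seed=0):
    """Pass 1: generate Gaussian (Brownian-motion) increments chunk by chunk and
    append them to a raw float64 file, so the full output never sits in memory."""
    rng = random.Random(seed)
    with open(path, "wb") as f:
        for _ in range(n_chunks):
            steps = array.array("d", (rng.gauss(0.0, 1.0) for _ in range(chunk)))
            steps.tofile(f)


def mean_from_file(path, chunk=CHUNK):
    """Pass 2: stream the file back in fixed-size chunks and reduce incrementally."""
    total, count = 0.0, 0
    with open(path, "rb") as f:
        while True:
            buf = array.array("d")
            try:
                buf.fromfile(f, chunk)
            except EOFError:
                pass  # short final read: fromfile keeps whatever items it did read
            if not buf:
                break
            total += sum(buf)
            count += len(buf)
    return (total / count if count else float("nan")), count


path = os.path.join(tempfile.mkdtemp(), "simulation.bin")
simulate_to_file(path, n_chunks=3, chunk=10_000)
mean, n = mean_from_file(path, chunk=10_000)
print(n, os.path.getsize(path))  # 30000 240000
```

The raw file uses the machine's byte order; reading it on a machine with different endianness would need `array.byteswap()`.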

10 votes, 4 answers

Blaze with Scikit Learn K-Means

I am trying to fit a Blaze data object with scikit-learn's KMeans:
    from blaze import *
    from sklearn.cluster import KMeans
    data_numeric = Data('data.csv')
    data_cluster = KMeans(n_clusters=5)
    data_cluster.fit(data_numeric)
Data sample: A B C 1 32…

python scikit-learn blaze

asked Sep 29 '16 at 08:54

sachin saxena

8 votes, 1 answer

pydata blaze: does it allow parallel processing or not?

I am looking to parallelise numpy or pandas operations. For this I have been looking into pydata's blaze. My understanding was that seamless parallelisation was its major selling point. Unfortunately I have been unable to find an operation that runs…

python numpy pandas multiprocessing blaze

asked Dec 16 '14 at 13:27

ARF

8 votes, 0 answers

What are the most robust and interactive-friendly ways to structure general 2D/3D/ND datasets in Python?

I am a scientist recently converted from MATLAB to Python. I am looking for ways to structure my (mainly 2D and 3D) datasets. I have searched the net quite a bit, and it seems to me that robust and general-purpose data structuring in Python is still…

python data-structures numpy dataset blaze

asked Nov 21 '13 at 12:13

cmeeren

7 votes, 2 answers

Where is the pydata BLAZE project heading?

I find the blaze ecosystem* amazing because it covers most of the data engineering use cases. There was definitely a lot of interest in these projects during 2015-2016, but lately they have been neglected. I say this looking at the commits on…

dask blaze odo datashape

asked Dec 06 '18 at 03:12

human

7 votes, 0 answers

Streaming results with Blaze and SqlAlchemy

I am trying to use Blaze/Odo to read a large (~70M rows) result set from Redshift. By default SQLAlchemy will try to read the whole result into memory before starting to process it. This can be prevented by either…

python sqlalchemy psycopg2 amazon-redshift blaze

asked Feb 10 '16 at 21:35

Daniel Mahler

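The batching idea behind streaming a large result set can be sketched with the DB-API `fetchmany` call. This sketch uses the standard-library sqlite3 module as a stand-in for Redshift, and the `events` table is hypothetical; it shows only the client-side loop. With PostgreSQL-family drivers, rows are only truly streamed from the server when a server-side cursor is used (in psycopg2, a named cursor), which this sketch does not cover.

```python
import sqlite3
from typing import Iterator, List, Tuple


def stream_rows(conn: sqlite3.Connection, query: str,
                batch_size: int = 1000) -> Iterator[List[Tuple]]:
    """Yield the result set in batches of at most batch_size rows."""
    cur = conn.cursor()
    cur.execute(query)
    try:
        while True:
            batch = cur.fetchmany(batch_size)  # DB-API: returns [] when exhausted
            if not batch:
                break
            yield batch
    finally:
        cur.close()


# Hypothetical sample data standing in for the large remote table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, value REAL)")
conn.executemany("INSERT INTO events VALUES (?, ?)", ((i, i * 0.5) for i in range(2500)))

batch_sizes = [len(b) for b in stream_rows(conn, "SELECT id, value FROM events", batch_size=1000)]
print(batch_sizes)  # [1000, 1000, 500]
```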

7 votes, 1 answer

Choosing a framework for larger than memory data analysis with python

I'm solving a problem with a dataset that is larger than memory. The original dataset is a .csv file. One of the columns is for track IDs from the musicbrainz service. What I already did: I read the .csv file with dask and converted it to castra…

python hdf5 blaze dask

asked Oct 14 '15 at 15:42

Nagasaki45

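A single-pass grouped count over a CSV that does not fit in memory, as a standard-library sketch. The `track_id` column name is a hypothetical stand-in for the musicbrainz ID column mentioned in the question, and this is not the dask/castra pipeline the asker used. Memory grows with the number of distinct IDs, not with the number of rows.

```python
import csv
import io
from collections import Counter


def count_by_column(lines, column):
    """Stream a CSV once and count occurrences of each value in `column`."""
    counts = Counter()
    for row in csv.DictReader(lines):  # reads one row at a time
        counts[row[column]] += 1
    return counts


# Hypothetical in-memory sample; with a real file pass open(path, newline="").
sample = io.StringIO("user,track_id\nu1,t-a\nu2,t-b\nu3,t-a\n")
counts = count_by_column(sample, "track_id")
print(counts.most_common(1))  # [('t-a', 2)]
```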

6 votes, 3 answers

calling SQL functions from Blaze

In particular, I would like to call the Postgres levenshtein function. I would like to write the Blaze query to return words similar to the word 'similar', i.e. the equivalent of: select word from wordtable where levenshtein(word, 'similar') < 3; In…

python sql postgresql sqlalchemy blaze

asked Nov 18 '16 at 07:04

Daniel Mahler

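PostgreSQL provides `levenshtein` through its fuzzystrmatch extension. As a self-contained stand-in, this sketch registers a Python edit-distance function with the standard-library sqlite3 module and runs the question's query (plus an ORDER BY for a stable result). It does not show how Blaze itself would emit the call, and the sample words are hypothetical.

```python
import sqlite3


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = cur
    return prev[-1]


conn = sqlite3.connect(":memory:")
conn.create_function("levenshtein", 2, levenshtein)  # name, number of args, callable
conn.execute("CREATE TABLE wordtable (word TEXT)")
conn.executemany("INSERT INTO wordtable VALUES (?)",
                 [("similar",), ("simular",), ("similarly",), ("different",)])
rows = conn.execute(
    "SELECT word FROM wordtable WHERE levenshtein(word, 'similar') < 3 ORDER BY word"
).fetchall()
print(rows)  # [('similar',), ('similarly',), ('simular',)]
```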

5 votes, 4 answers

Delete column(s) from very large CSV file using pandas or blaze

I have a very large CSV file (5 GB), so I do not want to load the whole thing into memory, and I want to delete one or more of its columns. I tried using the following code in blaze, but all it did was append the resulting columns to the existing…

python csv pandas blaze

asked Jul 01 '16 at 15:40

Alex

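Dropping columns without loading the whole file can be done by streaming rows from one CSV into a new one. A standard-library sketch with hypothetical column names; it writes a new file rather than editing in place. In pandas, `read_csv` with `usecols=` and `chunksize=` is another route.

```python
import csv
import io


def drop_columns(src, dst, to_drop):
    """Copy CSV rows from src to dst, omitting the named columns.
    Only one row is held in memory at a time."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to copy
    keep = [i for i, name in enumerate(header) if name not in to_drop]
    writer.writerow([header[i] for i in keep])
    for row in reader:
        writer.writerow([row[i] for i in keep])


# Hypothetical sample; with real files open both with newline="".
src = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")
dst = io.StringIO()
drop_columns(src, dst, {"b"})
print(repr(dst.getvalue()))  # 'a,c\r\n1,3\r\n4,6\r\n'
```

The csv module's default dialect ends lines with `\r\n`; pass `lineterminator="\n"` to `csv.writer` if plain newlines are preferred.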

5 votes, 1 answer

How to provide user defined function for python blaze with sqlite backend?

I connect to an SQLite database in Blaze using df = bz.Data("sqlite:///) and everything works fine, but I do not know how to provide user-defined functions in my interaction with df. I have a column called IP in df which is text containing IP…

python sqlite blaze

asked Oct 31 '15 at 05:00

Kshadi

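SQLite lets a Python process register scalar functions on a connection with `sqlite3.Connection.create_function`. This sketch uses plain sqlite3, not Blaze; the `ip_in_network` function, the `access_log` table and the `10.0.0.0/8` filter are hypothetical examples for a text IP column. Making such a function visible through a `bz.Data` connection is not shown.

```python
import ipaddress
import sqlite3


def ip_in_network(ip: str, cidr: str) -> int:
    """Return 1 if `ip` lies inside network `cidr`, else 0 (SQLite has no boolean type)."""
    try:
        return int(ipaddress.ip_address(ip) in ipaddress.ip_network(cidr))
    except ValueError:
        return 0  # malformed address or network text


conn = sqlite3.connect(":memory:")
conn.create_function("ip_in_network", 2, ip_in_network)
conn.execute("CREATE TABLE access_log (IP TEXT)")
conn.executemany("INSERT INTO access_log VALUES (?)",
                 [("10.0.0.5",), ("192.168.1.7",), ("10.1.2.3",), ("not-an-ip",)])
rows = conn.execute(
    "SELECT IP FROM access_log WHERE ip_in_network(IP, '10.0.0.0/8') ORDER BY IP"
).fetchall()
print(rows)  # [('10.0.0.5',), ('10.1.2.3',)]
```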

5 votes, 1 answer

Using odo to migrate data to SQL

I have a large 3 GB CSV file, and I'd like to use Blaze to investigate the data and select down to the data I'm interested in analyzing, with the eventual goal of migrating that data into a suitable computational backend such as SQLite, PostgreSQL etc.…

python sql sqlite blaze

asked Oct 24 '15 at 07:07

Joseph

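The CSV-to-SQL step can be sketched with the standard library alone: stream the CSV, keep only the wanted columns and rows, and insert in batches. This is not odo's API; the function, table and column names and the filter are hypothetical. Table and column names are interpolated into the SQL text, so they must come from trusted code, and values are stored as text, as they appear in the CSV.

```python
import csv
import io
import sqlite3
from itertools import islice


def csv_to_sqlite(src, conn, table, columns, keep_row=lambda row: True, batch_size=10_000):
    """Stream CSV rows into an SQLite table, keeping only `columns` and the rows
    for which keep_row(row_dict) is true. Batched inserts bound memory use."""
    reader = csv.DictReader(src)
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_list})")
    insert = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"
    rows = (tuple(r[c] for c in columns) for r in reader if keep_row(r))
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        conn.executemany(insert, batch)
    conn.commit()


src = io.StringIO("city,year,sales\nOslo,2014,10\nLima,2015,7\nOslo,2015,12\n")
conn = sqlite3.connect(":memory:")
csv_to_sqlite(src, conn, "sales_2015", ["city", "sales"], keep_row=lambda r: r["year"] == "2015")
result = conn.execute("SELECT city, sales FROM sales_2015 ORDER BY city").fetchall()
print(result)  # [('Lima', '7'), ('Oslo', '12')]
```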

5 votes, 1 answer

What are "synthetic dimensions" in Blaze?

The Blaze readme (here https://github.com/ContinuumIO/blaze) describes a number of improvements over NumPy including "Synthetic Dimensions". I have searched around but have been unable to find out what they are. Could someone enlighten me? Thanks.

python numpy blaze

asked Jan 02 '13 at 13:22

Oliver Palmer

4 votes, 1 answer

access data in sharded JSON files on S3 from Blaze

I am trying to access line-delimited JSON data on S3. From my understanding of the docs I should be able to do something like print data(S3(Chunks(JSONLines))('s3://KEY:SECRET@bucket/dir/part-*.json').peek() which throws BotoClientError:…

python json amazon-s3 blaze odo

asked Mar 05 '17 at 01:11

Daniel Mahler

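Once the shards are available locally (fetching them from S3, e.g. with boto3, is not shown), reading line-delimited JSON across `part-*.json` files needs only glob and json. A standard-library sketch; the directory and records are hypothetical, and it does not reproduce Blaze's `S3(Chunks(JSONLines))` type.

```python
import glob
import json
import os
import tempfile


def iter_json_lines(pattern):
    """Yield one parsed record per non-blank line across all files matching
    `pattern`, visiting shards in sorted filename order."""
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)


# Hypothetical local shards mimicking part-*.json files.
tmp = tempfile.mkdtemp()
for i, records in enumerate([[{"id": 1}, {"id": 2}], [{"id": 3}]]):
    with open(os.path.join(tmp, f"part-{i:05d}.json"), "w", encoding="utf-8") as f:
        f.write("\n".join(json.dumps(r) for r in records) + "\n")

ids = [rec["id"] for rec in iter_json_lines(os.path.join(tmp, "part-*.json"))]
print(ids)  # [1, 2, 3]
```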

4 votes, 2 answers

index million-row square matrix for fast access

I have some very large matrices (let's say on the order of a million rows) that I cannot keep in memory, and I need to access subsamples of these matrices in decent time (less than a minute...). I started looking at hdf5 and blaze in…

python numpy matrix blaze

asked Feb 22 '16 at 13:11

fransua

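The core trick for fast row access to an on-disk matrix is a fixed-width, row-major layout: row i starts at byte i × n_cols × 8, so it can be read with a single seek. A standard-library sketch, assuming dense float64 values and a hypothetical file name; `numpy.memmap` and HDF5 datasets (h5py, PyTables) package the same idea with array slicing.

```python
import array
import os
import tempfile

ITEM = array.array("d").itemsize  # bytes per float64 (8 on mainstream platforms)


def write_matrix(path, rows):
    """Write a dense matrix row by row as raw row-major doubles."""
    with open(path, "wb") as f:
        for row in rows:
            array.array("d", row).tofile(f)


def read_rows(path, n_cols, row_indices):
    """Fetch selected rows by seeking straight to their byte offsets;
    the cost is proportional to the rows read, not to the matrix size."""
    out = []
    with open(path, "rb") as f:
        for i in row_indices:
            f.seek(i * n_cols * ITEM)
            buf = array.array("d")
            buf.fromfile(f, n_cols)  # raises EOFError if the row is out of range
            out.append(buf.tolist())
    return out


n = 4
path = os.path.join(tempfile.mkdtemp(), "matrix.bin")
write_matrix(path, ([float(r * n + c) for c in range(n)] for r in range(n)))
result = read_rows(path, n, [3, 1])
print(result)  # [[12.0, 13.0, 14.0, 15.0], [4.0, 5.0, 6.0, 7.0]]
```

Sorting the requested row indices before reading keeps the seeks moving forward through the file, which usually helps on spinning disks.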
