How to apply Pandas qcut or similar function to a 2D array

Asked Jun 17 '21 at 23:21

Active Jun 17 '21 at 23:32

Viewed 156 times

I'm trying to discretize the columns of a 2D array into equal-sized buckets. Here is a simple 2D array example, which contains NaNs:

import numpy as np
import pandas as pd
np.random.seed(0)
sarray = np.random.rand(500,500)
sarray[sarray>0.9] = np.nan

I tried using the Pandas qcut function:

pd.qcut(sarray, q=10, labels=False, duplicates='drop')

But it only supports 1D arrays:

ValueError: Input array must be 1 dimensional

I can get the results with a list comprehension:

[pd.qcut(sarray[:,col], q=10, labels=False, duplicates='drop') for col in range(sarray.shape[1])]

But this approach loops over the columns in Python rather than vectorizing the calculation. Is there a way to vectorize this discretization with NumPy (i.e., to perform the calculation on a single 2D array instead of on multiple 1D arrays)?
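To make the goal concrete, here is a minimal NumPy-only sketch of what a column-wise vectorized version could look like. It is an assumption-laden illustration, not a drop-in replacement for `pd.qcut`: the helper name `qcut_columns` is hypothetical, it assumes the bin edges within each column are distinct (so the `duplicates='drop'` behavior is not reproduced), and it has not been checked against pandas for ties. The idea is to compute per-column edges with `np.nanquantile` and then count, via broadcasting, how many interior edges each value exceeds.

```python
import numpy as np

def qcut_columns(arr, q=10):
    """Bin each column of a 2D float array into q quantile buckets (0..q-1); NaNs stay NaN.

    Hypothetical helper: assumes distinct bin edges in every column.
    """
    probs = np.linspace(0, 1, q + 1)
    # Per-column bin edges, ignoring NaNs; shape (q + 1, n_cols)
    edges = np.nanquantile(arr, probs, axis=0)
    with np.errstate(invalid="ignore"):
        # For every value, count the interior edges it strictly exceeds.
        # Broadcasting: (1, n_rows, n_cols) > (q - 1, 1, n_cols)
        codes = (arr[None, :, :] > edges[1:-1, None, :]).sum(axis=0).astype(float)
    # Comparisons with NaN are False, so restore NaN where the input was NaN
    codes[np.isnan(arr)] = np.nan
    return codes

np.random.seed(0)
sarray = np.random.rand(500, 500)
sarray[sarray > 0.9] = np.nan
buckets = qcut_columns(sarray)  # same shape as sarray
```

Note that the broadcasted comparison builds a temporary boolean array of shape `(q - 1, n_rows, n_cols)`, so memory use grows with `q`.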

asked Jun 17 '21 at 23:21

m_power

The `qcut` operation is somewhat costly, so I doubt that you can gain much speed with vectorization. Maybe you can parallelize the calculation over columns instead. – hilberts_drinking_problem Jun 17 '21 at 23:57
OK, but how would I do the parallelization? – m_power Jun 18 '21 at 01:06
[This](https://stackoverflow.com/questions/45526700/easy-parallelization-of-numpy-apply-along-axis) thread may be helpful. – hilberts_drinking_problem Jun 18 '21 at 01:08
@hilberts_drinking_problem, Thanks, I'll take a look! – m_power Jun 18 '21 at 01:39
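
As a rough illustration of the per-column parallelization suggested in the comments, the sketch below hands each column to a worker from the standard library's `concurrent.futures`. The helper names `qcut_column` and `qcut_parallel` and the `max_workers=4` default are assumptions made for this example. A thread pool is used only because it needs no pickling or `__main__` guard; whether it is actually faster than the list comprehension depends on how much of the work releases the GIL, which has not been measured here. For CPU-bound work, `ProcessPoolExecutor` is the usual alternative.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd

def qcut_column(col, q=10):
    # Same call as in the question, applied to a single 1D column
    return pd.qcut(col, q=q, labels=False, duplicates='drop')

def qcut_parallel(arr, q=10, max_workers=4):
    # Columns are independent, so each one can go to a separate worker
    columns = [arr[:, j] for j in range(arr.shape[1])]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input columns
        results = list(pool.map(lambda c: qcut_column(c, q), columns))
    # Stack the per-column results back into the input's 2D shape
    return np.column_stack(results)

np.random.seed(0)
sarray = np.random.rand(500, 500)
sarray[sarray > 0.9] = np.nan
buckets = qcut_parallel(sarray)
```

Timing this against the plain list comprehension on the real data would show whether the extra machinery is worth it.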

0 Answers