I'm trying to discretize the columns of a 2D array into equal-sized buckets. Here is a simple example of a 2D array that contains NaNs:
import numpy as np
import pandas as pd
np.random.seed(0)
sarray = np.random.rand(500,500)
sarray[sarray>0.9] = np.nan
I tried using the Pandas qcut function:
pd.qcut(sarray, q=10, labels=False, duplicates='drop')
But it only supports 1D arrays:
ValueError: Input array must be 1 dimensional
I can get the results with a list comprehension:
[pd.qcut(sarray[:,col], q=10, labels=False, duplicates='drop') for col in range(sarray.shape[1])]
But this approach does not vectorize the calculation. Is there a way to vectorize this discretization with NumPy (i.e. to perform the calculation on a single 2D array instead of on multiple 1D arrays)?
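To make the target concrete, here is a rough sketch of what a vectorized version might look like, using np.nanquantile and broadcasting. The helper name qcut_columns is hypothetical. The sketch assumes continuous data with no repeated bucket edges, so it does not reproduce duplicates='drop', and it has not been checked against pd.qcut for values that fall exactly on an edge.

```python
import numpy as np

def qcut_columns(a, q=10):
    """Hypothetical helper: label each column's values with buckets 0..q-1."""
    # Per-column bucket edges, ignoring NaNs; shape (q + 1, n_cols).
    edges = np.nanquantile(a, np.linspace(0, 1, q + 1), axis=0)
    # For every value, count the interior edges it strictly exceeds.
    # Comparisons with NaN are False, so NaNs temporarily get label 0.
    with np.errstate(invalid="ignore"):
        labels = (a[None, :, :] > edges[1:-1, None, :]).sum(axis=0).astype(float)
    # Restore NaN where the input was NaN.
    labels[np.isnan(a)] = np.nan
    return labels

np.random.seed(0)
sarray = np.random.rand(500, 500)
sarray[sarray > 0.9] = np.nan
labels = qcut_columns(sarray)  # same shape as sarray, one label per element
```

Here the result keeps the input's (rows, columns) layout, so it should correspond to stacking the list comprehension's output column-wise. Note that the broadcasted comparison allocates a temporary boolean array of shape (q - 1, n_rows, n_cols), which may matter for large inputs.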