In my ML app, I use a 1D np.array output Y to color-code the dots of a scatterplot. I need to map a set of widely spread integer values to sequential integers so that the colors of the colormap are used more evenly.
What I did is this:
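    import numpy as np

    def normalize(Y):
        U = np.unique(Y)  # sorted unique values
        # Write into a new array: assigning into Y in place can collide
        # when a new index equals a value not yet processed (e.g. negatives).
        out = np.empty(Y.shape, dtype=int)
        for i in range(U.size):
            out[Y == U[i]] = i
        return out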
This replaces each value with its index in the sorted array of unique values.
I wonder if there is a way to do this more efficiently with numpy. There's got to be a powerful one-liner somewhere out there.
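For illustration, one candidate for such a one-liner is the `return_inverse` option of `np.unique`, which returns, for every element, its index in the sorted unique array. This is only a sketch: the function name `normalize_inverse` is made up for this example, and it assumes Y is a 1D integer array as described above.

```python
import numpy as np

def normalize_inverse(Y):
    """Map each value of Y to its index among the sorted unique values."""
    Y = np.asarray(Y)
    # return_inverse=True gives, for each element of Y, the position of
    # that element in the sorted array of unique values.
    _, inverse = np.unique(Y, return_inverse=True)
    # reshape keeps the output shape equal to the input shape
    return inverse.reshape(Y.shape)

Y = np.array([100, -5, 100, 42, -5])
print(normalize_inverse(Y))  # [2 0 2 1 0]
```

Unlike the loop, this does not modify Y and does not compare the whole array against every unique value.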
*Another thing I could not figure out is how to have the sequential values ordered by the number of corresponding occurrences in Y, so that the distribution of the clusters is obvious on the plot.
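A possible sketch of the frequency ordering, assuming "sorted by occurrences" means the most frequent value should get 0, the next most frequent 1, and so on. The name `normalize_by_frequency` is hypothetical, and breaking ties in favor of the smaller original value is my own choice, not something stated in the question.

```python
import numpy as np

def normalize_by_frequency(Y):
    """Relabel values of Y as 0, 1, 2, ... from most to least frequent."""
    Y = np.asarray(Y)
    _, inverse, counts = np.unique(Y, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)
    # Indices of the unique values, most frequent first; the stable sort
    # keeps the smaller original value first when counts are tied.
    order = np.argsort(-counts, kind="stable")
    # Invert the permutation: rank[k] is the new label of unique value k.
    rank = np.empty_like(order)
    rank[order] = np.arange(order.size)
    return rank[inverse].reshape(Y.shape)

Y = np.array([7, 7, 3, 7, 50, 3])
print(normalize_by_frequency(Y))  # [0 0 1 0 2 1]
```

Here 7 occurs three times and becomes 0, 3 occurs twice and becomes 1, and 50 occurs once and becomes 2.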