I have a large array of data and need its histogram without the bin that has the highest frequency. I use this to remove that bin, but I then need to save the modified histogram, because I have to compare it with another histogram. I don't know how to do this: the original data has not changed, and the change is only visible in the plot. I thought about modifying the original data so that it reflects the change in the histogram (for example, by removing the samples that fall in the highest-frequency bin), but nothing I have tried so far works. Here is sample code, mostly based on the link above with a few changes for my purpose, which unfortunately doesn't do the job:
import numpy as np
import matplotlib.pyplot as plt
gaussian_numbers = np.random.randn(100)
# Get the histogram and plot it
values, bin_edges = np.histogram(gaussian_numbers, bins=6)
centers = (bin_edges[:-1] + bin_edges[1:]) / 2
width = bin_edges[1] - bin_edges[0]
plt.bar(centers, values, color="blue", align="center", width=width)
plt.show()
# Zero out the bin(s) with the highest count and plot again
values[values == np.max(values)] = 0
plt.bar(centers, values, color="blue", align="center", width=width)
plt.show()
# Attempt to remove the samples of that bin from the data
new = gaussian_numbers[gaussian_numbers != np.max(values)]
print(np.sum(new - gaussian_numbers))
When I draw the bar graph, I can see that the bin with the highest frequency has been removed. But when I try to remove those values from my data and store the result in an array called new (so that I can then save the histogram of new), there is no difference between new and gaussian_numbers, which means their histograms are identical as well. Is there any way to remove these data points?
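A minimal sketch of the idea from the first paragraph (dropping the samples that fall in the tallest bin), offered as an illustration rather than a definitive solution. One relevant detail: np.max(values) is a bin count, not a data value, so comparing it with gaussian_numbers does not select the samples of any bin. The sketch instead recomputes each sample's bin index with np.digitize on the same bin_edges. The helper name drop_tallest_bins is hypothetical, and default_rng assumes NumPy 1.17 or newer.

```python
import numpy as np


def drop_tallest_bins(data, bins=6):
    """Remove the samples that fall in the highest-frequency bin(s).

    Returns the filtered data and the bin edges used, so that later
    histograms can be computed on exactly the same bins.
    """
    data = np.asarray(data)
    counts, bin_edges = np.histogram(data, bins=bins)

    # Same selection as values == np.max(values): every bin tying for the maximum.
    tallest = np.flatnonzero(counts == counts.max())

    # Bin index of every sample. np.histogram uses half-open bins [a, b),
    # except the last bin, which is closed [a, b]. np.digitize follows the
    # half-open rule, so the largest value is clamped back into the last bin.
    idx = np.digitize(data, bin_edges) - 1
    idx[idx == len(counts)] = len(counts) - 1

    # Keep only the samples whose bin is not one of the tallest bins.
    filtered = data[~np.isin(idx, tallest)]
    return filtered, bin_edges


rng = np.random.default_rng(0)
gaussian_numbers = rng.standard_normal(100)

new, bin_edges = drop_tallest_bins(gaussian_numbers, bins=6)

# Histogram of the filtered data on the SAME edges; the tallest bin is now empty.
new_values, _ = np.histogram(new, bins=bin_edges)

print(len(gaussian_numbers), len(new))
print(new_values)
```

Passing bin_edges (rather than bins=6) when computing the histogram of the other data set keeps both histograms on identical bins, so they can be compared bin by bin; the counts and edges could then be stored together, for example with np.savez.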