Current plot and anticipated plot
I'm new to Python. I'm trying to get a subset of the housing index dataset from https://github.com/ageron/handson-ml/blob/master/02_end_to_end_machine_learning_project.ipynb
I have imported the dataset as 'housing'. I am trying to plot just the outliers above the 0.95 quantile on top of the plot that shows all the values for median_house_value.
import matplotlib.pyplot as plt
housing.plot(kind="scatter", x="median_income", y="median_house_value",
             alpha=0.1)
This gets a plot of all the rows (i). I am trying to select the median_income values that correspond to the subset of median_house_value above the 0.95 quantile, and plot them over the top in orange (j).
Below is my best attempt so far, which does not produce the correct values:
plt.plot(housing.groupby('median_house_value').quantile(q=quant)["median_income"], housing.groupby('median_house_value').quantile(q=quant).index.get_level_values('median_house_value'),"or")
I can get a boolean mask marking the median_house_value rows above the quantile by doing:
quantile = int(round(housing["median_house_value"].quantile(q=0.95)))
housing.median_house_value > quantile
I want to end up with two pandas Series: one for the x axis, holding the median_income values, and a second for the y axis, holding the corresponding median_house_value values that fall above the quantile.
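To make the target concrete, here is a minimal sketch of the result I am after, assuming boolean-mask indexing is a reasonable way to get there. The `housing` frame below is a made-up stand-in with random values, not the real dataset, and the names `threshold`, `top`, `x` and `y` are only illustrative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the real `housing` DataFrame (random, made-up values).
rng = np.random.default_rng(0)
housing = pd.DataFrame({
    "median_income": rng.uniform(0.5, 15.0, size=1000),
    "median_house_value": rng.uniform(15000, 500000, size=1000),
})

# Threshold: the 0.95 quantile of median_house_value.
threshold = housing["median_house_value"].quantile(0.95)

# Use the boolean mask to keep only the rows above the threshold.
top = housing[housing["median_house_value"] > threshold]

# Two Series that share the same index, one per axis.
x = top["median_income"]
y = top["median_house_value"]

# Scatter of all rows, then the subset drawn on the same Axes in orange.
ax = housing.plot(kind="scatter", x="median_income", y="median_house_value",
                  alpha=0.1)
ax.scatter(x, y, color="orange")
```

Because `x` and `y` come from the same filtered frame, they stay aligned row for row. As a side note, in a matplotlib format string "or" means circle markers in red, so the sketch passes color="orange" explicitly.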
Thanks in advance.