I would like to set the y-axis limits based on the general range of my data, so that spikes don't dominate the scale but are not removed from the data.
I am producing many sets of graphs comparing two sets of data. Both sets cover a full year, have been read into dataframes with pandas, and the graphs are produced in a loop, one per month. One of the sets has intermittent spikes which make the y-axis range much too large, resulting in an unreadable chart. Setting a fixed boundary with pyplot.ylim()
doesn't help, because the general range of the data (for example, within one month) changes from chart to chart, and applying a hard limit reduces the readability of many of the charts.
For example: one month's data may generally stay below a value of 300,000 but have several spikes going well over 500,000 (and below -500,000), while another month may also have large spikes but data which otherwise stays below 150,000.
I've also tried setting values which are too large to NaN, based on this answer, using
df2 = df.copy(); df2.loc[df2.y.abs() > 500000, 'y'] = np.nan
(my first attempt, df2 = df[df.y < 500000] = np.nan, didn't even do the masking correctly), but the breaks in the line graph are too small to see, and the fact that the spikes occur at all gets lost.
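For reference, here is a runnable sketch of that masking attempt on toy data (the column name y and the spike values are made up to mimic my data):

```python
import numpy as np
import pandas as pd

# Toy stand-in for one month of my data: a smooth series plus two spikes.
rng = np.random.default_rng(1)
df = pd.DataFrame({"y": rng.normal(0, 100_000, 100)})
df.loc[[10, 60], "y"] = [700_000, -700_000]

# What I tried: replace the spikes with NaN so matplotlib breaks the line there.
df2 = df.copy()
df2.loc[df2["y"].abs() > 500_000, "y"] = np.nan
```

Plotting df2 does keep the axis sensible, but the two NaN gaps are nearly invisible on a 100+ point line, which is the problem described above.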
Is there some way to figure out what the general maximum and minimum range of the data is so that the y-axis limits can be set in a sensible way?
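The kind of thing I have in mind is deriving the limits from low/high quantiles of the month's data rather than from its min/max, so the axis tracks the bulk of the data and the spikes simply run off-scale. A minimal sketch of the idea, with made-up data and arbitrary 1%/99% quantiles and padding:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt

# Hypothetical data: a smooth series with a few large spikes.
rng = np.random.default_rng(0)
y = rng.normal(0, 100_000, 300)
y[[50, 120, 200]] = [600_000, -650_000, 550_000]
df = pd.DataFrame({"y": y})

fig, ax = plt.subplots()
ax.plot(df.index, df["y"])

# Limits from the 1st/99th percentiles plus padding: the spikes still
# appear (clipped at the axis edge) but no longer set the scale.
lo, hi = df["y"].quantile([0.01, 0.99])
pad = 0.1 * (hi - lo)
ax.set_ylim(lo - pad, hi + pad)
```

I'm unsure whether fixed quantiles like these would behave sensibly across all my months, or whether there is a more principled way to estimate the "general" range.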