First, here is what I need: a boxplot with a broken x-axis, possibly with more than one break. An example is this figure:
Now, I have two lists, `X` and `Y`, of the same length (`X` contains floats, `Y` contains ints). First I group `Y` into sublists according to the integer part of `X`:
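```python
# One (initially empty) sublist per possible integer part of X
number_of_units = int(max(X)) + 1
my_data = []
for i in range(number_of_units):
    my_data.append([])
# Put each Y value into the sublist indexed by the integer part of its X
for i in range(len(X)):
    j = int(X[i])
    my_data[j].append(Y[i])
```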
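If the many empty sublists are a concern in themselves, the same grouping can be kept sparse. The following is only a possible variant, not part of my current code: it uses a `collections.defaultdict` keyed by `int(x)`, so only bins that actually contain data exist, and its sorted keys are exactly the x positions with non-zero frequency. The helper name `group_by_integer_part` and the sample values are made up for illustration, and like the list-based loop it assumes non-negative `X` values.

```python
from collections import defaultdict


def group_by_integer_part(X, Y):
    """Group the Y values by the integer part of the matching X values.

    Hypothetical helper: only bins that actually contain data are stored.
    """
    bins = defaultdict(list)
    for x, y in zip(X, Y):
        bins[int(x)].append(y)
    # Sort by bin index so the groups come out in increasing x order
    return dict(sorted(bins.items()))


X = [1.2, 1.7, 3.1, 250.4, 2.9]
Y = [10, 12, 7, 3, 5]
print(group_by_integer_part(X, Y))  # {1: [10, 12], 2: [5], 3: [7], 250: [3]}
```

The keys could be passed as the `positions` argument of `ax.boxplot` if the boxes should stay at their original x values.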
With this list-based grouping, `my_data` is a list of lists with `number_of_units` sublists. The `k`-th sublist contains all the `Y` values associated with `X` values whose integer part is `k`. Here is the problem: most of the sublists are empty. `X` spans many orders of magnitude and `number_of_units` is typically around 10^5, but most of the `X` values have an integer part in [1, 10], so most of the sublists in `my_data` are empty. As a direct consequence, if I do
```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.boxplot(my_data)  # plus whatever styling options I pass
```
I obtain something like the following figure (note the "upper-right" red point):
This is due to the emptiness of most of the sublists in `my_data`: most of the plot shows zero frequency. So what I need is to break the x-axis wherever the frequency is zero. Note that:
- The points where the axis has to be broken must be found dynamically, since they change with the data.
- It is very likely that the axis has to be broken multiple times.
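The break points can be located directly from `X`, without looking at the boxplot at all. A minimal sketch, assuming NumPy is available and all `X` values are non-negative (`np.bincount` requires that): count how many `X` values fall into each integer bin, keep the bins with non-zero count, and mark a break wherever two consecutive occupied bins are not adjacent. The function name `zero_frequency_breaks` and the sample data are hypothetical.

```python
import numpy as np


def zero_frequency_breaks(X):
    """Find the occupied integer bins of X and where the x-axis should break.

    Hypothetical helper; assumes all X values are non-negative.
    """
    # counts[k] = number of X values whose integer part is k
    counts = np.bincount(np.asarray(X, dtype=float).astype(int))
    occupied = np.flatnonzero(counts)  # bins with non-zero frequency
    # Consecutive occupied bins that are not adjacent have a gap between them
    gap_positions = np.flatnonzero(np.diff(occupied) > 1)
    breaks_after = occupied[gap_positions]  # last occupied bin before each gap
    return occupied, breaks_after


X = [1.2, 1.7, 2.9, 3.1, 250.4, 251.0, 90000.5]
occupied, breaks_after = zero_frequency_breaks(X)
print(occupied.tolist())                  # [1, 2, 3, 250, 251, 90000]
print(breaks_after.tolist())              # [3, 251]
print("regions:", len(breaks_after) + 1)  # regions: 3
```

For `max(X)` around 10^5 the `counts` array has about 10^5 entries, which is small. Whenever at least one bin is occupied, `len(breaks_after) + 1` is the number of contiguous regions, so the number of breaks follows from the data.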
Theoretical idea
1. Split the list `my_data` into `M` lists of lists, where the split is done according to the emptiness of `my_data`: if `my_data[k]` is the first empty sublist, then `my_data[0], ..., my_data[k-1]` is the first group; then find the first non-empty sublist with index > `k`, and there the second group begins. When I find another empty sublist, the second group is complete, and so on. I hope I was clear.
2. Do an `ax.boxplot()` for each of the new lists of lists. This time none of the sublists will be empty.
3. Plot each `ax` as a subplot and join all the subplots as suggested here.
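Step 1 can be sketched in plain Python as a single pass over `my_data`. This is only an illustration: `split_on_empty` is a hypothetical helper name, and it also records the index at which each group starts, on the assumption that the original x values will be wanted later for tick labels.

```python
def split_on_empty(my_data):
    """Split my_data into groups of consecutive non-empty sublists.

    Returns a list of (first_index, group) pairs, where first_index is the
    position in my_data of the group's first sublist.
    """
    groups = []
    current, start = [], None
    for k, sublist in enumerate(my_data):
        if sublist:
            if not current:
                start = k  # a new group begins at a non-empty sublist
            current.append(sublist)
        elif current:
            groups.append((start, current))  # an empty sublist closes the group
            current = []
    if current:
        groups.append((start, current))  # close the last group
    return groups


my_data = [[], [10, 12], [5], [], [], [7], [], [3]]
for start, group in split_on_empty(my_data):
    print(start, group)
```

Here `len(split_on_empty(my_data))` is the number of subplots needed, and it is only known after the split.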
This approach has a number of difficulties for me. The main problem is that I don't know a priori how many subplots I will need, since this number depends on the dataset, and I really don't know how to overcome this. So I ask:
How can I automatically locate the regions of the x-axis that have non-zero frequency and plot only those regions, with a break in the axis every time a region ends?
Any suggestion would be appreciated.
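A minimal sketch of one way to get a data-dependent number of panels, assuming a matplotlib version where `plt.subplots` accepts `gridspec_kw`: find the runs of consecutive non-empty sublists, then call `plt.subplots(1, len(runs), ...)` so the panel count comes from the data instead of being fixed in advance. The helper name `broken_boxplot`, the mark size `d` and the sample `my_data` are assumptions for illustration. The diagonal break marks follow the general idea of matplotlib's broken-axis gallery example; matplotlib has no dedicated broken-axis function being called here.

```python
import itertools

import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch; remove it to use plt.show()
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D


def broken_boxplot(my_data, d=0.015):
    """Draw one boxplot panel per run of consecutive non-empty sublists.

    Hypothetical helper; assumes at least one sublist of my_data is non-empty.
    `d` is the size of the diagonal break marks, in axes coordinates.
    """
    # Original indices of each run of consecutive non-empty sublists
    runs = [
        [k for k, _ in grp]
        for nonempty, grp in itertools.groupby(
            enumerate(my_data), key=lambda pair: bool(pair[1]))
        if nonempty
    ]
    # The number of panels is only decided here, from the data
    fig, axes = plt.subplots(
        1, len(runs), sharey=True, squeeze=False,
        gridspec_kw={"width_ratios": [len(r) for r in runs], "wspace": 0.05},
    )
    axes = axes[0]
    for i, (ax, run) in enumerate(zip(axes, runs)):
        # positions=run keeps the original bin indices as x tick labels
        ax.boxplot([my_data[k] for k in run], positions=run)
        edges = []
        if i > 0:  # the axis is broken on the left of this panel
            ax.spines["left"].set_visible(False)
            ax.tick_params(axis="y", left=False)
            edges.append(0.0)
        if i < len(runs) - 1:  # ... and on the right of this panel
            ax.spines["right"].set_visible(False)
            edges.append(1.0)
        for x in edges:
            for y in (0.0, 1.0):  # bottom and top corners
                # add_artist, unlike ax.plot, keeps the marks out of autoscaling
                ax.add_artist(Line2D([x - d, x + d], [y - d, y + d],
                                     transform=ax.transAxes, color="k",
                                     linewidth=1, clip_on=False))
    return fig, axes


my_data = [[], [10, 12], [5], [], [], [7], [], [3]]
fig, axes = broken_boxplot(my_data)
fig.canvas.draw()  # render once; use fig.savefig(...) or plt.show() in practice
```

Panel widths are proportional to the number of boxes in each run through `width_ratios`. Because `d` is measured in axes coordinates, the marks on narrow and wide panels do not have exactly the same slope; for a uniform look they would need to be scaled by each panel's width.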
EDIT
My question is not a duplicate of this question, because the latter does not contain any explanation of how to break the x-axis. However, combining the information in questions 1 and 2 might fully solve the problem. I'm working on it and will edit the question further once the problem is solved.