How to avoid overlapping labels in a Matplotlib scatter plot? (in an automated way)

Question

I have to plot several correlation plots. In each one I need to label every point (8 at the moment, but this can easily grow to several dozen). Because some points lie very close together, their labels overlap. The points are distributed differently in each plot, so I cannot fix the label positions by hand: I would need to do this separately for every plot. Is there an automated way to avoid the overlap, e.g. by pushing a label a bit higher every time it overlaps with another one?

Besides extensive searching on Google and Stack Overflow, where I couldn't find a proper way to automate this, I tried a few approaches myself. The most successful one overlays a grid, stored as a NumPy array, on the plot and marks a cell with 1 once it is occupied. This works reasonably well, but there is a problem. To assign a point to a grid cell, I round its x and y values down. I then look for the closest zero in the grid relative to the point's cell and calculate the label offset from it. The problem is that the offset is computed from the rounded position rather than the actual one, so points and labels can still overlap. And if I calculate the offset from the original position instead, the grid concept no longer works and the labels overlap again.
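
The rounding issue can be reproduced in isolation. The following is a simplified sketch, not the plotting code itself; the helper names `cell_index` and `cell_origin` and the axis limits are assumptions made for this example only.

```python
import math

def cell_index(value, lower, step):
    """Round a data coordinate down to the index of its grid cell."""
    return int(math.floor((value - lower) / step))

def cell_origin(index, lower, step):
    """Data coordinate of the lower edge of a grid cell."""
    return lower + index * step

# Assumed example values: an x axis from 0 to 6 split into 6 cells.
x_min, x_max, no_x = 0.0, 6.0, 6
x_step = (x_max - x_min) / no_x

x_point = 2.9                                  # actual data position
col = cell_index(x_point, x_min, x_step)       # the point falls into cell 2
x_label = cell_origin(col + 1, x_min, x_step)  # label anchored at the start of cell 3

# The grid treats point and label as one full cell apart,
# but in data coordinates they are much closer than one step.
gap = x_label - x_point
print(col, x_label, round(gap, 2))
```

A point near the upper edge of its cell ends up almost touching a label placed in the neighbouring cell, even though the grid reports both cells as separate.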

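import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Assumed import: the (b, m) unpacking below matches this polyfit, which returns the lowest degree first
from numpy.polynomial.polynomial import polyfit


def find_nearest(array, value):
    # Assumed helper, not shown in the original post: returns the array element closest to value
    array = np.asarray(array)
    return array[np.abs(array - value).argmin()]


def plotlinreg(xfile: pd.DataFrame, yfile: pd.DataFrame, xcolumn: str, ycolumn: str, savestring: str, label_column: str):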
    xfile_copy = xfile.copy()
    yfile_copy = yfile.copy()

    x = xfile[xcolumn].values
    y = yfile[ycolumn].values

    b, m = polyfit(x, y, 1)

    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.plot(x, y, '.')
    ax.plot(x, b + m * x, '-')

######################################################################################
######### The following commands are used to make sure that no labels overlap ########
######################################################################################

    labels = list(xfile[label_column])
    no_x = 6 # Number of labels in x direction that fit next to each other
    no_y = 8 # Same for y
    x_steps = (ax.get_xlim()[1] - ax.get_xlim()[0])/no_x # Calculates the step size according to the limits and the maximal number of labels
    y_steps = (ax.get_ylim()[1] - ax.get_ylim()[0])/no_y # Same for y

    label_grid = np.zeros((no_y,no_x))

    xfile_copy[xcolumn] = xfile_copy[xcolumn].apply(lambda x: int(math.floor((x - ax.get_xlim()[0])/x_steps)))
    yfile_copy[ycolumn] = yfile_copy[ycolumn].apply(lambda y: int(math.floor((y - ax.get_ylim()[0])/y_steps)))
    # This calculates the grid positions of the values by subtracting the minimum value, dividing by the step size and then rounding down.

    for x_position, y_position in zip(xfile_copy[xcolumn], yfile_copy[ycolumn]):
        # Blocks position in the grid where the data points are to avoid an overlap between those and the labels
        label_grid[no_y-1 - y_position, x_position] = 1

    for label, x, y, x_position, y_position in zip(labels, xfile[xcolumn], yfile[ycolumn], xfile_copy[xcolumn], yfile_copy[ycolumn]):
        delta = 0
        mdelta = 1
        condition = True
        positive = True

        while condition:
            while positive:
                # First, try to find a new position by looking for an empty space in a column to the right
                if x_position+delta == no_x-1:
                    break
                if 0 in label_grid[:, x_position+delta]: # Is there an empty position in the current column?
                    itemindex = np.where(label_grid[:, x_position+delta]==0) # Where are the zeros?
                    x_index = find_nearest(itemindex[0], y_position) # What is the closest one?
                    offset = (ax.get_xlim()[0] + (x_position+delta)*x_steps, ax.get_ylim()[0] + (x_index)*y_steps)
                    # Setting the offset for this label
                    label_grid[x_index, x_position + delta] = 1 # Set this position to 1
                    positive = False
                delta += 1
            if 0 in label_grid[:, x_position-delta]:
                # Same thing, but now going to the left
                itemindex = np.where(label_grid[:, x_position-delta]==0)
                x_index = find_nearest(itemindex[0], y_position)
                offset = (ax.get_xlim()[0] + (x_position-delta)*x_steps, ax.get_ylim()[0] + (x_index)*y_steps)
                label_grid[x_index, x_position - delta] = 1
                condition = False
            mdelta +=1

        ax.annotate(
            label,
            xy=(x, y), xytext=offset, fontsize=9,
            ha='left', va='top',
            bbox=dict(boxstyle='round,pad=0.2', fc='red', alpha=1),
            arrowprops=dict(arrowstyle = '->', connectionstyle='arc3,rad=0', alpha=0.2))

################################
######### End of labels ########
################################

    ax.set_ylim(ax.get_ylim()[0] - y_steps, ax.get_ylim()[1] + 0.5*y_steps)
    ax.set_xlim(ax.get_xlim()[0] - 1.5*x_steps, ax.get_xlim()[1] + 0.3*x_steps)
    ax.set_xlabel(xcolumn)
    ax.set_ylabel(ycolumn)
    ax.set_title("Testrun")

    return fig

As can be seen in the resulting plot, the label for value 3 overlaps with its data point.

There is a more or less canonical question at [Matplotlib overlapping annotations / text](https://stackoverflow.com/questions/19073683/matplotlib-overlapping-annotations-text). Especially, there is a nice little tool, https://github.com/Phlya/adjustText. Did you try that? Or does this not work for you? In that case what needs to be different? — ImportanceOfBeingErnest, Apr 16 '19 at 15:00
That is amazing! Thank you so much! It is exactly what I need. Thank you! — MarcMarc2, Apr 16 '19 at 17:14
