How can I auto-adjust my scatterplot labels without them being overlapped by other labels in python?

Question

I have been working on this for a while and would like someone to look at why I can't get my scatter-plot labels to auto-adjust. While searching for a solution I came across the adjustText library (https://github.com/Phlya/adjustText), which seems like it should work, but I can't find an example that plots from a dataframe. When I tried replicating the adjustText examples, I got an error. This is my current code:

    df["category"] = df["category"].astype(int)
    df2 = df.sort_values(by=['count'], ascending=False).head()
    ax = df.plot.scatter(x="category", y="count")
    a = df2['category']
    b = df2['count']
    texts = []
    for xy in zip(a, b):
        texts.append(plt.text(xy))
    adjust_text(texts, arrowprops=dict(arrowstyle="->", color='r', lw=0.5))

    plt.title("Count of {column} in {table}".format(**sql_dict))

But then I got this TypeError:

    TypeError: text() missing 2 required positional arguments: 'y' and 's'

I then changed the loop to the following, which annotates each point with its coordinates. It runs, but the labels just overlap.

    df["category"] = df["category"].astype(int)
    df2 = df.sort_values(by=['count'], ascending=False).head()
    ax = df.plot.scatter(x="category", y="count")
    a = df2['category']
    b = df2['count']
    for xy in zip(a, b):
        ax.annotate('(%s, %s)' % xy, xy=xy)
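For context on the TypeError quoted above: matplotlib's text() takes x, y, and the label string s as three separate positional arguments, so passing the single tuple xy fills only x. The sketch below reproduces the same message with a hypothetical stand-in function (an assumption for illustration, not matplotlib itself) that has the same three parameters, then shows the unpacked call that works.

```python
def text(x, y, s):
    """Hypothetical stand-in with the same (x, y, s) positional
    parameters as matplotlib's text(); it just returns its arguments."""
    return (x, y, s)


xy = (2, 29603)  # one (category, count) pair from the sample df

try:
    text(xy)  # the whole tuple is bound to x; y and s are never supplied
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Unpack the tuple into x and y, and pass the label string as s.
label = text(*xy, "(%s, %s)" % xy)
print(label)
```

With the real library, the same idea applies: unpack the coordinates and supply the label text as the third argument.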

As you can see, my df is built from tables in SQL. In this specific table, it's length of stay in days compared to how many people stayed that long. A sample of the data is shown below the plot. I made the second dataframe above so that I would label only the highest values on the plot. This is one of my first experiences with graphing visualizations in Python, so any help would be appreciated.

[![picture of a graph of overlapping items][1]][1]

    los_days  count
    3         350
    1         4000
    15        34

and so forth. Thanks so much. Let me know if you need anything else.

Here is an example of the df

       category  count
    0         2  29603
    1         4  33980
    2         9  21387
    3        11  17661
    4        18  10618
    5        20   8395
    6        27   5293
    7        29   4121

This is a hard problem. You're going to have to write a constraint-based linear equation solver to dynamically determine the positions of the labels — Paul H, Feb 21 '19 at 20:40

score 0 · Accepted Answer · answered Feb 21 '19 at 22:36

After some reverse engineering with an example from the adjustText library and my own example, I just had to change my for loop to build the label strings first and pass x, y, and the label text to ax.text separately, and it worked fantastically.

    labels = ['{}'.format(i) for i in zip(a, b)]
    texts = []
    for x, y, text in zip(a, b, labels):
        texts.append(ax.text(x, y, text))
    adjust_text(texts, force_text=0.05, arrowprops=dict(arrowstyle="-|>",
                                                        color='r', alpha=0.5))
