Scatter plot of aggregates of two columns

Question

I have:

import pandas as pd

df = pd.DataFrame({"State": ["CA", "NY", "CA", "NY", "CA", "NY", "TX", "TX", "TX"],
                   "Company": ["A", "A", "A", "B", "C", "D", "A", "B", "B"],
                   "Profits": [3, 2, 5, 6, 7, 2, 2, 4, 7]})

  State Company  Profits
0    CA       A        3
1    NY       A        2
2    CA       A        5
3    NY       B        6
4    CA       C        7
5    NY       D        2
6    TX       A        2
7    TX       B        4
8    TX       B        7

I would like to create a scatter plot with each point corresponding to a state. On the x-axis, I want the number of unique companies in that state (e.g. CA has 2 companies A and C). On the y-axis, I want the average profits of all companies in the state (e.g. California's average profit is 5).

I tried:

import matplotlib.pyplot as plt

n_companies = df.groupby("State")["Company"].nunique()
mean_profits = df.groupby("State")["Profits"].mean()
plt.scatter(n_companies, mean_profits)
plt.show()

This appears to work, but how do I label each point with its state?

I added what I tried. I'm just not sure how to get the labels now. — Smithey, Jul 23 '22 at 03:27
Does this answer your question? [Scatter plot with different text at each data point](https://stackoverflow.com/questions/14432557/scatter-plot-with-different-text-at-each-data-point) — Tom, Jul 23 '22 at 03:52

Yusuf Syam · Accepted Answer · 2022-07-25T16:34:07.240

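# Plot one point per state so that each state gets its own legend entry.
# .iloc selects by position; plain [i] on a string-indexed Series is deprecated.
for i in range(len(n_companies)):
    plt.scatter(n_companies.iloc[i], mean_profits.iloc[i], label=mean_profits.index[i])
plt.legend()
plt.show()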

edited Jul 25 '22 at 16:34

answered Jul 24 '22 at 13:07


Thanks but that's a little different than what I had in mind. That plots company name against mean_profits. I wanted to plot n_companies against mean_profits but label with the company name. I actually still need to read the suggested answer in another comment, so I may find it there in time. – Smithey Jul 25 '22 at 16:19
Oh, sorry for the misunderstanding. – Yusuf Syam Jul 25 '22 at 16:21
@Smithey I have updated the answer. Is that what you wanted to do? – Yusuf Syam Jul 25 '22 at 16:33
Ideally, I was hoping to get the labels next to the points rather than in a legend with colors, but this works ok. Thanks a lot! – Smithey Jul 31 '22 at 19:08
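
For labels drawn next to the points rather than in a legend, one option is matplotlib's `Axes.annotate`, which is also the approach taken in the linked question. The following is only a minimal sketch, assuming the `df` from the question; the `stats` name, the axis labels, and the 5-point text offset are illustrative choices, not part of the original answer.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"State": ["CA", "NY", "CA", "NY", "CA", "NY", "TX", "TX", "TX"],
                   "Company": ["A", "A", "A", "B", "C", "D", "A", "B", "B"],
                   "Profits": [3, 2, 5, 6, 7, 2, 2, 4, 7]})

# One row per state: number of distinct companies and mean profit.
# ("stats" is a hypothetical name chosen for this sketch.)
stats = df.groupby("State").agg(
    n_companies=("Company", "nunique"),
    mean_profits=("Profits", "mean"),
)

fig, ax = plt.subplots()
ax.scatter(stats["n_companies"], stats["mean_profits"])

# Write each state's name slightly above and to the right of its point.
for state, row in stats.iterrows():
    ax.annotate(
        state,
        (row["n_companies"], row["mean_profits"]),
        xytext=(5, 5),               # offset in points, an arbitrary choice
        textcoords="offset points",
    )

ax.set_xlabel("Number of unique companies")
ax.set_ylabel("Mean profits")
```

Call `plt.show()` afterwards to display the figure. Because all points come from a single `scatter` call, they share one color and no legend is needed.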
