I am trying to create a scatter plot from a DataFrame with 3 columns: 'a', 'b', 'c'.

  a  |  b  |  c
  2  | 0.8 |  k
  3  | 0.4 |  l
  4  | 0.2 |  k

I put column 'a' on the x axis and column 'b' on the y axis.

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv(csv_file)
fig, ax = plt.subplots()
ax.scatter(df['a'], df['b'])
plt.show()

Column 'c' is categorical. I want to use it for the legend, so that each category is drawn in a different color.

How can I do that?

EDIT

I don't know in advance which labels appear in column 'c', or how many there are.

ron kolel
  • Please include a minimal example of the DataFrame ([mre]). You want each point to have a color based on its category? – wwii Jul 05 '20 at 13:45
  • @wwii, yes I am. – ron kolel Jul 05 '20 at 13:47
  • Does this answer your question? [plot different color for different categorical levels using matplotlib](https://stackoverflow.com/questions/26139423/plot-different-color-for-different-categorical-levels-using-matplotlib) – wwii Jul 05 '20 at 13:51
  • Also - [Scatter plots in Pandas/Pyplot: How to plot by category](https://stackoverflow.com/questions/21654635/scatter-plots-in-pandas-pyplot-how-to-plot-by-category) uses `.plot` instead of `.scatter`, – wwii Jul 05 '20 at 14:00

2 Answers

If you are open to using another package, try seaborn:

import seaborn as sns
sns.scatterplot(data=df, x='a',y='b', hue='c')

Quang Hoang

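You can use the c parameter of scatter. It accepts colors or numbers, so passing the raw strings in df['c'] raises a ValueError; pass integer codes for the categories instead:

ax.scatter(df['a'], df['b'], c=pd.factorize(df['c'])[0])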

See the matplotlib documentation for scatter for the values c accepts.

According to this answer to another question, How to convert categorical data to numerical data?, you can use pd.factorize to create an integer column with one code per category, like so:

df['new_column'] = pd.factorize(df['some_column'])[0]
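Putting the pieces together, here is a minimal sketch of how the factorize codes could drive both the colors and a legend. It assumes the sample rows from the question (built inline instead of read from csv_file), and the output file name is hypothetical. Legend handles come from the scatter's legend_elements method, which is one possible way to do this, not the only one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question; swap in pd.read_csv(csv_file) for real data.
df = pd.DataFrame({"a": [2, 3, 4], "b": [0.8, 0.4, 0.2], "c": ["k", "l", "k"]})

# codes: one integer per row; uniques: the labels in order of first appearance,
# so uniques[i] is the label that was assigned code i.
codes, uniques = pd.factorize(df["c"])

fig, ax = plt.subplots()
points = ax.scatter(df["a"], df["b"], c=codes)

# num=None requests one legend handle per distinct code, in ascending order,
# which lines up handle i with uniques[i].
handles, _ = points.legend_elements(num=None)
ax.legend(handles, list(uniques), title="c")

fig.savefig("scatter_factorize.png")  # hypothetical output file name
```

Because the labels are read from the data, nothing about column 'c' has to be known ahead of time. The colors come from the default colormap, so many categories will end up with similar shades.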

Bagutreko
  • I tried but I get an error: ValueError: 'c' argument must be a color, a sequence of colors, or a sequence of numbers – ron kolel Jul 05 '20 at 14:00
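The ValueError in the comment above comes from passing strings such as 'l' where scatter expects colors or numbers. One way to sidestep the conversion entirely, in the spirit of the questions linked in the comments, is to draw one scatter call per category. This is a sketch under the assumption that the data looks like the sample in the question; the output file name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question; swap in pd.read_csv(csv_file) for real data.
df = pd.DataFrame({"a": [2, 3, 4], "b": [0.8, 0.4, 0.2], "c": ["k", "l", "k"]})

fig, ax = plt.subplots()

# groupby discovers the labels in 'c', so they need not be known in advance.
for label, group in df.groupby("c"):
    # Each call takes the next color from the default color cycle,
    # and label= registers the category for the legend.
    ax.scatter(group["a"], group["b"], label=label)

ax.legend(title="c")
fig.savefig("scatter_groupby.png")  # hypothetical output file name
```

The default color cycle has ten entries, so with more categories than that the colors start to repeat unless a longer cycle is set.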