
I am trying to make a matplotlib scatter plot in which points are colored according to the names in a pandas DataFrame, so that in the x, y plot each distinct name gets its own color.

dataframe:

   id         x          y          Names
0  MAC004524  29.137983  11.864633  ACORN-M
1  MAC004525  28.14      11.80      ACORN-M
2  MAC004526  24.14      12.80      ACORN-C
....

code:

names = set(df['Names'])
colors = list(cmap(np.linspace(0, 1, len(names))))
df['color']=0
for a, c in zip(names, colors):
    mask = df.loc[df['Names'] == a]
    df.loc[mask, 'color'] = c
# but I get an error here: KeyError: "[('i', 'd') ('x',) ('y',) ('A', 'c', 'o', 'r', 'n')\n ('A', 'c', 'o', 'r', 'n', '_', 'g', 'r', 'o', 'u', 'p', 'e', 'd')\n ('c', 'o', 'l', 'o', 'r')] not in index"

Then I'd like to plot:

x = df['x']
y = df['y']
c = df['color']
plt.scatter(x, y, c=c, s=1)

required df:

   id         x          y          Names    color
0  MAC004524  29.137983  11.864633  ACORN-M  [0.267004 0.004874 0.329415 1.      ]
proximacentauri

2 Answers

The problem is most likely that you are trying to place a list in a single cell. As shown here, you should use .at instead of .loc for that.

Probably not the most efficient way, but it gets the job done:

for a, c in zip(names, colors):
    mask = df[df['Names'] == a].index
    for value in mask:
        df.at[value, 'color'] = c

I take the index of the matching rows and iterate over each value in it, setting the designated color one row at a time, since I have yet to find a way to pass more than one row label to .at.
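
If the per-row loop above turns out to be slow, one possible alternative is to build a name-to-color dictionary and let Series.map fill the column in one step. This is only a sketch: the sample rows are copied from the question, and viridis is an assumption for the question's undefined cmap. It also looks as if the original KeyError comes from mask being a DataFrame (the result of df.loc[...]) rather than a boolean mask or an index.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample rows copied from the question; the real DataFrame has more rows.
df = pd.DataFrame({
    "id": ["MAC004524", "MAC004525", "MAC004526"],
    "x": [29.137983, 28.14, 24.14],
    "y": [11.864633, 11.80, 12.80],
    "Names": ["ACORN-M", "ACORN-M", "ACORN-C"],
})

# Assumption: the question's `cmap` is viridis.
cmap = plt.get_cmap("viridis")

# One RGBA tuple per distinct name; sorting keeps the assignment reproducible.
names = sorted(df["Names"].unique())
rgba = cmap(np.linspace(0, 1, len(names)))
color_map = {name: tuple(row) for name, row in zip(names, rgba)}

# map() looks up each row's name, so no explicit loop over rows is needed.
df["color"] = df["Names"].map(color_map)

fig, ax = plt.subplots()
ax.scatter(df["x"], df["y"], c=df["color"].tolist(), s=1)
```

Storing tuples rather than arrays keeps each color in a single cell, and passing the column as a plain list gives scatter one RGBA color per point.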

Márcio Coelho

Have you looked into seaborn plotting? You could make the plot from the original dataframe immediately:

import seaborn as sns
sns.scatterplot(x='x', y='y', hue='Names', data=df)
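
If seaborn is not an option, something similar can be sketched in plain matplotlib by drawing one scatter call per group, so that matplotlib picks a different color for each name and no color column is needed. The sample rows below are copied from the question; everything else is only an illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample rows copied from the question; the real DataFrame has more rows.
df = pd.DataFrame({
    "id": ["MAC004524", "MAC004525", "MAC004526"],
    "x": [29.137983, 28.14, 24.14],
    "y": [11.864633, 11.80, 12.80],
    "Names": ["ACORN-M", "ACORN-M", "ACORN-C"],
})

fig, ax = plt.subplots()

# Each group is drawn separately, so each name takes the next color
# from matplotlib's default color cycle and gets its own legend entry.
for name, group in df.groupby("Names"):
    ax.scatter(group["x"], group["y"], s=1, label=name)

ax.legend(title="Names")
```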
Jondiedoop