How to plot scatter graph with markers based on column value

Question

I am trying to plot a scatter graph on some data with grouping. They are grouped by the column group and I want them to have different marker styles based on the group.

Minimal working code

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

colors = ['r','g','b','y']
markers = ['o', '^', 's', 'P']

df = pd.DataFrame()
df["index"] = list(range(100))
df["data"] = np.random.randint(100, size=100)
df["group"] = np.random.randint(4, size=100)
df["color"] = df.apply(lambda x: colors[x["group"]], axis=1)
df["marker"] = df.apply(lambda x: markers[x["group"]], axis=1)

plt.scatter(x=df["index"], y=df["data"], c=df["color"])
# What I thought would have worked
# plt.scatter(x=df["index"], y=df["data"], c=df["color"], marker=df["marker"])
plt.show()

What I want

I want the groups to have different marker styles as well. For example the red entries will have marker "o" (big dot), green entries with marker "^" (upward triangle) and so on.

What I tried

I thought

plt.scatter(x=df["index"], y=df["data"], c=df["color"], marker=df["marker"])

would have worked but nope...

TypeError: 'Series' objects are mutable, thus they cannot be hashed

I can for loop over the DataFrame and group the entries by their group. Then plot them with the marker argument set with the list defined (like plt.scatter(..., marker=markers[group]). That would result in 4 plt.scatter(...) as there are 4 groups in total. But that is ugly IMO to loop through a DataFrame row by row and I strongly believe there is a better way.

Thanks in advance!

I tried to see if various scatter plot markers could be supported in the list, but it still doesn't seem to be possible. The best way is to make it a function, as described in [this answer](https://stackoverflow.com/questions/51810492/how-can-i-add-a-list-of-marker-styles-in-matplotlib). — r-beginners, Nov 26 '21 at 09:31
As an alternative, if you use Seaborn, the following code will do the trick. `sns.scatterplot(x=df["index"], y=df["data"], c=df["color"],style=df['marker'])` — r-beginners, Nov 26 '21 at 09:35

tdy · Accepted Answer · 2021-11-26T11:59:56.317

matplotlib

that is ugly IMO to loop through a DataFrame row by row and I strongly believe there is a better way

With matplotlib, I don't think there is a better way than to loop. Note that if you groupby the markers, it does not loop row by row, just group by group (so 4 times in this case).

This will call plt.scatter 4 times (once per marker):

for marker, d in df.groupby('marker'):
    plt.scatter(x=d['index'], y=d['data'], c=d['color'], marker=marker, label=marker)
plt.legend()

seaborn

As r-beginners commented, sns.scatterplot supports multiple markers via style:

sns.scatterplot(x=df['index'], y=df['data'], c=df['color'], style=df['marker'])