
I am trying to draw a scatter plot of grouped data. The rows are grouped by the column group, and I want each group to have its own marker style.

Minimal working code

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

colors = ['r','g','b','y']
markers = ['o', '^', 's', 'P']

df = pd.DataFrame()
df["index"] = list(range(100))
df["data"] = np.random.randint(100, size=100)
df["group"] = np.random.randint(4, size=100)
df["color"] = df.apply(lambda x: colors[x["group"]], axis=1)
df["marker"] = df.apply(lambda x: markers[x["group"]], axis=1)

plt.scatter(x=df["index"], y=df["data"], c=df["color"])
# What I thought would have worked
# plt.scatter(x=df["index"], y=df["data"], c=df["color"], marker=df["marker"])
plt.show()

[Image: example output]

What I want

I want the groups to have different marker styles as well as different colors. For example, the red entries should use the marker "o" (circle), the green entries "^" (upward triangle), and so on.

What I tried

I expected

plt.scatter(x=df["index"], y=df["data"], c=df["color"], marker=df["marker"])

to work, but it raises

TypeError: 'Series' objects are mutable, thus they cannot be hashed

I could loop over the DataFrame, collect the entries by group, and plot each group with the marker argument taken from the list defined above (like plt.scatter(..., marker=markers[group])). That would mean 4 plt.scatter(...) calls, since there are 4 groups in total. But looping through a DataFrame row by row seems ugly to me, and I strongly believe there is a better way.

Thanks in advance!

tdy
Henry Fung
  • I tried to see if various scatter plot markers could be supported in the list, but it still doesn't seem to be possible. The best way is to make it a function, as described in [this answer](https://stackoverflow.com/questions/51810492/how-can-i-add-a-list-of-marker-styles-in-matplotlib). – r-beginners Nov 26 '21 at 09:31
  • As an alternative, if you use Seaborn, the following code will do the trick. `sns.scatterplot(x=df["index"], y=df["data"], c=df["color"],style=df['marker'])` – r-beginners Nov 26 '21 at 09:35
  • @r-beginners Lemme take a look at Seaborn :D – Henry Fung Nov 26 '21 at 10:03

1 Answer


matplotlib

> that is ugly IMO to loop through a DataFrame row by row and I strongly believe there is a better way

With matplotlib, I don't think there is a better way than to loop, since each plt.scatter call takes a single marker style. Note that if you groupby the markers, the loop runs group by group rather than row by row (so 4 iterations in this case).

This will call plt.scatter 4 times (once per marker):

for marker, d in df.groupby('marker'):
    plt.scatter(x=d['index'], y=d['data'], c=d['color'], marker=marker, label=marker)
plt.legend()
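
If you would rather skip the helper color and marker columns, the same group-by-group loop can look the styles up straight from the group number. This is only a sketch under a few assumptions: the function name plot_by_group, the fixed random seed and the "group N" legend labels are mine, not from the question.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

colors = ['r', 'g', 'b', 'y']
markers = ['o', '^', 's', 'P']

# Same shape of data as in the question; the seed is only here for repeatability.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "index": np.arange(100),
    "data": rng.integers(100, size=100),
    "group": rng.integers(4, size=100),
})


def plot_by_group(frame, ax):
    """Hypothetical helper: one ax.scatter call per group, not per row."""
    handles = {}
    for group, d in frame.groupby("group"):
        # The integer group number indexes the colors and markers lists.
        handles[group] = ax.scatter(
            d["index"], d["data"],
            color=colors[group],
            marker=markers[group],
            label=f"group {group}",
        )
    return handles


fig, ax = plt.subplots()
handles = plot_by_group(df, ax)
ax.legend()
# plt.show()  # uncomment to display the figure interactively
```

It still calls scatter once per group, so 4 times here, and every row ends up in exactly one of those calls.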


seaborn

As r-beginners commented, sns.scatterplot supports multiple markers via style:

import seaborn as sns
sns.scatterplot(x=df['index'], y=df['data'], c=df['color'], style=df['marker'])
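
As a variation, seaborn can also map both color and marker from the group column itself through hue, style, palette and markers, so the color and marker columns are not needed. Treat this as a sketch: the dict names palette and marker_map and the fixed seed are my own choices, and it assumes seaborn is installed.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Same shape of data as in the question; the seed is only here for repeatability.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "index": np.arange(100),
    "data": rng.integers(100, size=100),
    "group": rng.integers(4, size=100),
})

# Explicit group -> style lookups (the names are illustrative).
palette = {0: 'r', 1: 'g', 2: 'b', 3: 'y'}
marker_map = {0: 'o', 1: '^', 2: 's', 3: 'P'}

ax = sns.scatterplot(
    data=df, x="index", y="data",
    hue="group", style="group",          # color and marker both follow the group
    palette=palette, markers=marker_map,
)
# plt.show()  # uncomment to display the figure interactively
```

Using the same column for hue and style gives a single combined legend entry per group.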

tdy