
I have three-dimensional data, where one dimension is categorical: length, width, target. For simplicity, say that target can take values in {0, 1, 2}. I would like to plot length vs width "by" target, so that the points have different colours and shapes depending on the target value.

I can do this in matplotlib.pyplot (imported as plt) with the following code, assuming a pandas DataFrame df with the columns described above.

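import matplotlib.pyplot as plt

# One sub-DataFrame per target value; df has columns "length", "width", "target"
X0 = df.query("target == 0.0").drop("target", axis = 1)
X1 = df.query("target == 1.0").drop("target", axis = 1)
X2 = df.query("target == 2.0").drop("target", axis = 1)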

ax = plt.axes()
X0.plot(x = "length", y = "width", kind = "scatter", ax = ax, color = "red")
X1.plot(x = "length", y = "width", kind = "scatter", ax = ax, color = "blue")
X2.plot(x = "length", y = "width", kind = "scatter", ax = ax, color = "green")
plt.show()
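As a sketch of a less repetitive pure-matplotlib version (assuming the same column names), one can loop over df.groupby("target") and give each group its own colour and marker. The STYLES dictionary and the plot_by_target name are illustrative choices made here, not part of matplotlib or pandas.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative styling: one colour and one marker per target value (assumed 0, 1, 2)
STYLES = {
    0: {"color": "red", "marker": "o"},
    1: {"color": "blue", "marker": "s"},
    2: {"color": "green", "marker": "^"},
}


def plot_by_target(df: pd.DataFrame) -> plt.Axes:
    """Scatter length vs width with one ax.scatter call per target value."""
    fig, ax = plt.subplots()
    # groupby yields (value, sub-DataFrame) pairs, replacing the three query() calls
    for target, group in df.groupby("target"):
        ax.scatter(group["length"], group["width"], label=str(target),
                   **STYLES[int(target)])
    ax.set_xlabel("length")
    ax.set_ylabel("width")
    ax.legend(title="target")
    return ax


if __name__ == "__main__":
    # Small made-up data set with the columns from the question
    df = pd.DataFrame({
        "length": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "width": [2.0, 1.5, 3.5, 3.0, 5.0, 4.5],
        "target": [0, 1, 2, 0, 1, 2],
    })
    plot_by_target(df)
    plt.show()
```

This still needs the mapping from values to styles to be written by hand, which is exactly the bookkeeping that ggplot2 and seaborn do automatically.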

I'm sure we can all agree that this is bad: it needs one query and one plot call per target value.

A few years ago I did some programming in R, where the ggplot2 package allows syntax of the form

ggplot(df, aes(x = length, y = width, shape = factor(target))) + geom_point()

One could replace shape = target with colour = target to get different colours depending on the value of target.

I would like something similar in pyplot, but I have not been able to find it in the documentation or elsewhere online, even though I'm sure it must be out there somewhere.


Edit. This question was marked as a duplicate. The suggested duplicates helped with some of the issues, but they do not resolve everything raised above; in particular, they do not discuss shapes. The closest I have found is the question "How to change the shape of the marker depending on a column variable?", and there are other similar ones, but the approach there is pretty ugly compared with a simple shape = "target" call.
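For comparison with a one-word shape = "target", the marker loop can be written once and wrapped in a small helper. The scatter_by function below is hypothetical (written for this post, not part of matplotlib or pandas); it is a minimal sketch assuming the colour and shape columns are categorical with only a handful of values, since it simply cycles through seven markers and the default colour cycle.

```python
import itertools

import matplotlib.pyplot as plt
import pandas as pd


def scatter_by(df, x, y, colour=None, shape=None, ax=None):
    """Hypothetical helper: scatter y against x, mapping categorical columns
    to colour and/or marker shape, loosely like ggplot2's aes(colour=, shape=)."""
    ax = ax if ax is not None else plt.gca()
    if colour is None and shape is None:
        ax.scatter(df[x], df[y])
        return ax

    # Fix one colour / one marker per distinct value so the legend is consistent
    colour_cycle = itertools.cycle(plt.rcParams["axes.prop_cycle"].by_key()["color"])
    marker_cycle = itertools.cycle(["o", "s", "^", "D", "v", "P", "X"])
    colour_map = ({v: next(colour_cycle) for v in sorted(df[colour].unique())}
                  if colour is not None else {})
    marker_map = ({v: next(marker_cycle) for v in sorted(df[shape].unique())}
                  if shape is not None else {})

    # Group by the distinct mapped columns (colour and shape may be the same column)
    keys = list(dict.fromkeys(c for c in (colour, shape) if c is not None))
    for key, group in df.groupby(keys):
        # Depending on the pandas version, a one-column list key may be a scalar
        key = key if isinstance(key, tuple) else (key,)
        values = dict(zip(keys, key))
        style = {}
        if colour is not None:
            style["color"] = colour_map[values[colour]]
        if shape is not None:
            style["marker"] = marker_map[values[shape]]
        ax.scatter(group[x], group[y], label=", ".join(map(str, key)), **style)

    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend(title=", ".join(keys))
    return ax


if __name__ == "__main__":
    # Made-up sample data with the columns from the question
    df = pd.DataFrame({
        "length": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "width": [2.0, 1.5, 3.5, 3.0, 5.0, 4.5],
        "target": [0, 1, 2, 0, 1, 2],
    })
    scatter_by(df, "length", "width", shape="target")  # shapes only
    plt.show()
```

Passing colour="target" instead (or both colour="target" and shape="target") gives the other variants with the same single call.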

There is a "ggplot for Python" package called plotnine, but it doesn't seem to have been updated for 5 years. It also seems to require things like from plotnine import *, which I'm certainly not excited about.

Maybe the functionality I'm after just doesn't exist in pyplot. If so, such is life! :)


Edit. @Trenton McKinney suggests using seaborn, imported as sns. Its scatterplot function has a hue option, which does precisely the colouring by category:

sns.scatterplot(data = df, x = "length", y = "width", hue = "target")

This still doesn't answer my question about shapes, and neither did the (partial) "duplicates". However, sns.scatterplot also has a style option, whose description is the same as that of hue except that "different colours" is replaced by "different markers".

sns.scatterplot(data = df, x = "length", y = "width", style = "target")

Why not go crazy and use both hue and style!
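A self-contained sketch using both options on made-up sample data. One assumption worth flagging: when the hue column is numeric, seaborn maps it to a continuous palette, so the sketch casts target to str to get one distinct colour per class; style is treated as categorical either way.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up sample data with the columns from the question
df = pd.DataFrame({
    "length": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "width": [2.0, 1.5, 3.5, 3.0, 5.0, 4.5],
    "target": [0, 1, 2, 0, 1, 2],
})

# Cast to str so hue uses a categorical palette instead of a continuous one
df["target"] = df["target"].astype(str)

# Colour and marker both vary with target; seaborn builds the legend itself
ax = sns.scatterplot(data=df, x="length", y="width", hue="target", style="target")
plt.show()
```

Using df["target"].astype("category") instead of str should also give a categorical palette.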

I guess the correct answer is "don't do it in matplotlib; do it in seaborn". Hopefully the incorrect duplicate marking will be resolved, so that I can add an answer with the full details.

Sam OT

1 Answer


How about this:

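import matplotlib.pyplot as plt

ax = plt.axes()  # share one set of axes; without ax=..., each .plot() opens a new figure
for target, colour in zip([0.0, 1.0, 2.0], ["red", "blue", "green"]):
    df.query("target == " + str(target)).plot(x = "length", y = "width",
        kind = "scatter", ax = ax, color = colour)
plt.show()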
Tommy