I have three-dimensional data, where one dimension is categorical: length, width, target. For simplicity, say that target can take values in {0, 1, 2}. I would like to plot length vs width "by" target. The points will have different colours and shapes depending on the target value.
I am able to do this in matplotlib.pyplot, imported as plt, using the following syntax. I assume that a pandas DataFrame df has the structure described above.
import matplotlib.pyplot as plt

# Split the frame by hand, one subset per target value.
X0 = df.query("target == 0.0").drop("target", axis = 1)
X1 = df.query("target == 1.0").drop("target", axis = 1)
X2 = df.query("target == 2.0").drop("target", axis = 1)
# Draw every subset on the same axes, each with its own colour.
ax = plt.axes()
X0.plot(x = "length", y = "width", kind = "scatter", ax = ax, color = "red")
X1.plot(x = "length", y = "width", kind = "scatter", ax = ax, color = "blue")
X2.plot(x = "length", y = "width", kind = "scatter", ax = ax, color = "green")
plt.show()
I'm sure that we can all agree that this is bbaaaddd.
A few years ago, I used to do some programming in R. The ggplot2 package allowed a syntax of the form
ggplot(df, aes(x = length, y = width, shape = target)) + geom_point()
One could replace shape = target with colour = target to get different colours depending on the value of target.
I would like something similar in pyplot. Try as I might, I have not been able to find such information in documentation or online sources. I'm sure it must be out there somewhere. I just have not been able to find it...
Edit.
This question was marked as a duplicate. The duplicates were helpful in solving some of the issues, but they do not resolve all the questions raised above. In particular, shapes are not discussed. The closest that I have found is the following question: How to change the shape of the marker depending on a column variable? There are other similar questions, but the approach they describe is pretty ugly compared with a simple shape = "target" call.
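For concreteness, the per-group approach amounts to something like the sketch below: loop over the groups and give each one its own colour and marker. The toy data and the styles mapping are made up for illustration; only the column names come from the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data using the column names from the question.
df = pd.DataFrame({
    "length": [1.0, 1.5, 2.0, 2.5, 3.0, 3.5],
    "width": [0.5, 0.7, 1.1, 1.0, 1.6, 1.4],
    "target": [0, 0, 1, 1, 2, 2],
})

# Illustrative choice: one (colour, marker) pair per target value.
styles = {0: ("red", "o"), 1: ("blue", "s"), 2: ("green", "^")}

fig, ax = plt.subplots()
# One scatter call per target value, all drawn on the same axes.
for value, group in df.groupby("target"):
    colour, marker = styles[value]
    ax.scatter(group["length"], group["width"],
               color=colour, marker=marker, label=f"target = {value}")
ax.set_xlabel("length")
ax.set_ylabel("width")
ax.legend()
```

Calling plt.show() afterwards displays the figure. It works, but the colour and marker for every value have to be spelled out by hand.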
There is a "ggplot for python" package, called plotnine, but it doesn't seem to have been updated for 5 years. You also seem to need to do things like from plotnine import *, which I'm certainly not excited by.
Maybe the functionality I'm after just doesn't exist in pyplot. If so, such is life! :)
Edit. @Trenton McKinney suggests using seaborn, imported as sns. This has a hue option, which does precisely the different colouring.
sns.scatterplot(data = df, x = "length", y = "width", hue = "target")
This still doesn't answer my question about shapes; neither did the (partial) "duplicates". However, sns.scatterplot also has a style option, which has the same description as hue except "different colours" is replaced by "different markers".
sns.scatterplot(data = df, x = "length", y = "width", style = "target")
Why not go crazy and use both hue and style!
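Putting the two together, here is a minimal self-contained sketch. The toy data is made up for illustration; the only seaborn call is sns.scatterplot with the hue and style options mentioned above.

```python
import pandas as pd
import seaborn as sns

# Hypothetical toy data using the column names from the question.
df = pd.DataFrame({
    "length": [1.0, 1.5, 2.0, 2.5, 3.0, 3.5],
    "width": [0.5, 0.7, 1.1, 1.0, 1.6, 1.4],
    "target": [0, 0, 1, 1, 2, 2],
})

# hue varies the colour and style varies the marker, both keyed on target.
ax = sns.scatterplot(data=df, x="length", y="width",
                     hue="target", style="target")
```

sns.scatterplot returns the matplotlib Axes it drew on, so the usual pyplot calls (for example plt.show() after import matplotlib.pyplot as plt) still apply.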
I guess that the correct answer is "don't do it in matplotlib; do it in seaborn". Hopefully the incorrect marking as duplicate will be resolved; then I can add an answer with the full details.