3D PCA in matplotlib: how to add legend?

Question

I am attempting to use http://scikit-learn.org/stable/auto_examples/decomposition/plot_pca_iris.html with my own data to construct a 3D PCA plot. The tutorial, however, does not show how to add a legend. Another page, https://matplotlib.org/users/legend_guide.html, does, but I cannot see how to apply the information in the second tutorial to the first.

How can I modify the code below to add a legend?

# Code source: Gaël Varoquaux
# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import decomposition
from sklearn import datasets

np.random.seed(5)

centers = [[1, 1], [-1, -1], [1, -1]]
iris = datasets.load_iris()
X = iris.data#the floating point values
y = iris.target#unsigned integers specifying group


fig = plt.figure(1, figsize=(4, 3))
plt.clf()
ax = Axes3D(fig, rect=[0, 0, .95, 1], elev=48, azim=134)

plt.cla()
pca = decomposition.PCA(n_components=3)
pca.fit(X)
X = pca.transform(X)

for name, label in [('Setosa', 0), ('Versicolour', 1), ('Virginica', 2)]:
    ax.text3D(X[y == label, 0].mean(),
              X[y == label, 1].mean() + 1.5,
              X[y == label, 2].mean(), name,
              horizontalalignment='center',
              bbox=dict(alpha=.5, edgecolor='w', facecolor='w'))
# Reorder the labels to have colors matching the cluster results
y = np.choose(y, [1, 2, 0]).astype(np.float)
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=y, cmap=plt.cm.spectral,
           edgecolor='k')

ax.w_xaxis.set_ticklabels([])
ax.w_yaxis.set_ticklabels([])
ax.w_zaxis.set_ticklabels([])

plt.show()

Investigate `plt.legend`. If you hit a roadblock, come back to stackoverflow detailing exactly how reality diverged from your expectations. — jez, Mar 29 '18 at 21:05

score 4 · Accepted Answer · answered Mar 30 '18 at 00:52

There are some issues with the other answer that neither the OP nor the answerer seem to be clear about. This is hence not a complete answer, but rather an appendix to the existing one.

The spectral colormap was removed from matplotlib in version 2.2; use Spectral, nipy_spectral, or any other valid colormap instead.
Any colormap in matplotlib takes float values ranging from 0 to 1. If you call it with a float outside that range, it just gives you the outermost color. (Integers are treated differently: they are used as indices into the colormap's lookup table.) To get a color from a colormap you hence need to normalize the values first. This is done via a Normalize instance; in this case scatter creates one internally, scaled to the minimum and maximum of the c values.

Hence use sc = ax.scatter(...) and then sc.cmap(sc.norm(value)) to get a color according to the same mapping that is used within the scatter. The code should therefore rather use
```
[sc.cmap(sc.norm(i)) for i in [1, 2, 0]] 
```
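A small sketch of the internal normalization described above. It uses a plain 2D scatter and a hypothetical label array `y` standing in for the reordered iris labels; the assumption is that the 3D `ax.scatter` normalizes the same way, which it should since it builds on the 2D scatter.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for the reordered labels after np.choose(...)
y = np.array([1., 2., 0., 1., 2., 0.])

fig, ax = plt.subplots()
sc = ax.scatter(np.arange(len(y)), np.zeros(len(y)), c=y, cmap="Spectral")

# scatter created a Normalize and scaled it to the range of c
print(sc.norm.vmin, sc.norm.vmax)  # 0.0 2.0
# value 1 therefore sits halfway along the colormap
print(float(sc.norm(1)))           # 0.5

# Legend colors built with the same cmap and norm as the points
colors = [sc.cmap(sc.norm(i)) for i in [1, 2, 0]]
print(np.allclose(colors[0], plt.get_cmap("Spectral")(0.5)))  # True
```

Because the norm maps 0, 1, 2 to 0, 0.5, 1, this is where the "divide by 2" in the other answer comes from.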
The legend is outside the figure. The figure is 4 x 3 inches in size (figsize=(4, 3)). The axes take 95% of that space in width (rect=[0, 0, .95, 1]). The call to legend places the legend's right center point (loc='right') at 1.7 times the axes width (bbox_to_anchor=(1.7, 0.5)), i.e. at 4*0.95*1.7 = 6.46 inches from the left edge, well past the right edge of the 4-inch-wide figure.

Alternative suggestion from my side: make the figure larger (figsize=(5.5, 3)) so that the legend fits in, and make the axes take only 70% of the figure width, leaving 30% for the legend. Then position the legend's left side close to the axes boundary (loc='center left', bbox_to_anchor=(1.0, .5)).
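The arithmetic above can be checked with matplotlib's own transforms, which also shows that bbox_to_anchor is deterministic. This is a sketch: `anchor_x_inches` is a hypothetical helper, and it uses a plain 2D axes as a stand-in, assuming the 3D axes keeps the rect it was created with.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no window is opened
import matplotlib.pyplot as plt


def anchor_x_inches(figsize, rect, anchor_x):
    """Return where an axes-coordinate x value (such as the x of
    bbox_to_anchor) ends up, in inches from the figure's left edge."""
    fig = plt.figure(figsize=figsize)
    # Plain 2D axes used as a stand-in: it keeps exactly the given rect.
    ax = fig.add_axes(rect)
    x_px, _ = ax.transAxes.transform((anchor_x, 0.5))
    inches = x_px / fig.dpi
    plt.close(fig)
    return inches


# Original layout: 4-inch-wide figure, axes 95% wide, anchor at x=1.7
print(round(anchor_x_inches((4, 3), [0, 0, .95, 1], 1.7), 2))   # 6.46
# Suggested layout: 5.5-inch-wide figure, axes 70% wide, anchor at x=1.0
print(round(anchor_x_inches((5.5, 3), [0, 0, .7, 1], 1.0), 2))  # 3.85
```

In the suggested layout the legend starts at 3.85 inches and has the remaining 1.65 inches of the 5.5-inch figure to itself.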

For more on this topic see "How to put the legend out of the plot".

The reason you still see the complete figure, including the legend, in a Jupyter notebook is that the inline backend saves the figure with bbox_inches='tight' by default. This enlarges the saved image to include everything that was drawn, even artists lying outside the figure.
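The effect can be reproduced outside Jupyter by calling savefig with and without bbox_inches='tight'. This is a minimal sketch using a hypothetical 2D line plot with the same 4 x 3 inch figure and legend anchor as in the question; `png_width` is a hypothetical helper that writes the PNG to memory and reads the pixel width from the PNG header (8-byte signature, then the IHDR chunk's length and type, then the width).

```python
import io
import struct

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt


def png_width(fig, **savefig_kwargs):
    """Save fig as PNG at 100 dpi and return the image width in pixels."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=100, **savefig_kwargs)
    # Bytes 16..20 of a PNG hold the big-endian width from the IHDR chunk
    return struct.unpack(">I", buf.getvalue()[16:20])[0]


fig = plt.figure(figsize=(4, 3))
ax = fig.add_axes([0, 0, .95, 1])
ax.plot([0, 1], [0, 1], label="line")
# Same placement as the other answer: legend's right edge at 6.46 inches
ax.legend(loc="right", bbox_to_anchor=(1.7, .5))

print(png_width(fig))                             # 400, legend cut off
print(png_width(fig, bbox_inches="tight") > 400)  # True, image grows
```

In a plain script, plt.savefig(..., bbox_inches='tight') gives the same enlarged result, while plt.show() keeps the figure at its nominal size.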

In total the code may then look like

```
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, registers '3d' on older matplotlib
import numpy as np; np.random.seed(5)
from sklearn import decomposition, datasets

centers = [[1, 1], [-1, -1], [1, -1]]
iris = datasets.load_iris()
X = iris.data    # the floating point values
y = iris.target  # unsigned integers specifying group

fig = plt.figure(figsize=(5.5, 3))
# Axes take 70% of the figure width, leaving 30% for the legend.
# (Axes3D(fig, rect=...) no longer adds itself to the figure in recent matplotlib.)
ax = fig.add_axes([0, 0, .7, 1], projection='3d', elev=48, azim=134)

pca = decomposition.PCA(n_components=3)
pca.fit(X)
X = pca.transform(X)

labelTups = [('Setosa', 0), ('Versicolour', 1), ('Virginica', 2)]
for name, label in labelTups:
    ax.text3D(X[y == label, 0].mean(),
              X[y == label, 1].mean() + 1.5,
              X[y == label, 2].mean(), name,
              horizontalalignment='center',
              bbox=dict(alpha=.5, edgecolor='w', facecolor='w'))
# Reorder the labels to have colors matching the cluster results
# (plain float: np.float was removed in NumPy 1.24)
y = np.choose(y, [1, 2, 0]).astype(float)
sc = ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=y, cmap="Spectral", edgecolor='k')

# ax.w_xaxis etc. are deprecated aliases of ax.xaxis etc.
ax.xaxis.set_ticklabels([])
ax.yaxis.set_ticklabels([])
ax.zaxis.set_ticklabels([])

# Use the scatter's own colormap and norm so legend colors match the points
colors = [sc.cmap(sc.norm(i)) for i in [1, 2, 0]]
custom_lines = [plt.Line2D([], [], ls="", marker='.',
                mec='k', mfc=c, mew=.1, ms=20) for c in colors]
ax.legend(custom_lines, [lt[0] for lt in labelTups],
          loc='center left', bbox_to_anchor=(1.0, .5))

plt.show()
```

and produce a plot with the legend to the right of the 3D axes.

Those are some lovely details. Thanks in particular for clarifying the normalization behavior of `ax.scatter`. That was a bit mysterious — tel, Mar 30 '18 at 01:05
I might remove the 'centers' list, it's not used, and clutters the code — con, Mar 30 '18 at 12:38
@ImportanceOfBeingErnest May I ask you kindly to check related [question](https://stackoverflow.com/questions/68895380/automated-legend-creation-for-3d-plot). Thanks in advance — Mario, Sep 01 '21 at 18:47

tel · Answer 2 · 2018-03-29T22:25:50.323

Needed a few tweaks (plt.cm.spectral is the danged weirdest colormap I've ever dealt with), but it seems to be good now:

from matplotlib.lines import Line2D
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
from sklearn import decomposition
from sklearn import datasets

np.random.seed(5)

centers = [[1, 1], [-1, -1], [1, -1]]
iris = datasets.load_iris()
X = iris.data#the floating point values
y = iris.target#unsigned integers specifying group


fig = plt.figure(1, figsize=(4, 3))
plt.clf()
ax = Axes3D(fig, rect=[0, 0, .95, 1], elev=48, azim=134)

plt.cla()
pca = decomposition.PCA(n_components=3)
pca.fit(X)
X = pca.transform(X)

labelTups = [('Setosa', 0), ('Versicolour', 1), ('Virginica', 2)]
for name, label in labelTups:
    ax.text3D(X[y == label, 0].mean(),
              X[y == label, 1].mean() + 1.5,
              X[y == label, 2].mean(), name,
              horizontalalignment='center',
              bbox=dict(alpha=.5, edgecolor='w', facecolor='w'))
# Reorder the labels to have colors matching the cluster results
y = np.choose(y, [1, 2, 0]).astype(np.float)
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=y, cmap=plt.cm.spectral, edgecolor='k')

ax.w_xaxis.set_ticklabels([])
ax.w_yaxis.set_ticklabels([])
ax.w_zaxis.set_ticklabels([])

colors = [plt.cm.spectral(np.float(i/2)) for i in [1, 2, 0]]
custom_lines = [Line2D([0], [0], linestyle="none", marker='.', markeredgecolor='k', markerfacecolor=c, markeredgewidth=.1, markersize=20) for c in colors]
ax.legend(custom_lines, [lt[0] for lt in labelTups], loc='right', bbox_to_anchor=(1.7, .5))

plt.show()

Here's a link to an online Jupyter notebook with a live version of the script (requires an account for rerunning, though).

Short explanation

You're trying to add three legend entries for a single scatter plot, which is nonstandard behavior: by default a legend gets one entry per labeled artist, and one scatter call is one artist. Thus, you need to manually create the shapes that your legend will display.
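For comparison, the default one-entry-per-artist behavior can be used directly by making one scatter call per group, each with its own label. This is a different technique from the manual markers in this answer, shown as a sketch; the random data is a hypothetical stand-in for the PCA output.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, needed on old matplotlib
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical stand-in for the PCA output: 3 groups of 50 points in 3D
X = np.vstack([rng.normal(loc=k, size=(50, 3)) for k in range(3)])
y = np.repeat([0, 1, 2], 50)
names = ['Setosa', 'Versicolour', 'Virginica']

fig = plt.figure(figsize=(5.5, 3))
ax = fig.add_subplot(projection='3d')
cmap = plt.get_cmap('Spectral')
for label, name in enumerate(names):
    pts = X[y == label]
    # One scatter call (one artist) per group, each carrying a label,
    # so legend() creates one entry per group on its own.
    ax.scatter(pts[:, 0], pts[:, 1], pts[:, 2],
               color=cmap(label / 2), edgecolor='k', label=name)

leg = ax.legend(loc='center left', bbox_to_anchor=(1.0, .5))
print([t.get_text() for t in leg.get_texts()])
# ['Setosa', 'Versicolour', 'Virginica']
```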

Longer explanation

This line of code recreates the colors you used in your plot:

colors = [plt.cm.spectral(np.float(i/2)) for i in [1, 2, 0]]

and then this line of code draws some appropriate-looking dots that we'll eventually display on your legend:

custom_lines = [Line2D([0], [0], linestyle="none", marker='.', markeredgecolor='k', markerfacecolor=c, markeredgewidth=.1, markersize=20) for c in colors]

The first two args are just placeholder x and y coordinates for a single dot (the Line2D is never added to the axes, so they have no visible effect), linestyle="none" suppresses the line that Line2D would normally draw by default, and the rest of the args create and style the dot itself (referred to as a marker in the terminology of the matplotlib API).
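A minimal sketch of these proxy artists on their own, using hypothetical colors: the Line2D objects only serve as legend handles and are never drawn in the plot area.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

fig, ax = plt.subplots()
colors = ["tab:red", "tab:green", "tab:blue"]  # hypothetical colors
names = ["Setosa", "Versicolour", "Virginica"]

# Proxy artists: never added to the axes, only handed to legend().
# linestyle="none" hides the line so only the marker remains.
proxies = [Line2D([0], [0], linestyle="none", marker=".",
                  markeredgecolor="k", markerfacecolor=c,
                  markeredgewidth=.1, markersize=20)
           for c in colors]
leg = ax.legend(proxies, names)

print(len(ax.lines))  # 0: nothing was plotted on the axes itself
print([t.get_text() for t in leg.get_texts()])
# ['Setosa', 'Versicolour', 'Virginica']
```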

Finally, this statement actually creates the legend:

ax.legend(custom_lines, [lt[0] for lt in labelTups], loc='right', bbox_to_anchor=(1.7, .5))

The first arg is of course a list of the dots we just drew, and the second arg is a list of the labels (one per dot). The remaining two args tell matplotlib where to draw the actual box containing the legend. The last arg, bbox_to_anchor, is basically a way to manually fiddle with the positioning of the legend, which I had to do since matplotlib support for 3D anything is still a little behind the curve. On 2D plots you typically don't need it, and, since matplotlib usually does a decent job of automatically positioning the legend on 2D plots in the first place, you often don't even need the loc arg either.

Some colormap weirdness

Don't quite know what was going on with plt.cm.spectral, but in order to get it to behave, for every value I fed it I had to:

a) first cast the value to float

b) then divide the value by 2

a) does occur explicitly in the OP's original code, right before they plot. The divide by 2 thing, I don't know where that comes from. Somehow the call to ax.scatter is implicitly normalizing all of the y values so that the maximum is 1? I guess?
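Both steps can be explained with a short sketch. It uses the Spectral colormap, since spectral no longer exists in current matplotlib; the assumption is that the old colormap followed the same calling rules. A float is a position along the colormap, while an integer is an index into its lookup table, and scatter's automatic normalization of values 0..2 is the same as dividing by 2.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

cmap = plt.get_cmap("Spectral")

# A float in [0, 1] is a position along the colormap:
# 1.0 is the last entry of the lookup table.
assert np.allclose(cmap(1.0), cmap(cmap.N - 1))
# An int is a raw index into the lookup table, so cmap(1) is the
# 2nd entry (almost the first color), not the end of the colormap.
assert not np.allclose(cmap(1), cmap(1.0))

# scatter(c=y) normalizes with vmin=min(y)=0 and vmax=max(y)=2,
# which for these values is the same as dividing by 2.
norm = Normalize(vmin=0, vmax=2)
for v in [1, 2, 0]:
    assert np.allclose(cmap(norm(v)), cmap(v / 2))
print("float positions and normalized ints agree")
```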

hi @tel is that the same script you used to generate the image? the image is indeed what I want, but the legend box at right isn't showing with the script posted here — con, Mar 29 '18 at 21:54
@con That's... weird. First thing to check is your `matplotlib` version. Open up a Python interpreter and run `import matplotlib; matplotlib.__version__` and tell me what you get. In the meantime I'll set up and post a live version of the script — tel, Mar 29 '18 at 21:59
@con You need to adjust this part of the code `bbox_to_anchor=(1.7, .5)` — DavidG, Mar 29 '18 at 22:17
@con sadly, that's not helpful. I'm running 2.1.0 locally, so unless they added a related bug with the minor version bump, that's likely not the problem. I have two other guesses. The first is that the `bbox_to_anchor` settings that work for me on my system might not be good for you. Try deleting that argument completely and rerunning. My second guess is that the backend (the part of `matplotlib` that actually draws the pictures) you're using has flawed 3D support. There are instructions on how to change your backend [here](https://matplotlib.org/tutorials/introductory/usage.html#backends) — tel, Mar 29 '18 at 22:18
great! bbox_to_anchor is apparently a very variable setting. This will vary from system to system, and even within one image frame size to another. Thank you @tel ! — con, Mar 29 '18 at 22:23
No, bbox_to_anchor is completely deterministic. I provided an answer below, which clarifies all the open questions that arose from this answer. — ImportanceOfBeingErnest, Mar 30 '18 at 00:53
