I've tracked down the bug. Technically, the bug is in pandas, not seaborn as I originally thought, though it involves code from pandas, seaborn, and matplotlib.
In pandas.tools.plotting.ScatterPlot._make_plot, the following code chooses the colours used in the scatter plot:
    if c is None:
        c_values = self.plt.rcParams['patch.facecolor']
    elif c_is_column:
        c_values = self.data[c].values
    else:
        c_values = c
In your case c will be equal to None, which is the default value, and so c_values will be given by plt.rcParams['patch.facecolor'].
Now, as part of setting itself up, seaborn modifies plt.rcParams['patch.facecolor'] to (0.5725490196078431, 0.7764705882352941, 1.0), which is an RGB tuple. If seaborn is not used, the value is the matplotlib default, 'b' (a string indicating the colour blue).
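To make the effect of that default concrete, here is a minimal, pandas-free sketch of the branch above. pick_c_values is a hypothetical helper that copies the three branches; plain dicts stand in for plt.rcParams and for the DataFrame, so this is an illustration rather than pandas' actual API.

```python
def pick_c_values(c, c_is_column, data, rc_params):
    """Mirror the colour-selection branch of ScatterPlot._make_plot (hypothetical helper)."""
    if c is None:
        # No colour given: fall back to the global default patch colour.
        return rc_params['patch.facecolor']
    elif c_is_column:
        # c names a column: use that column's values.
        return data[c]
    else:
        # Anything else is passed straight through to matplotlib.
        return c


# Plain-dict stand-ins for plt.rcParams without and with seaborn's style applied.
matplotlib_defaults = {'patch.facecolor': 'b'}
seaborn_defaults = {'patch.facecolor': (0.5725490196078431, 0.7764705882352941, 1.0)}
data = {'x': [1, 2, 3], 'y': [4, 5, 6]}

print(pick_c_values(None, False, data, matplotlib_defaults))  # the string 'b'
print(pick_c_values(None, False, data, seaborn_defaults))     # seaborn's RGB tuple
```

The only thing that changes between the two calls is the rcParams value, which is why the same pandas code behaves differently once seaborn is imported.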
c_values is then passed to ax.scatter, which actually draws the plot:

    scatter = ax.scatter(data[x].values, data[y].values, c=c_values,
                         label=label, cmap=cmap, **self.kwds)
The issue arises because the keyword argument c accepts several different types of argument. It can be:
- a string (such as 'b' in the original matplotlib case);
- a sequence of colour specifications (say, a sequence of RGB values);
- a sequence of values to map onto the current colormap.
The matplotlib docs specifically state the following (the key point is the sentence beginning "Note that"):
c can be a single color format string, or a sequence of color specifications of length N, or a sequence of N numbers to be mapped to colors using the cmap and norm specified via kwargs (see below). Note that c should not be a single numeric RGB or RGBA sequence because that is indistinguishable from an array of values to be colormapped. c can be a 2-D array in which the rows are RGB or RGBA, however.
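The ambiguity the docs warn about can be modelled with a small decision function. interpret_c is a hypothetical, simplified sketch of the rule as the docs describe it, written in plain Python rather than taken from matplotlib's source; the real implementation does more validation.

```python
from numbers import Number


def interpret_c(c, n_points):
    """Classify a scatter() c argument following the documented rule (simplified sketch)."""
    if isinstance(c, str):
        # A colour format string such as 'b'.
        return 'single colour'
    # A flat sequence of exactly n_points numbers is read as values for the colormap.
    if len(c) == n_points and all(isinstance(v, Number) for v in c):
        return 'colormap values'
    # Anything else (an RGB tuple of the "wrong" length, rows of RGB, ...) is read as colours.
    return 'colour specification'


rgb = (0.5725490196078431, 0.7764705882352941, 1.0)
print(interpret_c('b', 3))     # single colour
print(interpret_c(rgb, 3))     # colormap values  <- the surprising case
print(interpret_c(rgb, 4))     # colour specification
print(interpret_c([rgb], 3))   # colour specification (a 2-D array with one RGB row)
```

The last call reflects the docs' remark that a 2-D array whose rows are RGB values is unambiguous: wrapping the tuple in a list changes its shape, so it can no longer be mistaken for three data values.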
What basically happens is that matplotlib takes c_values (a tuple of three numbers) and, because your plot has exactly three points, treats those numbers as values to be mapped onto the current colormap (which pandas sets to Greys by default). As a result, you get three scatter points with different shades of grey. With any other number of points (say four), the length of the tuple doesn't match the length of the data arrays (3 != 4), so matplotlib assumes it must be an RGB tuple and uses it as a single constant RGB colour.
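To see why the three points end up as different greys, here is a rough sketch of the colormapping step. It assumes matplotlib's default linear normalisation (min-max scaling of the values to [0, 1]) and approximates Greys as a straight ramp from white at 0 to black at 1; the real Greys colormap is not exactly linear, so the grey levels are illustrative only. normalise and greys_approx are hypothetical helpers.

```python
def normalise(values):
    """Min-max scale values into [0, 1], like a default linear normalisation (sketch)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def greys_approx(t):
    """Approximate Greys as a linear ramp: t=0 is white, t=1 is black (assumption)."""
    level = 1.0 - t
    return (level, level, level)


c_values = (0.5725490196078431, 0.7764705882352941, 1.0)  # seaborn's patch.facecolor
positions = normalise(c_values)
for t in positions:
    # First point maps to white, last to black, middle to an intermediate grey.
    print(round(t, 3), greys_approx(t))
```

Because the smallest of the three numbers is scaled to 0, the first point lands at the white end of the ramp, which is why the points can look so different from one another.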
This has been written up as a bug report on the pandas GitHub issue tracker.