
I have two sets of x-y data that I'd like to plot as a scatterplot using sns.scatterplot. I want to highlight two different things:

  • the difference between different types of data
  • the difference between the first and the second set of x-y data

For the first, I'm using the built-in hue and style. For the second, I'd like to use filled vs. unfilled markers, but I don't know how to do that without falling back to plt.scatter, where I would have to reimplement all the magic of sns.scatterplot by hand.


Long version, with MWE:

I have X and Y data, plus some type info for each data point. For example, sample 1 is of type A and yields X=11, Y=21 at the first sampling and X=10, Y=21 at the second sampling. The same goes for sample 2 of type A, sample 3 of type B, and so on (see the example file at the end).

So I want to visualize the differences between the two samplings, like so:

import pandas as pd
import seaborn as sns

data = pd.read_csv('testdata.csv', sep=';', index_col=0, header=0)
# the contents of testdata.csv are listed at the end of the question
sns.scatterplot(x=data['x1'], y=data['y1'])
sns.scatterplot(x=data['x2'], y=data['y2'])

plot

Nice, I can easily see that the first sampling seems to show a linear relationship between X and Y, whereas the second one deviates from it. What interests me now is which type of data is affected the most by these differences, and that's why I'm using seaborn instead of pure matplotlib: sns.scatterplot has a lot of nice stuff built in, e.g. hue (and style, to get distinct symbols for printing in b&w):

sizes = (200, 200) # to make stuff more visible
sns.scatterplot(x=data['x1'], y=data['y1'], hue=data['type'], style=data['type'],
                size=data['type'], sizes=sizes)
sns.scatterplot(x=data['x2'], y=data['y2'], hue=data['type'], style=data['type'],
                size=data['type'], sizes=sizes)

plot2

OK, so I can easily distinguish my data types, but I lost all information about which point belongs to which sampling. The obvious solution, to me, seems to be filled markers for one sampling and unfilled ones for the other.

However, I can't seem to do that.

I'm aware of this question/answer, which uses fc='none' (not documented in the sns.scatterplot documentation), but this fails when also using hue:

sns.scatterplot(x=data['x1'], y=data['y1'], hue=data['type'], style=data['type'],
                size=data['type'], sizes=sizes)
sns.scatterplot(x=data['x2'], y=data['y2'], hue=data['type'], style=data['type'],
                size=data['type'], sizes=sizes, fc='none')

plot3

As you can see, the second set of markers simply vanishes (there are some artifacts in the B data, where hints of a white cross are visible).

I can kinda fix that by setting ec=...:

sns.scatterplot(x=data['x1'], y=data['y1'], hue=data['type'], style=data['type'],
                size=data['type'], sizes=sizes)
sns.scatterplot(x=data['x2'], y=data['y2'], hue=data['type'], style=data['type'],
                size=data['type'], sizes=sizes, fc='none', 
                ec=('b','b','y','y','y', 'y', 'g', 'g', 'g','r'))
# I would have to define the proper colors, but for this example, they're close enough

plot4

but that obviously has a few issues:

  • the markers in the legend no longer match the plot, neither in color nor in fill
  • and I'm already halfway into doing-it-all-by-hand territory anyway, e.g. my ec= would fail as soon as I want to plot a new dataset that has a sample_no 11.

How can I do that with seaborn? Filled vs. unfilled seems quite an obvious flag for scatterplots, but I can't seem to find it.
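
For reference, here is a minimal sketch of one possible workaround that avoids hard-coding ec: draw the second sampling with the normal hue/style mapping, then copy each marker's face color to its edge color and clear the face afterwards. It assumes that sns.scatterplot adds the points to the axes as a matplotlib collection whose face colors have already been set from hue. The in-line CSV_TEXT, the use of s=200 instead of size/sizes, and legend=False on the second call are choices made only for this sketch. The legend still shows filled markers only, so the legend issue remains open.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Same content as testdata.csv, in-lined so the sketch runs without the file.
CSV_TEXT = """sample_no;type;x1;y1;x2;y2
1;A;11;21;10;21
2;A;12;22;12;21
3;B;13;23;13.2;22.8
4;B;14;24;13.8;24
5;B;15;25;14.8;25.2
6;B;16;26;16.3;25.9
7;C;17;27;18;28
8;C;18;28;20;26
9;C;19;29;20;30
10;D;20;30;19;28
"""
data = pd.read_csv(io.StringIO(CSV_TEXT), sep=";", index_col=0)

fig, ax = plt.subplots()

# First sampling: ordinary filled markers, colored and shaped by type.
sns.scatterplot(data=data, x="x1", y="y1", hue="type", style="type",
                s=200, ax=ax)

# Remember how many collections exist so far, so that only the ones
# added by the second call are modified below.
n_before = len(ax.collections)

# Second sampling: drawn filled at first, without a second legend.
sns.scatterplot(data=data, x="x2", y="y2", hue="type", style="type",
                s=200, legend=False, ax=ax)

# Post-process the new collection(s): reuse the hue-mapped face colors
# as edge colors, then make the faces transparent.
for coll in ax.collections[n_before:]:
    coll.set_edgecolor(coll.get_facecolor())
    coll.set_facecolor("none")
    coll.set_linewidth(1.5)
```

Calling plt.show() (or fig.savefig) afterwards displays the result. Because the edge colors are taken from whatever hue assigned, nothing has to be changed when a sample_no 11 is added.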


Data for testdata.csv:

sample_no;type;x1;y1;x2;y2
1;A;11;21;10;21
2;A;12;22;12;21
3;B;13;23;13.2;22.8
4;B;14;24;13.8;24
5;B;15;25;14.8;25.2
6;B;16;26;16.3;25.9
7;C;17;27;18;28
8;C;18;28;20;26
9;C;19;29;20;30
10;D;20;30;19;28
JC_CL
    Short answer: [scatterplot is different](https://stackoverflow.com/q/66404883/8881141). – Mr. T Feb 05 '22 at 17:20

0 Answers