
I would like to visualize a dataset with both a legend and a color bar. The dataset is a pandas DataFrame (below): X and Y are the values, the `year` column gives the year of data collection, and the `shape` column gives the method used for data collection. How can I plot Y against X, color-coded by year and with the markers (and therefore the legend) defined by shape, in a single command such as `plot`?

import pandas as pd

df = pd.DataFrame({
    'X':     [1,    1,    2,    2,    2,    3,    3,    4],
    'Y':     [3,    4,    1,    6,    7,    8,    8,    5],
    'year':  [1998, 1999, 1994, 1991, 1999, 1995, 1994, 1992],
    'shape': ['o',  '^',  'o',  '^',  'o',  '^',  'o',  '^']
})

Of course, I could loop over the shapes (methods), plot each one separately with a legend based on shape, and add a color bar separately, making sure it spans the minimum and maximum years. I would like to avoid the loop and the manual setup of the legend and color bar if possible. Is there a better way to do this? Thank you!
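For reference, the loop-based approach described above can be sketched in plain matplotlib. The one piece of bookkeeping it needs is passing the same `vmin`/`vmax` to every `ax.scatter` call, so all groups share one color scale and any of the returned collections can drive the colorbar. The helper name `scatter_by_shape` is hypothetical, and the legend labels are the raw marker codes because the question's data has no method names.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'X':     [1,    1,    2,    2,    2,    3,    3,    4],
    'Y':     [3,    4,    1,    6,    7,    8,    8,    5],
    'year':  [1998, 1999, 1994, 1991, 1999, 1995, 1994, 1992],
    'shape': ['o',  '^',  'o',  '^',  'o',  '^',  'o',  '^']
})


def scatter_by_shape(df, cmap='viridis'):
    """Hypothetical helper: one scatter call per marker, all on a shared color scale."""
    fig, ax = plt.subplots()
    # Same color limits for every group, so a single colorbar is valid for all of them
    vmin, vmax = df['year'].min(), df['year'].max()
    sc = None
    for shape, group in df.groupby('shape', sort=False):
        sc = ax.scatter(group['X'], group['Y'], c=group['year'], cmap=cmap,
                        vmin=vmin, vmax=vmax, marker=shape, label=shape)
    ax.legend(title='shape')  # one entry per scatter call
    # Any group's collection works here because they all share vmin/vmax
    cbar = fig.colorbar(sc, ax=ax, label='year')
    return fig, ax, cbar


fig, ax, cbar = scatter_by_shape(df)
plt.show()
```

The loop remains, but with a shared `vmin`/`vmax` the legend and colorbar need no further manual configuration.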

Edit:

Based on the answer below by Tranbi, I am after something like this; I need both a legend and a color bar:

[image: desired output]

ShGh

1 Answer


This sounds like a job for seaborn:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import matplotlib.lines as mlines

df = pd.DataFrame({
    'X':     [1,    1,    2,    2,    2,    3,    3,    4],
    'Y':     [3,    4,    1,    6,    7,    8,    8,    5],
    'year':  [1998, 1999, 1994, 1991, 1999, 1995, 1994, 1992],
    'method':['A',  'B',  'A',  'C',  'A',  'B',  'A',  'B'],
    'shape': ['o',  '^',  'o',  '*',  'o',  '^',  'o',  '^']
})

palette = 'viridis'

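# color by year, marker by the 'shape' column; the markers list follows the
# order in which shapes first appear, which is also seaborn's style order
ax = sns.scatterplot(data=df, x='X', y='Y',
                     hue='year',
                     palette=palette,
                     style='shape',
                     markers=df['shape'].unique().tolist(),
                     legend=False)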
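# build legend: one neutral-colored marker per method, sorted by method name
# (without an explicit color, Line2D uses the default blue, which could be
# misread as a year)
lines = [mlines.Line2D([], [], marker=shape, color='gray', label=method, linestyle='None')
         for shape, method in
         df[['shape', 'method']].drop_duplicates()
         .sort_values('method').itertuples(index=False)]
ax.legend(handles=lines, title='method')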

# build colorbar on the same min..max scale seaborn uses for a numeric hue
norm = plt.Normalize(df['year'].min(), df['year'].max())
sm = plt.cm.ScalarMappable(cmap=palette, norm=norm)
ax.figure.colorbar(sm, ax=ax, label='year')

plt.show()
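A variant of the same approach, sketched below as a hypothetical helper (`scatter_with_legend_and_colorbar` is not a seaborn function), bundles the three steps into one call and passes `markers` as a dictionary. seaborn's `markers` parameter accepts a dict mapping each level of the `style` variable to a marker, so styling by `method` with a mapping read from the data makes the method-to-marker pairing explicit instead of relying on the order of `unique()`. It assumes each method maps to exactly one marker.

```python
import matplotlib.lines as mlines
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    'X':     [1,    1,    2,    2,    2,    3,    3,    4],
    'Y':     [3,    4,    1,    6,    7,    8,    8,    5],
    'year':  [1998, 1999, 1994, 1991, 1999, 1995, 1994, 1992],
    'method':['A',  'B',  'A',  'C',  'A',  'B',  'A',  'B'],
    'shape': ['o',  '^',  'o',  '*',  'o',  '^',  'o',  '^']
})


def scatter_with_legend_and_colorbar(df, x, y, hue, style, marker_col,
                                     cmap='viridis', ax=None):
    """Hypothetical helper: colorbar for `hue`, marker legend for `style`."""
    # style level -> marker, read from the data (assumes one marker per level)
    marker_map = dict(df[[style, marker_col]].drop_duplicates().itertuples(index=False))

    ax = sns.scatterplot(data=df, x=x, y=y, hue=hue, palette=cmap,
                         style=style, markers=marker_map, legend=False, ax=ax)

    # Marker legend in a neutral color, sorted by level name
    handles = [mlines.Line2D([], [], marker=m, color='gray', linestyle='None',
                             label=str(level))
               for level, m in sorted(marker_map.items())]
    ax.legend(handles=handles, title=style)

    # Colorbar over the data range, matching seaborn's default numeric hue scale
    norm = plt.Normalize(df[hue].min(), df[hue].max())
    sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
    cbar = ax.figure.colorbar(sm, ax=ax, label=hue)
    return ax, cbar


ax, cbar = scatter_with_legend_and_colorbar(df, 'X', 'Y', hue='year',
                                            style='method', marker_col='shape')
plt.show()
```

Because the markers are looked up by method name rather than by position, reordering the rows should not change which method gets which marker.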

Output:

[image: resulting plot]

Edit: adding legend and color bar

Edit2: adding method column + changing color palette

Tranbi
  • Yes, thank you, this solution fixes a significant part of what I am after. Is there a way to show the years as a color bar? I believe I can work this out myself, but if there is a simple solution I would appreciate it. Thank you very much! – ShGh Feb 20 '23 at 18:42
  • you could use `sns.barplot(data=df, x='X', y='Y', hue='year')` but how would you represent the markers? – Tranbi Feb 20 '23 at 18:44
  • I am not sure `sns.barplot` gives what I want. My initial question was about having the years (or any other continuous values) represented directly in a color bar instead of a legend (as in your solution). Perhaps `sns.scatterplot` should be called a few times to create the color bar and the legend for the methods (shapes)? – ShGh Feb 20 '23 at 19:01
  • do you mean something like this? https://stackoverflow.com/questions/62884183/trying-to-add-a-colorbar-to-a-seaborn-scatterplot – Tranbi Feb 20 '23 at 19:10
  • My case is very close to the tips example provided by William Miller, with the difference that I want the markers to be different. For example, in the tips case, let's say the data came from three different providers or restaurants and I would like a marker for each restaurant. I hope that is clear. – ShGh Feb 20 '23 at 19:24
  • you can still add the legend yourself. check my edited answer! – Tranbi Feb 20 '23 at 20:15
  • Perfect, this is what I was after. However, the legend and color bar are created outside `sns.scatterplot`; I was hoping there was a way to have them all at once in `sns.scatterplot`. Having said that, your solution with `sns` is much better than what I had in mind. Voted up! Thank you. – ShGh Feb 20 '23 at 20:27
  • I have edited the code slightly to include the method names in sorted order in the legend. It works, but an additional check is always useful. Thank you. – ShGh Feb 20 '23 at 21:26
  • OK, I kept the filtering and sorting but moved them into the comprehension. The result should be unchanged. – Tranbi Feb 20 '23 at 21:45
  • yes, the result is unchanged but with much nicer implementation! Thank you! – ShGh Feb 20 '23 at 21:49