For my project, I use sns.FacetGrid() to plot multiple subplots, each containing multiple lines. My goal is to draw a mean line for each line in each subplot. My idea was to extract the x- and y-data of each line, so I iterate over each subplot and then over each line object in that subplot (as described in this Stack Overflow post).

The problem: each of the subplots seems to contain four 'empty' line objects in addition to the three lines that are actually drawn. My expected output is a list of six tuples (two subplots with three lines each), each containing one array for the x-data and one for the y-data. Does anyone know where these four empty line objects come from, and how to get the x- and y-data of only the existing (i.e. visible) lines?

Here's my code:

import numpy as np
import pandas as pd
import seaborn as sns

# simulate data frames #########################################################

n_outer_folds = 10

plot_df_1 = pd.DataFrame({'Outer Fold':np.linspace(start=1,stop=10,num=n_outer_folds),
                          'train_BAC':np.random.uniform(low=0.6,high=1.0,size=n_outer_folds).tolist(),
                          'train_SPEC':np.random.uniform(low=0.6,high=1.0,size=n_outer_folds).tolist(),
                          'test_BAC':np.random.uniform(low=0.1,high=0.8,size=n_outer_folds).tolist(),
                          'test_SPEC':np.random.uniform(low=0.1,high=0.8,size=n_outer_folds).tolist()
                          })

plot_df_2 = pd.DataFrame({'Outer Fold':np.linspace(start=1,stop=10,num=n_outer_folds),
                          'train_BAC':np.random.uniform(low=0.6,high=1.0,size=n_outer_folds).tolist(),
                          'train_SPEC':np.random.uniform(low=0.6,high=1.0,size=n_outer_folds).tolist(),
                          'test_BAC':np.random.uniform(low=0.1,high=0.8,size=n_outer_folds).tolist(),
                          'test_SPEC':np.random.uniform(low=0.1,high=0.8,size=n_outer_folds).tolist()
                          })

plot_df_list = [plot_df_1,plot_df_2]

# append 'Model' column to make each plot df identifiable
for idx,plot_df in enumerate(plot_df_list):
    plot_df['Model'] = idx

# concatenate all plot dfs
plot_df = pd.concat(plot_df_list)

# create a plotable Dataframe
plot_df_melt = pd.melt(plot_df,
                       id_vars=['Outer Fold','Model'],
                       value_vars=['train_BAC','test_BAC','train_SPEC'],
                       var_name ='Scores',
                       value_name='Score'
                       )

# plot data
g = sns.FacetGrid(plot_df_melt,col="Model",height=4,aspect=2,col_wrap=1)
g.map(sns.lineplot,'Outer Fold','Score','Scores')

# get line data
axes_data = []
ax_lines_data = []

for ax in g.axes.flat:
    axes_data.append(ax)
    for line in ax.lines:
        ax_lines_data.append((line.get_xdata(),line.get_ydata()))

Johannes Wiesner

1 Answer

Plotting n different hue categories in a seaborn lineplot gives you 2*n+1 lines in the axes: n lines that carry the data, n legend entries, and one legend title. A minimal example:

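import pandas as pd
import seaborn as sns

df = pd.DataFrame({"x" : [1,2,2,4], "y" : [1,2,3,4], "hue" : list("ABAB")})
# keyword arguments, since recent seaborn versions no longer accept x, y, hue positionally
ax = sns.lineplot(x="x", y="y", hue="hue", data=df)
print([line.get_label() for line in ax.lines])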

prints

['_line0', '_line1', 'hue', 'A', 'B']


Here, '_line0' and '_line1' are the lines shown in the plot. They contain the data. Their labels start with an underscore so that they do not appear in a legend.
The remaining 'hue', 'A' and 'B' do not contain any data. Their sole purpose is to make up the legend: 'hue' is the "legend title", which is a normal legend entry as well, and 'A' and 'B' are the legend entries.
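To make the distinction concrete, here is a minimal sketch that uses only matplotlib. The three empty lines are hand-made stand-ins for the legend proxies described above (an assumption for illustration, not seaborn's actual code), and `summary` is a hypothetical name. It records, for each line, whether the label starts with an underscore and how many x-values the line holds.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# two data lines; matplotlib assigns them automatic labels starting with "_"
ax.plot([1, 2], [1, 3])
ax.plot([2, 4], [2, 4])

# hand-made stand-ins for the legend proxies: labelled lines without any data
for label in ["hue", "A", "B"]:
    ax.plot([], [], label=label)

# (label starts with "_", number of x-values) for every line in the axes
summary = [(line.get_label().startswith("_"), len(line.get_xdata()))
           for line in ax.lines]
print(summary)
```

The data lines come out as underscore-labelled with non-empty data, while the stand-in proxies have ordinary labels and zero-length data, so either property can serve as a filter.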

This is a consequence of how seaborn creates legends. One option is to filter the lines, for example by keeping only those whose label starts with an underscore:

[line for line in ax.lines if line.get_label().startswith("_")]
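
Applied to the original goal, the filter could be combined with a mean line per data line roughly as follows. This is a sketch under assumptions: `data_lines` and `add_mean_lines` are hypothetical helper names, the two subplots are built with plain matplotlib as a stand-in for `g.axes.flat`, and the empty labelled lines imitate the legend proxies. The extra length check is a precaution in case the label convention differs between library versions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np

def data_lines(ax):
    """Return the lines on ax whose label starts with '_' and that hold data."""
    return [line for line in ax.lines
            if line.get_label().startswith("_") and len(line.get_xdata()) > 0]

def add_mean_lines(axes):
    """Draw a dashed mean line per data line and return the (x, y) tuples."""
    collected = []
    for ax in axes:
        # data_lines() returns a list, i.e. a snapshot taken before axhline
        # appends further (also underscore-labelled) lines to ax.lines
        for line in data_lines(ax):
            x, y = line.get_xdata(), line.get_ydata()
            collected.append((x, y))
            ax.axhline(np.mean(y), color=line.get_color(), linestyle="--")
    return collected

# stand-in for g.axes.flat: two subplots with three data lines
# and four empty proxy lines each
fig, axes = plt.subplots(2, 1)
rng = np.random.default_rng(0)
for ax in axes:
    for _ in range(3):
        ax.plot(np.arange(1, 11), rng.uniform(0.1, 1.0, size=10))
    for label in ["Scores", "train_BAC", "test_BAC", "train_SPEC"]:
        ax.plot([], [], label=label)

line_data = add_mean_lines(axes)
print(len(line_data))
```

With the grid from the question, the call would presumably be `add_mean_lines(g.axes.flat)`. Collecting the data before drawing matters, because the mean lines themselves end up in `ax.lines` once they are added.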
ImportanceOfBeingErnest