
I have some data stored as [samples, x_values_toconsider], which I pass to seaborn. I've tried many combinations to get it to plot the error bands, but it never does; it simply plots each of my samples against a specific feature/value of x. So I end up with as many curves as I have samples, which is NOT what I want. I've also tried arranging the data as a data frame, but that didn't help either.

Why is seaborn plotting each individual sample as its own curve? I want it to aggregate the samples and draw the usual confidence intervals.

Self-contained reproducible code:

#%%
"""
https://seaborn.pydata.org/tutorial/relational.html#relational-tutorial

https://seaborn.pydata.org/examples/errorband_lineplots.html
https://www.youtube.com/watch?v=G3F0EZcW9Ew
https://github.com/knathanieltucker/seaborn-weird-parts/commit/3e571fd8e211ea04b6c9577fd548e7e532507acf
https://github.com/knathanieltucker/seaborn-weird-parts/blob/3e571fd8e211ea04b6c9577fd548e7e532507acf/tsplot.ipynb
"""
from collections import OrderedDict

import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
from pandas import DataFrame
import pandas as pd

print(sns)

np.random.seed(22)
sns.set(color_codes=True)

# the number of x values to consider in a given range e.g. [0,1] will sample 10 raw features x sampled at in [0,1] interval
num_x: int = 10
# the repetitions for each x feature value e.g. multiple measurements for sample x=0.0 up to x=1.0 at the end
rep_per_x: int = 5
total_size_data_set: int = num_x * rep_per_x
print(f'{total_size_data_set=}')
# - create fake data set
# only consider 10 features from 0 to 1
x = np.linspace(start=0.0, stop=1.0, num=num_x)
# to introduce fake variation add uniform noise to each feature and pretend each one is a new observation for that feature
noise_uniform: np.ndarray = np.random.rand(rep_per_x, num_x)
# same as above but have the noise be the same for each x (thats what the 1 means)
noise_normal: np.ndarray = np.random.randn(rep_per_x, 1)
# signal function
sin_signal: np.ndarray = np.sin(x)
# [rep_per_x, num_x]
data: np.ndarray = sin_signal + noise_uniform + noise_normal

# data_od: OrderedDict = OrderedDict()
# for idx_x in range(num_x):
#     # [rep_per_x, 1]
#     samples_for_x: np.ndarray = data[:, idx_x]
#     data_od[str(x[idx_x])] = samples_for_x
#
# data_df = pd.DataFrame(data_od)
# data = data_df

print(data)
ax = sns.lineplot(data=data)
# ax = sns.lineplot(data=data, err_style='band')
# ax = sns.lineplot(data=data, err_style='bars')
# ax = sns.lineplot(data=data, ci='sd', err_style='band')
# ax = sns.lineplot(data=data, ci='sd', err_style='bars')
 
# ax = sns.relplot(data=data)

plt.show()

#%%
"""
https://seaborn.pydata.org/examples/errorband_lineplots.html
"""

# import numpy as np
# import seaborn as sns
# from matplotlib import pyplot as plt
# from pandas import DataFrame
#
# fmri: DataFrame = sns.load_dataset("fmri")
# print(fmri)
# sns.lineplot(x="timepoint", y="signal",  hue="region", style="event", data=fmri)
# plt.show()

Wrong plot produced:

[image]

But I hoped for something like this (additional lines, each with its own error band, would be even better because I have many matrices!):

[image]

like:

[image]


Note that I do not want to pre-calculate the stds and plot the bands so most of those questions/answers don't work for me.

What puzzles me is that when I pass it the fmri data it works but not when I pass it my matrix of observations for each x value...


related posts:

Charlie Parker
  • really nice related answer without seaborn: https://stackoverflow.com/questions/55368485/draw-error-shading-bands-on-line-plot-python?noredirect=1&lq=1 It assumes you already have the stds/errors (which unfortunately is my situation in some cases, since I had already processed the raw data). – Charlie Parker Nov 08 '21 at 20:07

2 Answers


Seaborn's lineplot with error bands expects a list of x-values with corresponding y-values. To get error bands, the same x-value should appear multiple times with a different y-value.

In your setup, only y-values are provided. Here, Seaborn interprets this as 10 lists of y-values corresponding to default x-values (0,1,2,3,4,5,6,7,8,9) as shown in the tick labels.

To obtain the desired plot, the y-values should be "raveled" to a long 1D array. The corresponding x-values should be repeated (using np.tile):

ax = sns.lineplot(x=np.tile(x, rep_per_x), y=data.ravel())
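To see why `np.tile` (and not `np.repeat`) is the right partner for `ravel()`, here is a small NumPy-only check. It is only a sketch on made-up data; the variable names `x_long` and `y_long` are illustrative and not part of the question's code.

```python
import numpy as np

rng = np.random.default_rng(22)
num_x, rep_per_x = 10, 5
x = np.linspace(0.0, 1.0, num_x)
data = np.sin(x) + rng.random((rep_per_x, num_x))  # shape [rep_per_x, num_x]

# ravel() walks the matrix row by row (C order), so the x-values have to be
# repeated as whole rows too: x0..x9, x0..x9, ...
x_long = np.tile(x, rep_per_x)
y_long = data.ravel()

# Every y paired with x[3] is exactly column 3 of the original matrix.
column_3 = y_long[x_long == x[3]]

# np.repeat(x, rep_per_x) would give x0,x0,...,x1,x1,... which only lines up
# with data.T.ravel(), not with data.ravel().
x_wrong = np.repeat(x, rep_per_x)
```

Each x-value now appears `rep_per_x` times in `x_long`, which is what lets seaborn aggregate the repetitions into a mean line with a band.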

[image: sns.lineplot with error bands for np arrays]

Note that the tutorial examples where no explicit x is given work with data in the form of a pandas dataframe, or a single column of such a dataframe. In that case, the index of the dataframe is used for the x-values.

For example:

flights = sns.load_dataset("flights")
flights_wide = flights.pivot(index="year", columns="month", values="passengers")  # keyword arguments are required in pandas >= 2.0

creates a dataframe with "year" as index, and each "month" as column (with "passengers" as values). Then sns.lineplot(data=flights_wide) creates a lineplot with individual curves per column with the "year" as x-values. Compare that to sns.lineplot(data=flights, x="year", y="passengers") which uses the "long form" of the dataframe and ignores the individual months.
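The same wide-versus-long distinction applies to the matrix from the question. As a rough sketch (the column names "x" and "y" and the variable names are my own choice, not anything seaborn requires), a [rep_per_x, num_x] matrix can be turned into a long-form dataframe with `melt`:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(22)
num_x, rep_per_x = 10, 5
x = np.linspace(0.0, 1.0, num_x)
data = np.sin(x) + rng.random((rep_per_x, num_x))  # shape [rep_per_x, num_x]

# Wide form: one column per x-value, one row per repetition.
wide_df = pd.DataFrame(data, columns=x)

# Long form: one row per (x, y) observation, so each x-value occurs rep_per_x times.
long_df = wide_df.melt(var_name="x", value_name="y")
long_df["x"] = long_df["x"].astype(float)
```

Passing the result as `sns.lineplot(data=long_df, x="x", y="y")` should then aggregate over the repeated x-values, just like the `np.tile`/`ravel` call.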

To work with two or more datasets, you could call sns.lineplot twice. Here is an example:

from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

np.random.seed(22)
sns.set(color_codes=True)

num_x: int = 30
rep_per_x1: int = 5
rep_per_x2: int = 20

x = np.linspace(start=0.0, stop=3.0, num=num_x)

noise_uniform1: np.ndarray = np.random.rand(rep_per_x1, num_x) / 10
noise_normal1: np.ndarray = np.random.randn(rep_per_x1, 1) / 10
sin_signal: np.ndarray = np.sin(x)
data1: np.ndarray = sin_signal + noise_uniform1 + noise_normal1

noise_uniform2: np.ndarray = np.random.rand(rep_per_x2, num_x) / 10
noise_normal2: np.ndarray = np.random.randn(rep_per_x2, 1) / 10
cos_signal: np.ndarray = np.cos(x)
data2: np.ndarray = cos_signal + noise_uniform2 + noise_normal2

ax = sns.lineplot(x=np.tile(x, rep_per_x1),
                  y=data1.ravel(),
                  label=1)
sns.lineplot(x=np.tile(x, rep_per_x2),
             y=data2.ravel(),
             label=2, ax=ax)
plt.show()

[image: two times sns.lineplot]

Or you could concatenate all the values. Working with a dataframe might make it easier to see what's going on, and would also automatically use the column names to label the axes and the legend.

ax = sns.lineplot(x=np.tile(x, rep_per_x1 + rep_per_x2),
                  y=np.concatenate([data1.ravel(), data2.ravel()]),
                  hue=np.concatenate([np.repeat(1, num_x * rep_per_x1), np.repeat(2, num_x * rep_per_x2)]),
                  palette=['crimson', 'dodgerblue'])
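For the dataframe route mentioned above, one possible layout is a long-form frame with an extra column naming the dataset. This is only a sketch: the helper `to_long` and the column names "x", "y" and "dataset" are made up for illustration.

```python
import numpy as np
import pandas as pd


def to_long(x, y, name):
    """Flatten a [reps, num_x] matrix into long-form rows tagged with `name`."""
    reps = y.shape[0]
    return pd.DataFrame({"x": np.tile(x, reps), "y": y.ravel(), "dataset": name})


rng = np.random.default_rng(22)
x = np.linspace(0.0, 3.0, 30)
data1 = np.sin(x) + rng.random((5, x.size)) / 10   # 5 repetitions
data2 = np.cos(x) + rng.random((20, x.size)) / 10  # 20 repetitions

# Stack both datasets; the "dataset" column tells them apart.
df = pd.concat([to_long(x, data1, "sin"), to_long(x, data2, "cos")],
               ignore_index=True)
```

With this frame, `sns.lineplot(data=df, x="x", y="y", hue="dataset")` should draw one line with its own band per dataset and take the axis and legend labels from the column names.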
JohanC
  • what if I had two data sets? How would that be fed to Seaborn? – Charlie Parker Nov 08 '21 at 20:56
  • You could either call `lineplot` twice, or you could use an additional "column" as a hue value. Organizing it as a dataframe isn't necessary, although such a structure could be easier to understand. – JohanC Nov 08 '21 at 20:59
  • Amazing! You are the man (or woman ;) ) Johan! Hopefully my last question...I accidentally already computed the means and stds...can I still use `sns` to plot those with those error bars instead of matplotlib? Or is that not possible with `sns`? – Charlie Parker Nov 08 '21 at 21:33
  • I don't think seaborn supports such custom error bands. You could use matplotlib, e.g. as in [create custom error band along line](https://stackoverflow.com/questions/63708242/seaborn-matplotlib-create-custom-error-band-along-line). Note that seaborn uses [bootstrapping](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)) to calculate the errors. – JohanC Nov 08 '21 at 21:51
  • thanks Johan! I wish I still had my raw data...I didn't know seaborn needed that when I went down this rabbit hole (evidently) but in the future I will use seaborn instead. One of the reasons I was hoping to use sns was exactly that bootstrapping...alas, that will have to wait for my next experiments or for when I repeat these. I learned to save my raw data for future analysis. Thanks for everything! Really appreciated it. – Charlie Parker Nov 08 '21 at 21:53
  • actually, is it possible to show both bands AND bars like in the matplotlib example you sent? – Charlie Parker Nov 08 '21 at 21:54
  • To show both bands and bars, I think you need to call `sns.lineplot` twice with the same data (and explicitly setting the color), once with `err_style="band"` and once with `err_style="bars"` – JohanC Nov 08 '21 at 21:59
  • the last question, if my data are integers, is it possible to rename my x values to be something like `Layer1`, `Layer2` instead of 1.0, 2.0 etc.? Or is https://stackoverflow.com/questions/3100985/plot-with-custom-text-for-x-axis-points the only option `plt.xticks(x, my_xticks)`? – Charlie Parker Nov 08 '21 at 22:00
  • `sns.lineplot(x=np.tile([f'Layer{i}' for i in range(1, num_x+1)], rep_per_x),...)` – JohanC Nov 08 '21 at 22:08
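The comments above mention that seaborn gets its bands from bootstrapping. To make that idea concrete, here is a rough NumPy sketch of a percentile bootstrap for the mean at each x-value. It assumes the raw [reps, num_x] matrix is still available, the function name `bootstrap_ci` and its parameters are hypothetical, and it is not claimed to reproduce seaborn's internal implementation.

```python
import numpy as np


def bootstrap_ci(y, n_boot=1000, level=95, seed=0):
    """Percentile bootstrap interval for the mean of each column of y [reps, num_x]."""
    rng = np.random.default_rng(seed)
    reps = y.shape[0]
    # Draw n_boot resamples of the repetitions (rows), with replacement.
    idx = rng.integers(0, reps, size=(n_boot, reps))
    boot_means = y[idx].mean(axis=1)  # shape [n_boot, num_x]
    tail = (100 - level) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail], axis=0)
    return y.mean(axis=0), low, high


rng = np.random.default_rng(22)
x = np.linspace(0.0, 1.0, 10)
y = np.sin(x) + rng.random((5, x.size))  # fake data, 5 repetitions per x
mean, low, high = bootstrap_ci(y)
```

The three arrays can be drawn with plain matplotlib, e.g. `ax.plot(x, mean)` followed by `ax.fill_between(x, low, high, alpha=0.3)`. If only precomputed means and stds are left, the same `fill_between` call works with `mean - std` and `mean + std` as the band edges.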

For the sake of having a full example with reusable code for future users:

from typing import Optional

import matplotlib.pyplot as plt
import numpy as np


def plot_seaborn_curve_with_x_values_y_values(x: np.ndarray, y: np.ndarray,
                                              xlabel: str, ylabel: str,
                                              title: str,
                                              curve_label: Optional[str] = None,
                                              err_style: str = 'band',
                                              marker: Optional[str] = 'x',
                                              dashes: bool = False,
                                              show: bool = False
                                              ):
    """
    Given a list of x values in a range with num_x_values number of x values in that range and the corresponding samples
    for each specific x value (so [samples_per_x] for each value of x giving in total a matrix of size
    [samples_per_x, num_x_values]), plot aggregates of them with error bands.
    Note that the main assumption is that each x value has a number of y values corresponding to it (likely due to noise
    for example).


    Note:
        - if you want strings at the specific x axis points do
        sns.lineplot(x=np.tile([f'Layer{i}' for i in range(1, num_x+1)], rep_per_x),...) assuming the x values are the
        layers. https://stackoverflow.com/questions/69888181/how-to-show-error-bands-for-pure-matrices-samples-x-range-with-seaborn-error/69889619?noredirect=1#comment123544763_69889619
        - note you can call this function multiple times to insert different curves into your plot.
        - note it's recommended to call show only if you have a single curve, or on the final curve you want to add.
        - if you want bands and bars it might work if you call this function twice, using the bar argument for one call
        and the band argument for the other.

    ref:
        - https://stackoverflow.com/questions/69888181/how-to-show-error-bands-for-pure-matrices-samples-x-range-with-seaborn-error/69889619?noredirect=1#comment123544763_69889619

    :param x: [num_x_values]
    :param y: [samples_per_x, num_x_values]
    :param xlabel:
    :param ylabel:
    :param title:
    :param curve_label:
    :param err_style:
    :param marker:
    :param dashes:
    :param show:
    :return:
    """
    import seaborn as sns
    samples_per_x: int = y.shape[0]
    num_x_values: int = x.shape[0]
    assert num_x_values == y.shape[1], 'We are plotting aggregates of multiple y values for each specific value of x, ' \
                                       'thus y.shape[1] must match the number of x values on the x axis.'

    # - since seaborn expects each x value paired with its y value, flatten the y's and make sure the corresponding
    # x value is aligned with its y value [num_x_values * samples_per_x]
    x: np.ndarray = np.tile(x, samples_per_x)  # np.tile repeats the whole x array samples_per_x times
    assert (x.shape == (num_x_values * samples_per_x,))
    y: np.ndarray = np.ravel(y)  # flatten the y's row by row so each y lines up with its x value
    assert (y.shape == (num_x_values * samples_per_x,))
    assert (x.shape == y.shape)

    # - plot
    ax = sns.lineplot(x=x, y=y, err_style=err_style, label=curve_label, marker=marker, dashes=dashes)
    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    if show:
        plt.show()

e.g.

def plot_seaborn_curve_with_x_values_y_values_test():
    # the number of x values to consider in a given range e.g. [0,1] will sample 10 raw features x sampled at in [0,1] interval
    num_x: int = 10
    # the repetitions for each x feature value e.g. multiple measurements for sample x=0.0 up to x=1.0 at the end
    rep_per_x: int = 5
    total_size_data_set: int = num_x * rep_per_x
    print(f'{total_size_data_set=}')
    # - create fake data set
    # only consider 10 features from 0 to 1
    x = np.linspace(start=0.0, stop=1.0, num=num_x)

    # to introduce fake variation add uniform noise to each feature and pretend each one is a new observation for that feature
    noise_uniform: np.ndarray = np.random.rand(rep_per_x, num_x)
    # same as above but have the noise be the same for each x (thats what the 1 means)
    noise_normal: np.ndarray = np.random.randn(rep_per_x, 1)
    # signal function
    sin_signal: np.ndarray = np.sin(x)
    cos_signal: np.ndarray = np.cos(x)
    # [rep_per_x, num_x]
    y1: np.ndarray = sin_signal + noise_uniform + noise_normal
    y2: np.ndarray = cos_signal + noise_uniform + noise_normal

    # pass curve_label so the legend tells the two curves apart
    plot_seaborn_curve_with_x_values_y_values(x=x, y=y1, xlabel='x', ylabel='y', title='Sin vs Cos', curve_label='sin')
    plot_seaborn_curve_with_x_values_y_values(x=x, y=y2, xlabel='x', ylabel='y', title='Sin vs Cos', curve_label='cos')
    plt.show()

output:

[image]

Charlie Parker