
I'm trying to create a 2D line chart with seaborn, but the result has several artefacts: lines that suddenly shoot up or down, along with barely visible vertical lines (screenshot: borked lineplot).

Excel, on the other hand, produces a correct visualisation from the same file (screenshot: correct lineplot).

My code follows the seaborn examples (a sample test.csv can be found here):

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('test.csv')
sns.set()
lp = sns.lineplot(x=data['x'], y=data['y'], sort=False, lw=1)
plt.show()

Am I doing something wrong, or is matplotlib unable to handle overlapping values?

SebiH
  • Use `plt.plot(data['x'].values, data['y'].values, lw=1)` instead. In other words, matplotlib itself is perfectly capable of producing the desired plot. – ImportanceOfBeingErnest Oct 04 '19 at 21:45
  • @ImportanceOfBeingErnest That's true, thanks! Can you post that as an answer? Seems to me like there's a bug in seaborn then. – SebiH Oct 04 '19 at 21:56
  • No, there is no bug. By default, sns.lineplot is not meant to draw arbitrary lines in 2D space. Check the documentation and make sure you understand all of its parameters. – ImportanceOfBeingErnest Oct 04 '19 at 22:05
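The comment above can be made concrete with a minimal sketch that mimics the aggregation step of a default sns.lineplot call using pandas. The tiny path below is made up for illustration (it is not the linked test.csv): it moves right along x and then comes back over the same x values. Grouping by x and averaging y, which is roughly what seaborn's default mean estimator does, collapses that out-and-back path into something quite different.

```python
import pandas as pd

# Hypothetical data: a path that goes right along x and then comes back,
# so every x value except the turning point is visited twice.
data = pd.DataFrame({
    "x": [0, 1, 2, 3, 2, 1, 0],
    "y": [0, 1, 2, 3, 4, 5, 6],
})

# Approximates sns.lineplot's default aggregation: one point per unique x,
# with y replaced by the mean of all y values observed at that x.
# sort=False mirrors the sort=False used in the question.
aggregated = data.groupby("x", sort=False)["y"].mean().reset_index()
print(aggregated)
```

All four remaining rows have y == 3.0, so a path that actually spans y from 0 to 6 is reduced to a horizontal line; by default seaborn also draws a confidence interval around each of these means.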

2 Answers


By default, seaborn aggregates multiple observations of the y variable at the same x level by computing their mean, and draws a confidence interval around it. This aggregation can be disabled by passing estimator=None.

Adding this parameter to the original code and data removes the artifacts:

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('test.csv')
sns.set()
lp = sns.lineplot(x=data['x'], y=data['y'], sort=False, lw=1, estimator=None)
plt.show()

Output: the same chart, now without the artifacts (screenshot).

Adam

It seems that in your data some points have the same x value. sns.lineplot treats them as several samples of a single point, so it plots their mean as the actual point and draws an error bar around it. The vertical artifacts are these error bars.
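One way to check whether this explanation applies to a given file is to count the repeated x values before plotting. This is a minimal pandas sketch; the inline sample data is hypothetical, and with the question's file you would load it with pd.read_csv('test.csv') instead.

```python
import pandas as pd

# Hypothetical sample; replace with pd.read_csv('test.csv') for real data
data = pd.DataFrame({
    "x": [0.0, 0.5, 1.0, 1.0, 1.5, 0.5],
    "y": [2.0, 1.0, 4.0, 3.0, 5.0, 0.0],
})

# x values occurring more than once: in a default sns.lineplot each of these
# becomes a single aggregated point with an error bar
counts = data["x"].value_counts()
repeated = counts[counts > 1]
print(f"{len(repeated)} x values occur more than once")
print(repeated.sort_index())
```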

A hacky solution is to add a tiny random shift to your x values. In my case, I was trying to plot a precision-recall (PR) curve and ran into the same problem. I added an alternating shift to make sure there are no vertical segments:

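import numpy as np
import seaborn as sns
import sklearn.metrics

# y_true (binary labels) and y_pred (scores) are assumed to be defined already
shift = 1e-6  # tiny offset; choose a value well below the plot's resolution

precision, recall, unused_thresholds = sklearn.metrics.precision_recall_curve(
    y_true, y_pred)

# Alternate +shift / -shift so that neighbouring points never share an x value
shift_recall = np.empty_like(recall)
shift_recall[::2] = shift
shift_recall[1::2] = -shift

line_plot = sns.lineplot(x=recall + shift_recall, y=precision)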

Before the fix: PR Curve with vertical artifacts

After the fix: PR Curve without vertical artifacts
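The effect of the alternating shift can be checked in isolation with NumPy alone. The recall values below are made up, but they have the shape precision_recall_curve can return: non-increasing, with runs of identical values. The shift magnitude is likewise an assumption.

```python
import numpy as np

shift = 1e-6  # hypothetical magnitude; keep it far below the plot's resolution

# Made-up recall values with consecutive duplicates
recall = np.array([1.0, 0.75, 0.75, 0.5, 0.5, 0.25, 0.0])

# Same alternating pattern as above: +shift, -shift, +shift, ...
shift_recall = np.empty_like(recall)
shift_recall[::2] = shift
shift_recall[1::2] = -shift
shifted = recall + shift_recall

# Every pair of neighbouring points now has distinct x values
print(np.all(np.diff(shifted) != 0))
```

Note that if three or more consecutive recall values are identical, the first and third still coincide, because both receive +shift. Passing estimator=None to sns.lineplot, as in the other answer, avoids the aggregation without perturbing the data.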

Dharman
Dawei Yang