I'm trying to recreate seaborn's fill-only confidence interval plotting in raw matplotlib. In doing so, I'm running into strange behavior where fill_between leaves gaps in the region it's supposed to fill.

I'm using real-world data for this, but it's well-behaved data: the x values range from about 0 to 15, and the y values from about 25 to 85. I'm using statsmodels to fit the line and generate the confidence intervals with essentially the code from this prior SO, and the fitted values as well as the upper and lower bounds of the confidence intervals are as they should be (the ranges are appropriate, etc.). So there's nothing wrong with the data.

Here's the relevant part of the code:

import matplotlib.pyplot as plt
import statsmodels.api as sm

def make_plot(x, y):
    fig = plt.figure(figsize=(12, 9))
    ax = fig.add_subplot(1, 1, 1)
    ax.plot(x, y, 'k.', ms=5)  # raw data points
    ax.locator_params(nbins=3)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    # Fit an OLS regression line with an intercept term.
    regline = sm.OLS(y, sm.add_constant(x)).fit()
    fitted = regline.fittedvalues
    ax.plot(x, fitted, color=(0.2, 0.2, 0.2, 0.2), linewidth=2)
    # get_ci_values is a helper (not shown) that returns the CI bounds.
    ci_low, ci_high = get_ci_values(regline)
    # Shade the lower and upper halves of the confidence band.
    ax.fill_between(x, ci_low, fitted, facecolor=(0.4, 0.4, 0.9, 0.2))
    ax.fill_between(x, ci_high, fitted, facecolor=(0.9, 0.4, 0.4, 0.2))
    return fig

The fill works fine until around x=10, y=50, and then it starts to leave bizarre gaps where it doesn't reach all the way to the regression line. Here's an example:

[image: plot with a horrible gap in the fill]

What have I done wrong here? I've tried a bunch of stuff, including:

  • adding lines for the low and high confidence intervals

  • adding interpolate=True to the fill_between calls

  • adding where=x>0 to the fill_between calls

but none of that makes any difference.

I also note that seaborn manages to make its beautiful fills using fill_between, using exactly the same strategy, and seaborn's plotting works correctly on the data I'm using...

Paul Gowder

1 Answer

One cannot know for sure because the question is missing the essential part, namely the data itself (see Minimal, Complete, and Verifiable example).

My strong suspicion, however, is that the x values are not sorted.

The (untested) solution would be to sort the data by x and reorder the other arrays to match:

order = np.argsort(x)  # indices that put x in ascending order
ax.plot(np.sort(x), fitted[order])
ax.fill_between(np.sort(x), ci_low[order], fitted[order])
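
A slightly fuller sketch of the same idea, assuming x, the fitted values, and the interval bounds are equal-length one-dimensional sequences. The helper name sort_by_x is hypothetical (it is not part of matplotlib or statsmodels), and the data are made up. It computes the sort order once and applies it to every array, so each x stays paired with its own y values:

```python
import numpy as np


def sort_by_x(x, *ys):
    """Return x and each array in ys reordered so that x is ascending.

    Hypothetical helper, not part of matplotlib or statsmodels.
    """
    x = np.asarray(x)
    order = np.argsort(x, kind="stable")  # one permutation shared by all arrays
    return (x[order],) + tuple(np.asarray(y)[order] for y in ys)


# Made-up, deliberately unsorted data standing in for the real arrays.
x = np.array([3.0, 0.5, 12.0, 7.0])
fitted = 30.0 + 4.0 * x
ci_low = fitted - 2.0
ci_high = fitted + 2.0

is_sorted = bool(np.all(np.diff(x) >= 0))  # False here, so sorting is needed
xs, fitted_s, low_s, high_s = sort_by_x(x, fitted, ci_low, ci_high)
```

The sorted arrays can then be passed straight to the plotting calls, for example ax.fill_between(xs, low_s, fitted_s) and ax.fill_between(xs, high_s, fitted_s).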

To understand why the values need to be sorted, maybe a picture can tell more than a thousand words.

[image]

ImportanceOfBeingErnest
  • Well, that was the solution, and thank you (sorry for not posting the whole data, I couldn't reproduce with simulated data, and the actual data was a lot.) Out of curiosity, why does it have to be sorted for this to work? – Paul Gowder Jan 10 '18 at 23:56
  • Try to shade some area between unsorted points with a pencil on paper. It will be chaotic. If you think about it, when shading something on paper your brain sorts the points somehow, so that the result is a continuous area. Same with the code: trying to fill a polygon with x coordinates starting at 1, then 50, then 2, then 32 makes little sense. – ImportanceOfBeingErnest Jan 11 '18 at 00:03
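
The pencil-on-paper point can be made concrete with a small sketch. It uses a simplified model of a filled band's outline: walk the lower boundary in array order, then the upper boundary in reverse. This illustrates the idea and is not matplotlib's actual code; the function name band_outline and the band values are made up, while the x order is the one from the comment above.

```python
import numpy as np


def band_outline(x, low, high):
    """Simplified model of the outline of a filled band.

    Walk the lower boundary in array order, then the upper boundary in
    reverse. This is an illustration, not matplotlib's actual code.
    """
    x, low, high = (np.asarray(a, dtype=float) for a in (x, low, high))
    xs = np.concatenate([x, x[::-1]])
    ys = np.concatenate([low, high[::-1]])
    return np.column_stack([xs, ys])


x = np.array([1.0, 50.0, 2.0, 32.0])  # the x order from the comment above
low, high = x - 5.0, x + 5.0          # made-up band around the line y = x

unsorted_outline = band_outline(x, low, high)
order = np.argsort(x)
sorted_outline = band_outline(x[order], low[order], high[order])

print(unsorted_outline[:, 0])  # x jumps back and forth: 1, 50, 2, 32, 32, 2, 50, 1
print(sorted_outline[:, 0])    # x goes out and back once: 1, 2, 32, 50, 50, 32, 2, 1
```

With sorted x the outline sweeps left to right along the bottom and right to left along the top, which encloses one clean region. With unsorted x the edges cross each other, so the filled shape overlaps itself and leaves gaps.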