I'm creating a scatter plot from an xarray Dataset using

scat = ds.hvplot.scatter(x='a', y='b', groupby='c', height=900, width=900)

How can I add a regression line to this plot?

I'm also using a hook to set some of the plot's properties, and I could add the Slope inside the hook function, but I can't figure out how to access x and y from plot.state. This might also be completely the wrong way of doing it.

def hook(plot, element):
    print('plot.state:   ', plot.state)
    print('plot.handles: ', sorted(plot.handles.keys()))

    par = np.polyfit(x, y, 1, full=True)
    gradient=par[0][0]
    y_intercept=par[0][1]

    slope = Slope(gradient=gradient, y_intercept=y_intercept,
          line_color='orange', line_dash='dashed', line_width=3.5)

    plot.state.add_layout(slope)

scat = scat.opts(hooks=[hook])
Rob

2 Answers

HoloViews >= 1.13 now has support for adding a regression line to your plot, so you don't need hooks anymore.

1) You can either add the regression line yourself by passing the slope and y-intercept to hv.Slope():

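import numpy as np
import holoviews as hv
hv.extension('bokeh')

gradient = 2
y_intercept = 15

# create random data scattered around the line y = gradient * x + y_intercept
xpts = np.arange(0, 20)
ypts = gradient * xpts + y_intercept + np.random.normal(0, 4, 20)

scatter = hv.Scatter((xpts, ypts))

# create slope with hv.Slope()
slope = hv.Slope(gradient, y_intercept)

# overlay the line on the scatter plot
scatter.opts(size=10) * slope.opts(color='red', line_width=6)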



2) Or you can have HoloViews calculate it for you with hv.Slope.from_scatter():

normal = hv.Scatter(np.random.randn(20, 2))

normal.opts(size=10) * hv.Slope.from_scatter(normal)



Resulting plot:

[Figure: scatter plot with regression line, HoloViews 1.13]
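
For a sense of what a straight-line fit involves, here is a minimal plain-Python sketch of an ordinary least-squares line, which is what the np.polyfit(x, y, 1) call in the question computes. The function name fit_line and the sample data are made up for illustration. It is not HoloViews code, and I'm assuming rather than asserting that from_scatter() fits the same kind of line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = gradient * x + y_intercept.

    Hypothetical helper for illustration; not part of HoloViews or NumPy.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    y_intercept = mean_y - gradient * mean_x
    return gradient, y_intercept


# Noise-free sample points lying exactly on y = 2x + 15
xs = [0, 1, 2, 3, 4]
ys = [15, 17, 19, 21, 23]
gradient, y_intercept = fit_line(xs, ys)
print(gradient, y_intercept)
```

The resulting gradient and y_intercept could then be passed to hv.Slope(gradient, y_intercept) as in option 1.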

Sander van den Oord
    I didn't see in the documentation, but is this a root mean square (linear) regression? Are there other options here too? This is a great add to hv. – Tyler Russell Mar 30 '20 at 21:03

A plot hook is called with two arguments, the second of which is the element being displayed. Since the element contains the data being displayed, the hook can compute the slope by using the dimension_values method to get the values of the 'a' and 'b' dimensions in your data. Additionally, to avoid adding the Slope glyph multiple times, we can cache it on the plot and update its attributes on later calls:

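import numpy as np
from bokeh.models import Slope

def hook(plot, element):
    # the element holds the data currently being displayed
    x, y = element.dimension_values('a'), element.dimension_values('b')
    # degree-1 fit; the coefficients are returned as [gradient, y_intercept]
    gradient, y_intercept = np.polyfit(x, y, 1)

    if 'slope' in plot.handles:
        # the Slope already exists, so only update its attributes
        slope = plot.handles['slope']
        slope.gradient = gradient
        slope.y_intercept = y_intercept
    else:
        # first call: create the Slope, cache it, and add it to the figure
        slope = Slope(gradient=gradient, y_intercept=y_intercept,
                      line_color='orange', line_dash='dashed', line_width=3.5)
        plot.handles['slope'] = slope
        plot.state.add_layout(slope)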
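
The create-once, update-afterwards pattern in the hook above can be tried without Bokeh or HoloViews installed. The sketch below uses stand-in classes (FakeSlope, FakeFigure, FakePlot and the helper update_slope are all hypothetical names, not library API) and assumes only that the hook may be called more than once on the same plot.

```python
class FakeSlope:
    """Stand-in for bokeh.models.Slope; only stores the two line parameters."""
    def __init__(self, gradient, y_intercept):
        self.gradient = gradient
        self.y_intercept = y_intercept


class FakeFigure:
    """Stand-in for the figure exposed as plot.state."""
    def __init__(self):
        self.layouts = []

    def add_layout(self, obj):
        self.layouts.append(obj)


class FakePlot:
    """Stand-in for the plot object passed to a hook."""
    def __init__(self):
        self.state = FakeFigure()
        self.handles = {}


def update_slope(plot, gradient, y_intercept):
    """Create the slope on the first call, update it in place afterwards."""
    if 'slope' in plot.handles:
        slope = plot.handles['slope']
        slope.gradient = gradient
        slope.y_intercept = y_intercept
    else:
        slope = FakeSlope(gradient, y_intercept)
        plot.handles['slope'] = slope
        plot.state.add_layout(slope)
    return slope


plot = FakePlot()
update_slope(plot, 2.0, 15.0)   # first call adds the slope
update_slope(plot, 1.5, 10.0)   # second call only updates it
print(len(plot.state.layouts), plot.handles['slope'].gradient)
```

Without the check on plot.handles, the second call would add a second line to the figure instead of moving the existing one.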
philippjfr