Datashader canvas.line() aliasing

Question

I use Bokeh to plot temperature curves, but in some cases the dataset is quite big (> 500k measurements) and the user experience with Bokeh becomes laggy (even with output_backend="webgl"). So I'm experimenting with Datashader to get faster rendering and a smoother user experience.

But the visual result given by Datashader is not as good as Bokeh's: the Datashader output shows aliasing.

I obtained this side-by-side comparison with the following code:

import pandas as pd
import datashader as ds
import datashader.transfer_functions as tf
from bokeh.plotting import figure
from bokeh.io import output_notebook, show
from bokeh.models import ColumnDataSource
from bokeh.layouts import row
import numpy as np

output_notebook()

# generate signal
n = 2000
start = 0
end = 70
signal = [np.sin(x) for x in np.arange(start, end, step=(end-start)/n)]
signal = pd.DataFrame(signal, columns=["signal"])
signal = signal.reset_index()

# create a bokeh plot
source = ColumnDataSource(signal)
p = figure(plot_height=300, plot_width=400, title="bokeh plot")
p.line(source=source, x="index", y="signal")

# create a datashader image and put it in a bokeh plot
x_range = (signal["index"].min(), signal["index"].max())
y_range = (signal["signal"].min(), signal["signal"].max())
cvs = ds.Canvas(x_range=x_range, y_range=y_range, plot_height=300, plot_width=400)
agg = cvs.line(signal, 'index', 'signal')
img = tf.shade(agg)
image_source = ColumnDataSource(data=dict(image = [img.data]))
q = figure(x_range=x_range, y_range=y_range, plot_height=300, plot_width=400, title="datashader + bokeh")
q.image_rgba(source = image_source,
             image="image",
             dh=(y_range[1] - y_range[0]),
             dw=(x_range[1] - x_range[0]),
             x=x_range[0],
             y=y_range[0],
             dilate=False)

# show both plots, Bokeh on the left
show(row(p, q))

Do you have any idea how to fix this aliasing and get a smooth result (similar to Bokeh's)?

This code is not runnable; it depends on "noise" and "noised_signal" that are not defined. — James A. Bednar, Jan 15 '18 at 17:11

score 5 · Accepted Answer · answered Jan 15 '18 at 19:05

Here's a runnable version of your code, using HoloViews in a Jupyter notebook:

import pandas as pd, numpy as np, holoviews as hv
from holoviews.operation.datashader import datashade, dynspread
hv.extension("bokeh")
%opts Curve RGB [width=400]
n, start, end = 2000, 0, 70
sine = [np.sin(x) for x in np.arange(start, end, step=(end-start)/n)]
signal = pd.DataFrame(sine, columns=["signal"]).reset_index()
curve = hv.Curve(signal)

curve + datashade(curve)

It's true that the datashaded output here doesn't look very nice. Datashader's timeseries support, like the rest of datashader, was designed to allow accurate accumulation and summation of huge numbers of mathematically perfect (i.e., infinitely thin) curves on a raster grid, so that every x location on every curve will fall into one and only one y location in the grid. Here you just seem to want server-side rendering of a large timeseries, which requires partial incrementing of multiple nearby bins in the grid and isn't something that datashader is optimized for yet.

One thing you can do already is to render the curve at a high resolution then "spread" it so that each non-zero pixel will show up in neighboring pixels as well:

curve + dynspread(datashade(curve, height=1200, width=1200, dynamic=False, \
                            cmap=["#30a2da"]), max_px=3, threshold=1)

Here I set the color to match Bokeh's default, then forced HoloViews' "dynspread" function to spread by 3 pixels. Using Datashader+Bokeh as in your version, you would do `img = tf.spread(tf.shade(agg), px=3)` and increase the plot size in the Canvas call to get a similar result.

I haven't tried running a simple smoothing filter over the result of tf.shade() or tf.spread(), but those both just return RGB images, so some filter like that would probably give good results.
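As a rough, untested-against-real-plots sketch of such a filter, a box blur over the shaded image can be written with NumPy alone. It assumes the image is a 2D uint32 array whose four bytes per pixel are laid out in memory as R, G, B, A (which is how I understand tf.shade() output to be packed); box_blur_rgba is a hypothetical helper name, not part of datashader. Colors are premultiplied by alpha before averaging so that transparent background pixels do not darken the line's edges.

```python
import numpy as np

def box_blur_rgba(packed, radius=1):
    """Box-blur a packed uint32 RGBA image of shape (H, W).

    Hypothetical helper: assumes byte order R, G, B, A in memory.
    """
    packed = np.ascontiguousarray(packed, dtype=np.uint32)
    h, w = packed.shape
    rgba = packed.view(np.uint8).reshape(h, w, 4).astype(np.float64)

    # Premultiply RGB by alpha so transparent pixels contribute no color
    alpha = rgba[..., 3:4] / 255.0
    premul = np.concatenate([rgba[..., :3] * alpha, rgba[..., 3:4]], axis=2)

    # Average over a (2*radius+1)^2 window, treating the outside as transparent
    k = 2 * radius + 1
    padded = np.pad(premul, ((radius, radius), (radius, radius), (0, 0)),
                    mode="constant")
    acc = np.zeros_like(premul)
    for dy in range(k):
        for dx in range(k):
            acc += padded[dy:dy + h, dx:dx + w]
    acc /= k * k

    # Undo the premultiplication wherever the blurred alpha is non-zero
    out_alpha = acc[..., 3:4]
    scale = np.where(out_alpha > 0, out_alpha / 255.0, 1.0)
    out = np.concatenate([acc[..., :3] / scale, out_alpha], axis=2)
    out = np.clip(np.rint(out), 0, 255).astype(np.uint8)
    return np.ascontiguousarray(out).view(np.uint32).reshape(h, w)
```

With the question's code, this would be applied as box_blur_rgba(img.data) before placing the array in the ColumnDataSource. A larger radius gives a softer but blurrier line, so rendering at a higher canvas resolution first is likely to help.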

The real solution would be to implement an optional antialiased line-drawing function for datashader, operating when the lines are drawn first rather than fixing up the pixels later, but that would take some work. Contributions welcome!

I wish I could add antialiased line drawing to the datashader source, but I'm afraid I'm not a good enough developer. Your suggestions are interesting and I will study them, but if possible without HoloViews: I want to create a sort of advanced dashboard application, and I'm more comfortable with Bokeh for now. Do you have any suggestion or link for a "simple smoothing filter" on the RGB images? — Louc, Jan 15 '18 at 21:05
The suggested "tf.spread" code should achieve the same result using datashader without holoviews. PIL includes various blurring filters, and you can use tf.Image.to_pil() to get a PIL object, but you'd have to then convert back from the PIL object into something Bokeh will accept. Or you could do a https://en.wikipedia.org/wiki/Box_blur directly on the tf.Image object (which is just an xarray). — James A. Bednar, Jan 15 '18 at 21:41
