
I want to link a plot containing patches (from a GeoJSONDataSource) with a line chart, but I'm having trouble getting the attributes of the selected patch.

It's basically a plot showing polygons, and when a polygon is selected, I want to update the line chart with a time series of data for that polygon. The line chart is driven by a normal ColumnDataSource.

I can get the indices of the selected patch by adding a callback that reads geo_source.selected['1d']['indices']. But how do I get the data/attributes that correspond to that index? I need a 'key' from the attributes, which I can then use to update the line chart.

The GeoJSONDataSource has no data attribute in which I can look up the data itself. Bokeh can use the attributes for things like coloring and tooltips, so I assume there must be a way to get them out of the GeoJSONDataSource, but unfortunately I can't find it.

Edit:

Here is a working toy example showing what I've got so far.

import pandas as pd
import numpy as np

from bokeh import events
from bokeh.models import (Select, Column, Row, ColumnDataSource, HoverTool, 
                          Range1d, LinearAxis, GeoJSONDataSource)
from bokeh.plotting import figure
from bokeh.io import curdoc

import os
import datetime
from collections import OrderedDict

def make_plot(src):
    # function to create the line chart

    p = figure(width=500, height=200, x_axis_type='datetime', title='Some parameter',
               tools=['xwheel_zoom', 'xpan'], logo=None, toolbar_location='below', toolbar_sticky=False)
    
    p.circle('index', 'var1', color='black', fill_alpha=0.2, size=10, source=src)

    return p

def make_geo_plot(src):
    # function to create the spatial plot with polygons
    
    p = figure(width=300, height=300, title="Select area", tools=['tap', 'pan', 'box_zoom', 'wheel_zoom','reset'], logo=None)

    p.patches('xs', 'ys', fill_alpha=0.2, fill_color='black',
              line_color='black', line_width=0.5, source=src)
              
    p.on_event(events.SelectionGeometry, update_plot_from_geo)

    return p

def update_plot_from_geo(event):
    # update the line chart based on the selected polygon

    selected = geo_source.selected['1d']['indices']
    
    if (len(selected) > 0):
        first = selected[0]
        print(geo_source.selected['1d']['indices'])


def update_plot(attrname, old, new):
    # Callback for the dropdown menu which updates the line chart
    new_src = get_source(df, area_select.value)    
    src.data.update(new_src.data)
  
def get_source(df, fieldid):
    # function to get a subset of the multi-hierarchical DataFrame
    
    # slice 'out' the selected area
    dfsub = df.xs(fieldid, axis=1, level=0)
    src = ColumnDataSource(dfsub)
    
    return src

# example timeseries
n_points = 100
df = pd.DataFrame({('area_a','var1'): np.sin(np.linspace(0,5,n_points)) + np.random.rand(100)*0.1,
                   ('area_b','var1'): np.sin(np.linspace(0,2,n_points)) + np.random.rand(100)*0.1,
                   ('area_c','var1'): np.sin(np.linspace(0,3,n_points)) + np.random.rand(100)*0.1,
                   ('area_d','var1'): np.sin(np.linspace(0,4,n_points)) + np.random.rand(100)*0.1},
                  index=pd.date_range(start='2017-01-01', freq='D', periods=100))

# example polygons
geojson = """{
"type":"FeatureCollection",
"crs":{"type":"name","properties":{"name":"urn:ogc:def:crs:OGC:1.3:CRS84"}},
"features":[
{"type":"Feature","properties":{"key":"area_a"},"geometry":{"type":"MultiPolygon","coordinates":[[[[-108.8,42.7],[-104.5,42.0],[-108.3,39.3],[-108.8,42.7]]]]}},
{"type":"Feature","properties":{"key":"area_b"},"geometry":{"type":"MultiPolygon","coordinates":[[[[-106.3,44.0],[-106.2,42.6],[-103.3,42.6],[-103.4,44.0],[-106.3,44.0]]]]}},
{"type":"Feature","properties":{"key":"area_d"},"geometry":{"type":"MultiPolygon","coordinates":[[[[-104.3,41.0],[-101.5,41.0],[-102.9,37.8],[-104.3,41.0]]]]}},
{"type":"Feature","properties":{"key":"area_c"},"geometry":{"type":"MultiPolygon","coordinates":[[[[-105.8,40.3],[-108.3,37.7],[-104.0,37.4],[-105.8,40.3]]]]}}
]
}"""

geo_source = GeoJSONDataSource(geojson=geojson)

# populate a drop-down menu with the areas
area_ids = sorted(df.columns.get_level_values(0).unique().values.tolist())
area_ids = [str(x) for x in area_ids]
area_select = Select(value=area_ids[0], title='Select area', options=area_ids)
area_select.on_change('value', update_plot)

src = get_source(df, area_select.value)

p = make_plot(src)
pgeo = make_geo_plot(geo_source)

# add to document
curdoc().add_root(Row(Column(area_select, p), pgeo))

Save the code in a .py file and load with bokeh serve example.py --show


Rutger Kassies

2 Answers


The geojson data that you pass to GeoJSONDataSource is stored in its geojson property -- as a string. My suggestion isn't particularly elegant: you can just parse the json string using the built-in json module. Here's a working version of update_plot_from_geo that updates the line plot based on the selected polygon:

def update_plot_from_geo(event):
    # update the line chart based on the selected polygon

    indices = geo_source.selected['1d']['indices']

    if indices:
        parsed_geojson = json.loads(geo_source.geojson)
        features = parsed_geojson['features']
        series_key = features[indices[0]]['properties']['key']
        new_source = get_source(df, series_key)
        src.data.update(new_source.data)

You'll also need to import json at the top.

I'm a little surprised there's not an obvious way to get the parsed json data. The GeoJSONDataSource documentation indicates the existence of the geojson attribute, but says it's a JSON object. The JSON documentation seems to hint that you should be able to do something like src.geojson.parse. But the type of geojson is just str. Upon closer inspection, it appears that the docs are using "JSON" ambiguously, referring to the Bokeh JSON class in some cases, and to the built-in JavaScript JSON object in others.

So at the moment, I don't believe there's a better way to get at this data.
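If the spatial data is static, the parsing does not have to happen inside the callback. The following is a minimal sketch of doing it once up front, assuming every feature carries a 'key' property as in the question's example. `build_key_lookup` is a hypothetical helper name, not part of Bokeh, and the geometries are trimmed stand-ins for the question's polygons.

```python
import json

# Stand-in for the question's GeoJSON string; geometries trimmed for brevity.
geojson = """{
"type": "FeatureCollection",
"features": [
{"type": "Feature", "properties": {"key": "area_a"}, "geometry": {"type": "Point", "coordinates": [-108.8, 42.7]}},
{"type": "Feature", "properties": {"key": "area_b"}, "geometry": {"type": "Point", "coordinates": [-106.3, 44.0]}},
{"type": "Feature", "properties": {"key": "area_d"}, "geometry": {"type": "Point", "coordinates": [-104.3, 41.0]}},
{"type": "Feature", "properties": {"key": "area_c"}, "geometry": {"type": "Point", "coordinates": [-105.8, 40.3]}}
]
}"""


def build_key_lookup(geojson_str, prop="key"):
    """Hypothetical helper: list of `prop` values in feature order."""
    features = json.loads(geojson_str)["features"]
    return [feature["properties"][prop] for feature in features]


# Parse once at startup; the callback then only needs an index lookup.
keys = build_key_lookup(geojson)
print(keys)
```

Inside the callback, the lookup then reduces to `series_key = keys[indices[0]]`. This positional mapping assumes one row per feature; a feature whose geometry is a GeometryCollection would likely occupy several rows on the Bokeh side, so the indices would no longer line up.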

senderle
  • Thanks, it works fine; if this turns out to be the best answer, I'll accept it. I am, however, a bit wary of separate parsing, since the lookup with `features[indices[0]]` is positional. Any (accidental) shuffling by Bokeh or the JSON module and the data is misaligned. Since my spatial data is static, I can do the parsing outside the callback, in case someone else wonders about performance. – Rutger Kassies Nov 17 '17 at 16:05
  • @RutgerKassies, I see your concern about accidental shuffling. But the documentation indicates that the data is being passed around as a string and re-parsed at multiple points. Since the ordering of arrays in json data is guaranteed to be [stable](https://stackoverflow.com/questions/7214293/is-the-order-of-elements-in-a-json-list-preserved), there's a good chance that reordering arrays would break other parts of Bokeh. It certainly would be a significant contract violation. – senderle Nov 17 '17 at 16:52

You could write a custom extension for the GeoJSONDataSource.

Here is the CoffeeScript for GeoJSONDataSource: https://github.com/bokeh/bokeh/blob/master/bokehjs/src/coffee/models/sources/geojson_data_source.coffee

I am not very experienced with custom extensions, so I simply copied GeoJSONDataSource in full and called it CustomGeo instead. The only change is that I moved 'data' from @internal to @define. With that, you have a GeoJSONDataSource with a 'data' attribute.

In the example below the callback looks up the selected polygon in the 'key' list. Since the data is now available in this form, you could also write a check that each key corresponds to the right polygon, if you are worried about shuffling.

import pandas as pd
import numpy as np

from bokeh.core.properties import Instance, Dict, JSON, Any

from bokeh import events
from bokeh.models import (Select, Column, Row, ColumnDataSource, HoverTool, 
                          Range1d, LinearAxis, GeoJSONDataSource, ColumnarDataSource)
from bokeh.plotting import figure
from bokeh.io import curdoc

import os
import datetime
from collections import OrderedDict

def make_plot(src):
    # function to create the line chart

    p = figure(width=500, height=200, x_axis_type='datetime', title='Some parameter',
               tools=['xwheel_zoom', 'xpan'], logo=None, toolbar_location='below', toolbar_sticky=False)

    p.circle('index', 'var1', color='black', fill_alpha=0.2, size=10, source=src)

    return p

def make_geo_plot(src):
    # function to create the spatial plot with polygons

    p = figure(width=300, height=300, title="Select area", tools=['tap', 'pan', 'box_zoom', 'wheel_zoom','reset'], logo=None)

    a=p.patches('xs', 'ys', fill_alpha=0.2, fill_color='black',
              line_color='black', line_width=0.5, source=src,name='poly')

    p.on_event(events.SelectionGeometry, update_plot_from_geo)

    return p

def update_plot_from_geo(event):
    # update the line chart based on the selected polygon

    try:
        selected = geo_source.selected['1d']['indices'][0]
    except IndexError:
        # nothing selected
        return

    print(geo_source.data)
    print(geo_source.data['key'][selected])

    new_src = get_source(df, geo_source.data['key'][selected])
    src.data.update(new_src.data)

def update_plot(attrname, old, new):
    # Callback for the dropdown menu which updates the line chart
    print(area_select.value)
    new_src = get_source(df, area_select.value)    
    src.data.update(new_src.data)

def get_source(df, fieldid):
    # function to get a subset of the multi-hierarchical DataFrame

    # slice 'out' the selected area
    dfsub = df.xs(fieldid, axis=1, level=0)
    src = ColumnDataSource(dfsub)

    return src

# example timeseries
n_points = 100
df = pd.DataFrame({('area_a','var1'): np.sin(np.linspace(0,5,n_points)) + np.random.rand(100)*0.1,
                   ('area_b','var1'): np.sin(np.linspace(0,2,n_points)) + np.random.rand(100)*0.1,
                   ('area_c','var1'): np.sin(np.linspace(0,3,n_points)) + np.random.rand(100)*0.1,
                   ('area_d','var1'): np.sin(np.linspace(0,4,n_points)) + np.random.rand(100)*0.1},
                  index=pd.date_range(start='2017-01-01', freq='D', periods=100))

# example polygons
geojson = """{
"type":"FeatureCollection",
"crs":{"type":"name","properties":{"name":"urn:ogc:def:crs:OGC:1.3:CRS84"}},
"features":[
{"type":"Feature","properties":{"key":"area_a"},"geometry":{"type":"MultiPolygon","coordinates":[[[[-108.8,42.7],[-104.5,42.0],[-108.3,39.3],[-108.8,42.7]]]]}},
{"type":"Feature","properties":{"key":"area_b"},"geometry":{"type":"MultiPolygon","coordinates":[[[[-106.3,44.0],[-106.2,42.6],[-103.3,42.6],[-103.4,44.0],[-106.3,44.0]]]]}},
{"type":"Feature","properties":{"key":"area_d"},"geometry":{"type":"MultiPolygon","coordinates":[[[[-104.3,41.0],[-101.5,41.0],[-102.9,37.8],[-104.3,41.0]]]]}},
{"type":"Feature","properties":{"key":"area_c"},"geometry":{"type":"MultiPolygon","coordinates":[[[[-105.8,40.3],[-108.3,37.7],[-104.0,37.4],[-105.8,40.3]]]]}}
]
}"""

implementation = """
import {ColumnarDataSource} from "models/sources/columnar_data_source"
import {logger} from "core/logging"
import * as p from "core/properties"

export class CustomGeo extends ColumnarDataSource
  type: 'CustomGeo'

  @define {
    geojson: [ p.Any     ] # TODO (bev)
    data:    [ p.Any,   {} ]
  }

  initialize: (options) ->
    super(options)
    @_update_data()
    @connect(@properties.geojson.change, () => @_update_data())

  _update_data: () -> @data = @geojson_to_column_data()

  _get_new_list_array: (length) -> ([] for i in [0...length])

  _get_new_nan_array: (length) -> (NaN for i in [0...length])

  _flatten_function: (accumulator, currentItem) ->
    return accumulator.concat([[NaN, NaN, NaN]]).concat(currentItem)

  _add_properties: (item, data, i, item_count) ->
    for property of item.properties
      if !data.hasOwnProperty(property)
        data[property] = @_get_new_nan_array(item_count)
      data[property][i] = item.properties[property]

  _add_geometry: (geometry, data, i) ->

    switch geometry.type

      when "Point"
        coords = geometry.coordinates
        data.x[i] = coords[0]
        data.y[i] = coords[1]
        data.z[i] = coords[2] ? NaN

      when "LineString"
        coord_list = geometry.coordinates
        for coords, j in coord_list
          data.xs[i][j] = coords[0]
          data.ys[i][j] = coords[1]
          data.zs[i][j] = coords[2] ? NaN

      when "Polygon"
        if geometry.coordinates.length > 1
          logger.warn('Bokeh does not support Polygons with holes in, only exterior ring used.')
        exterior_ring = geometry.coordinates[0]
        for coords, j in exterior_ring
          data.xs[i][j] = coords[0]
          data.ys[i][j] = coords[1]
          data.zs[i][j] = coords[2] ? NaN

      when "MultiPoint"
        logger.warn('MultiPoint not supported in Bokeh')

      when "MultiLineString"
        flattened_coord_list = geometry.coordinates.reduce(@_flatten_function)
        for coords, j in flattened_coord_list
          data.xs[i][j] = coords[0]
          data.ys[i][j] = coords[1]
          data.zs[i][j] = coords[2] ? NaN

      when "MultiPolygon"
        exterior_rings = []
        for polygon in geometry.coordinates
          if polygon.length > 1
            logger.warn('Bokeh does not support Polygons with holes in, only exterior ring used.')
          exterior_rings.push(polygon[0])

        flattened_coord_list = exterior_rings.reduce(@_flatten_function)
        for coords, j in flattened_coord_list
          data.xs[i][j] = coords[0]
          data.ys[i][j] = coords[1]
          data.zs[i][j] = coords[2] ? NaN

      else
        throw new Error('Invalid type ' + geometry.type)

  _get_items_length: (items) ->
    count = 0
    for item, i in items
      geometry = if item.type == 'Feature' then item.geometry else item
      if geometry.type == 'GeometryCollection'
        for g, j in geometry.geometries
          count += 1
      else
        count += 1
    return count

  geojson_to_column_data: () ->
    geojson = JSON.parse(@geojson)

    if geojson.type not in ['GeometryCollection', 'FeatureCollection']
      throw new Error('Bokeh only supports type GeometryCollection and FeatureCollection at top level')

    if geojson.type == 'GeometryCollection'
      if not geojson.geometries?
        throw new Error('No geometries found in GeometryCollection')
      if geojson.geometries.length == 0
        throw new Error('geojson.geometries must have one or more items')
      items = geojson.geometries

    if geojson.type == 'FeatureCollection'
      if not geojson.features?
        throw new Error('No features found in FeaturesCollection')
      if geojson.features.length == 0
        throw new Error('geojson.features must have one or more items')
      items = geojson.features

    item_count = @_get_items_length(items)

    data = {
      'x': @_get_new_nan_array(item_count),
      'y': @_get_new_nan_array(item_count),
      'z': @_get_new_nan_array(item_count),
      'xs': @_get_new_list_array(item_count),
      'ys': @_get_new_list_array(item_count),
      'zs': @_get_new_list_array(item_count)
    }

    arr_index = 0
    for item, i in items
      geometry = if item.type == 'Feature' then item.geometry else item

      if geometry.type == 'GeometryCollection'
        for g, j in geometry.geometries
          @_add_geometry(g, data, arr_index)
          if item.type == 'Feature'
            @_add_properties(item, data, arr_index, item_count)
          arr_index += 1
      else
        # Now populate based on Geometry type
        @_add_geometry(geometry, data, arr_index)
        if item.type == 'Feature'
          @_add_properties(item, data, arr_index, item_count)

        arr_index += 1

    return data

"""

class CustomGeo(ColumnarDataSource):
  __implementation__ = implementation

  geojson = JSON(help="""
  GeoJSON that contains features for plotting. Currently GeoJSONDataSource can
  only process a FeatureCollection or GeometryCollection.
  """)

  data = Dict(Any,Any,default={},help="wooo")

geo_source = CustomGeo(geojson=geojson)

# populate a drop-down menu with the areas
area_ids = sorted(df.columns.get_level_values(0).unique().values.tolist())
area_ids = [str(x) for x in area_ids]
area_select = Select(value=area_ids[0], title='Select area', options=area_ids)
area_select.on_change('value', update_plot)

src = get_source(df, area_select.value)

p = make_plot(src)
pgeo = make_geo_plot(geo_source)

# add to document
curdoc().add_root(Row(Column(area_select, p), pgeo))
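
The double check mentioned above could look roughly like the sketch below. It is only an illustration under assumptions: `check_alignment` is a hypothetical helper, it takes a plain dict shaped like the 'data' the extension builds ('key', 'xs', 'ys' columns), and it only compares the first vertex of Polygon and MultiPolygon features. The sample values are trimmed from the question's polygons.

```python
import json


def check_alignment(data, geojson_str, prop="key"):
    """Hypothetical check: does row i of `data` match feature i of the GeoJSON?"""
    features = json.loads(geojson_str)["features"]
    if len(data[prop]) != len(features):
        return False
    for i, feature in enumerate(features):
        # The property column must agree with the feature at the same position.
        if data[prop][i] != feature["properties"][prop]:
            return False
        geometry = feature["geometry"]
        if geometry["type"] == "Polygon":
            first = geometry["coordinates"][0][0]
        elif geometry["type"] == "MultiPolygon":
            first = geometry["coordinates"][0][0][0]
        else:
            continue  # other geometry types are not checked in this sketch
        # Compare the first vertex of the exterior ring with the patch coordinates.
        if data["xs"][i][0] != first[0] or data["ys"][i][0] != first[1]:
            return False
    return True


geojson = """{
"type": "FeatureCollection",
"features": [
{"type": "Feature", "properties": {"key": "area_a"}, "geometry": {"type": "MultiPolygon", "coordinates": [[[[-108.8, 42.7], [-104.5, 42.0], [-108.3, 39.3], [-108.8, 42.7]]]]}},
{"type": "Feature", "properties": {"key": "area_d"}, "geometry": {"type": "MultiPolygon", "coordinates": [[[[-104.3, 41.0], [-101.5, 41.0], [-102.9, 37.8], [-104.3, 41.0]]]]}}
]
}"""

# Plain dict standing in for geo_source.data (an assumption for this sketch).
data = {
    "key": ["area_a", "area_d"],
    "xs": [[-108.8, -104.5, -108.3, -108.8], [-104.3, -101.5, -102.9, -104.3]],
    "ys": [[42.7, 42.0, 39.3, 42.7], [41.0, 41.0, 37.8, 41.0]],
}

aligned = check_alignment(data, geojson)
print(aligned)
```

Running such a check once after the source has been populated would be enough for static spatial data.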
Seb
  • Thanks, as long as Bokeh does the parsing it's fine by me. My concern with senderle's answer was using two separate modules to parse exactly the same thing, and then 'trusting' them to behave identically. – Rutger Kassies Nov 20 '17 at 07:54
  • I like the idea of creating a custom module; I agree this is the better answer! But it seems very clear to me that by using Bokeh, you are already trusting multiple json parsers to behave identically. For validation in Python, for example, Bokeh just delegates to [`json.loads`](https://github.com/bokeh/bokeh/blob/master/bokeh/core/properties.py#L411). This seems reasonable since json was specifically designed to be a [universal data interchange format](https://www.json.org/). The whole point of json is that different parsers will behave identically. – senderle Nov 20 '17 at 18:54
  • Indeed, after just a bit of looking, you can see that even the core interchange class just uses Python's vanilla [`json.dumps`](https://github.com/bokeh/bokeh/blob/b5f868ba82fff2d2e35155419b458f2f91a618b5/bokeh/core/json_encoder.py#L212) function. – senderle Nov 20 '17 at 19:00
  • Thanks for clearing that up, I may have been unnecessarily cautious on the parsing part. – Rutger Kassies Nov 21 '17 at 07:32