
I would like to define some scientific calculations in terms of observable inputs (labeled by numbers in the sketch below) and editable pipelines/operators (lower-case letters in the sketch below). The calculation may have multiple intermediate and final results (upper-case letters in the sketch below).

If some of the inputs or operators change, it should not be necessary to re-run the full calculation. Instead, only the affected parts should be updated automatically (similar to cell dependency trees in Excel). That would allow fast parameter variations and sensitivity analyses.
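To make the Excel-like behaviour concrete, here is a minimal sketch in plain Python (no ReactiveX involved). The names `Node`, `State`, `set`, `get` and `invalidate` are made up for illustration. Changing an input only marks the downstream nodes as stale, and a stale node is recomputed lazily the next time its value is requested.

```python
from enum import Enum


class State(Enum):
    FRESH = "fresh"
    STALE = "stale"


class Node:
    """A cell in the dependency graph: an input or a computed result."""

    def __init__(self, name, func=None, parents=()):
        self.name = name
        self.func = func              # None for plain inputs
        self.parents = list(parents)
        self.children = []
        self.value = None
        self.state = State.STALE
        self.runs = 0                 # how often func has been executed
        for parent in self.parents:
            parent.children.append(self)

    def set(self, value):
        """Change an input and mark everything downstream as stale."""
        self.value = value
        self.state = State.FRESH
        for child in self.children:
            child.invalidate()

    def invalidate(self):
        if self.state is State.STALE:
            return                    # this subtree is already marked
        self.state = State.STALE
        for child in self.children:
            child.invalidate()

    def get(self):
        """Return the value, recomputing only if this node is stale."""
        if self.state is State.STALE and self.func is not None:
            self.value = self.func(*(p.get() for p in self.parents))
            self.runs += 1
            self.state = State.FRESH
        return self.value


# Inputs "1" and "2", operators as lambdas, results A, B and C.
in1, in2 = Node("1"), Node("2")
in1.set(2)
in2.set(3)
A = Node("A", lambda v: v * 10, [in1])
B = Node("B", lambda v: v + 1, [in2])
C = Node("C", lambda a, b: a + b, [A, B])

first = C.get()      # computes A, B and C once
in2.set(5)           # only B and C become stale, A stays fresh
stale_after_change = [n.name for n in (A, B, C) if n.state is State.STALE]
second = C.get()     # recomputes B and C, reuses the cached A
```

After the second `get()`, `A.runs` is still 1 while `B.runs` and `C.runs` are 2, which is the kind of partial update I am after. Editing an operator could be handled the same way, by replacing `func` and invalidating the node.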

The objects that flow through the pipelines might not only be plain numbers but also more complex data structures, like tables or some nested JSON documents. Processing such an object might take a few seconds.

In order to know whether a specific calculation result is fresh or stale, I would like to monitor the state and display it as colored nodes and links:

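[Image: sketch of the calculation graph with inputs (numbers), operators (lower-case letters) and results (upper-case letters), colored by state]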
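Just to illustrate the kind of display I have in mind, here is a small sketch that turns node states into Graphviz DOT text, which could then be rendered by any DOT viewer. The function `to_dot`, the state names and the color mapping are assumptions for this example and not part of any existing monitoring tool.

```python
# Hypothetical mapping from node state to a Graphviz (X11) color name.
COLORS = {"fresh": "palegreen", "stale": "lightgray", "running": "gold"}


def to_dot(states, edges):
    """Build DOT source with one filled node per state entry."""
    lines = ["digraph calculation {"]
    for name, state in states.items():
        color = COLORS.get(state, "white")
        lines.append(f'  "{name}" [style=filled, fillcolor="{color}"];')
    for source, target in edges:
        # Links inherit the color of the node they start from.
        color = COLORS.get(states.get(source), "black")
        lines.append(f'  "{source}" -> "{target}" [color="{color}"];')
    lines.append("}")
    return "\n".join(lines)


states = {"1": "fresh", "a": "running", "A": "stale"}
edges = [("1", "a"), ("a", "A")]
dot = to_dot(states, edges)
print(dot)
```

Regenerating this text whenever a state changes would already give a crude monitor. A real GUI would of course update the nodes in place.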

=> Does ReactiveX provide a clean monitoring concept that would help me to implement such a program?

=> Is there perhaps an existing graphical user interface for monitoring reactive streams?

If ReactiveX does not provide the right tools for me, I am also open to other suggestions.

Or would I need to "reinvent the wheel" and implement it on my own?

Some possible strategies

a) Add side effects to the observables to inform the attached monitoring system.

http://reactivex.io/documentation/operators/do.html

http://introtorx.com/Content/v1.0.10621.0/09_SideEffects.html

b) Two-wired pipes, slow wire for data and fast wire for state/heartbeat

c) Handling state messages through the same pipes (could only work well if the calculations are very fast, I guess).
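To illustrate strategies a) and b) together, here is a rough sketch using only the Python standard library. A decorator reports state changes as a side effect (the same idea as the Rx `do` operator, which as far as I know is exposed as `do_action` in RxPY), and the state messages travel on their own queue, separate from the data. The names `monitored`, `state_wire` and `data_wire` are made up for this example.

```python
import functools
import queue
import threading
import time

state_wire = queue.Queue()   # fast wire: small state messages
data_wire = queue.Queue()    # slow wire: the actual results


def monitored(name):
    """Strategy a): report state changes as a side effect of the operator."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            state_wire.put((name, "running"))
            try:
                result = func(*args, **kwargs)
            except Exception:
                state_wire.put((name, "failed"))
                raise
            state_wire.put((name, "fresh"))
            return result
        return wrapper
    return decorator


@monitored("a")
def slow_double(table):
    time.sleep(0.1)          # stand-in for a calculation that takes seconds
    return [row * 2 for row in table]


def worker():
    # Strategy b): the result goes onto the data wire only.
    data_wire.put(("A", slow_double([1, 2, 3])))


thread = threading.Thread(target=worker)
thread.start()

first_state = state_wire.get(timeout=5)   # sent before the slow work starts
thread.join()
last_state = state_wire.get(timeout=5)
result = data_wire.get(timeout=5)
print(first_state, last_state, result)
```

With strategy c) the state messages would have to go through `data_wire` as well, so a "running" message for a downstream node could get stuck behind a slow result. That is why I guess it only works well for fast calculations.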

Related:

How to make tables/spreadsheets (e.g. pandas DataFrame) observable, use triggers or change events?

Recommended way to implement observable collections in Python?

Edit

Just found observablehq, which already has some features to visualize cell dependencies and cell state.

a) While cell b is calculated, dependent cells z and a are marked with a grey border line. Independent cell v has no grey border line.


b) Dependencies are (partially) visualized with a so-called "minimap":

https://observablehq.com/@observablehq/introducing-visual-dataflow

There are also visualization notebooks that show all dependencies:

https://observablehq.com/@observablehq/notebook-visualizer

https://observablehq.com/@observablehq/notebook-visualizer-elk-edition


Stefan

1 Answer


A. Just found observablehq, which already has some features to visualize cell dependencies and cell state.

a) While cell b is calculated, dependent cells z and a are marked with a grey border line. Independent cell v has no grey border line.


b) Dependencies are (partially) visualized with a so-called "minimap":

https://observablehq.com/@observablehq/introducing-visual-dataflow

There are also visualization notebooks that show all dependencies:

https://observablehq.com/@observablehq/notebook-visualizer

https://observablehq.com/@observablehq/notebook-visualizer-elk-edition


B. There is also some discussion about reactive Jupyter Notebooks. However, the existing projects are still in a very experimental stage:

https://github.com/jupyter/notebook/issues/1175

C. Apache Airflow seems to provide some monitoring while executing dependent tasks:

https://airflow.apache.org/docs/apache-airflow/stable/index.html

However:

Airflow is not a data streaming solution. Tasks do not move data from one to the other (though tasks can exchange metadata!).

Stefan