This topic is difficult to Google, because of "node" (not node.js), and "graph" (no, I'm not trying to make charts).

Despite being a pretty well-rounded and experienced developer, I can't piece together a mental model of how these sorts of editors get data in a sensible way, in a sensible order, from node to node. The Alteryx example is especially puzzling, because a Sort module, for example, needs its entire upstream dataset before proceeding. And some nodes can send a single output to multiple downstream consumers.

I was able to understand trees and whatnot in my old data structures course back in the day, and I successfully understood and adapted the basic graph concepts from https://www.python.org/doc/essays/graphs/ in a real project. But that was a static structure, and data weren't being passed from node to node.

Where should I be starting, and/or what concept am I missing that I could use to implement something like this? Something to let users chain together some boxes to slice and dice text files or data records with some basic operations like sort and join? I'm using C#, but the answer ought to be language independent.

amonroejj

1 Answer

This paradigm is called Dataflow Programming. It works with streams of data that are passed from instruction to instruction to be processed.

Dataflow programs can be written in textual or visual form, and besides the software you have mentioned, there are a lot of programs that include some sort of dataflow language.

To create your own dataflow language you have to:

  1. Create program modules or objects that represent your processing nodes, each implementing a different kind of data processing. A processing node usually has one or more data inputs and one or more data outputs, and implements some data processing algorithm internally. Nodes may also have control inputs that determine how the node processes data. A typical dataflow algorithm calculates each output sample from the values of one or more input streams, as FIR filters do. However, a processing algorithm can also feed data back (mixing output values with input values in some way), as in IIR filters, or accumulate values in some way to calculate the output value.
  2. Create a standard API for passing data between processing nodes. It can differ for different kinds of data and control signals, but it must be standard, because processing nodes have to 'understand' each other. Data is usually passed as plain values. Control signals can be plain values, events, or a more advanced control language, depending on your needs.
  3. Create a mechanism to link your nodes and pass data between them. You can build your own machinery or use standard tools such as pipes, message queues, etc. For example, this functionality can be implemented as a tree-like structure whose nodes are your processing nodes, each holding references to the next nodes and to the specific input on each that receives data from the current node's output.
  4. Create some kind of node iterator that starts from the beginning of the dataflow graph and iterates over each processing node, where it:
    • provides next data input values
    • invokes node data processing methods
    • updates data output value
    • passes the updated data output values to the inputs of downstream processing nodes
  5. Create a tool for configuring node parameters and the links between nodes. It can be just a simple text file edited with a text editor, or a sophisticated visual editor with a GUI for drawing the dataflow graph.
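
To make points 1 through 4 a bit more concrete, here is a minimal sketch in Python (chosen only for brevity; the same shape carries over to C#). Everything in it is a hypothetical illustration, not how Alteryx or any other product works: the `Node` class, its `connect` method, and the `run` function are made-up names, records are plain lists, and the "iterator" from point 4 is assumed to be a topological ordering (Kahn's algorithm), so a node runs only after all of its inputs have been filled.

```python
from collections import deque


class Node:
    """Hypothetical processing node: a function plus links to downstream inputs."""

    def __init__(self, name, func, num_inputs=0):
        self.name = name
        self.func = func                    # takes num_inputs values, returns one output
        self.inputs = [None] * num_inputs   # filled in as upstream nodes finish
        self.links = []                     # (downstream node, input index) pairs
        self.output = None

    def connect(self, downstream, input_index=0):
        """Link this node's output to one input of a downstream node."""
        self.links.append((downstream, input_index))


def run(nodes):
    """Run every node once, in topological order (Kahn's algorithm)."""
    pending = {node: len(node.inputs) for node in nodes}   # inputs still missing
    ready = deque(node for node in nodes if pending[node] == 0)
    done = 0
    while ready:
        node = ready.popleft()
        node.output = node.func(*node.inputs)
        done += 1
        # Fan-out: the same output object goes to every linked input,
        # so node functions should not modify their inputs in place.
        for downstream, index in node.links:
            downstream.inputs[index] = node.output
            pending[downstream] -= 1
            if pending[downstream] == 0:
                ready.append(downstream)
    if done != len(nodes):
        raise ValueError("graph has a cycle or an unconnected input")


# Usage: one source feeding two consumers, whose outputs meet in a third node.
source = Node("source", lambda: [("bob", 3), ("alice", 1), ("carol", 2)])
by_name = Node("sort by name", lambda rows: sorted(rows), num_inputs=1)
count = Node("count", lambda rows: len(rows), num_inputs=1)
report = Node("report", lambda rows, n: {"rows": rows, "count": n}, num_inputs=2)

source.connect(by_name)
source.connect(count)
by_name.connect(report, 0)
count.connect(report, 1)

run([report, count, by_name, source])   # the order of this list does not matter
print(report.output)
# {'rows': [('alice', 1), ('bob', 3), ('carol', 2)], 'count': 3}
```

In this sketch each node receives its complete input at once, which sidesteps the Sort problem entirely but holds every intermediate dataset in memory. A streaming design would pass records one at a time instead.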

Regarding your note about the Sort module in Alteryx - perhaps the data values are just accumulated inside this module and then sorted.
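
That guess can be sketched with Python generators. This is only an illustration of the accumulate-then-sort idea under assumed names (`read_records`, `keep_nonempty`, `sort_records`, and the `events` log are all hypothetical), not a description of Alteryx internals. The filter passes each record on as soon as it arrives, while the sort has to drain its whole upstream before it can emit anything.

```python
events = []   # records the order in which things happen


def read_records(lines):
    """Streaming source: yields one record at a time."""
    for line in lines:
        record = line.strip()
        events.append(f"read {record!r}")
        yield record


def keep_nonempty(records):
    """Streaming node: forwards each record as soon as it arrives."""
    for record in records:
        if record:
            yield record


def sort_records(records):
    """Blocking node: sorted() consumes the entire upstream before the first yield."""
    for record in sorted(records):
        events.append(f"emit {record!r}")
        yield record


lines = ["pear\n", "\n", "apple\n", "fig\n"]
result = list(sort_records(keep_nonempty(read_records(lines))))
print(result)   # ['apple', 'fig', 'pear']
print(events)   # every "read" entry comes before the first "emit" entry
```

The same split between streaming and blocking nodes applies whatever transport you pick for point 3 (pipes, queues, or direct calls): a blocking node simply buffers until its upstream signals that it is finished.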

Here you can find an even more detailed description of dataflow programming languages.

SergeyLebedev
  • While researching this topic, I did strike upon the Wikipedia article from your first paragraph, as well as things like Microsoft's TPL Dataflow Library. Knowing the right term helps with further research, but I was hoping for a bit more concrete info regarding points 2 and 3. There's always source diving, too, I guess. – amonroejj Apr 02 '18 at 21:30
  • @amonroejj I improved the answer; I hope it is clearer now. Regarding your note about "source diving" - since the subject of your question is rather abstract, I think it is better to explain the essence without tying it to a concrete programming language. – SergeyLebedev Apr 03 '18 at 12:41