I have a program in which updating a value needs to trigger the re-computation of other values. Those re-computation can in turn trigger new ones, etc. I'll detail the implementation below, though the crux is really about an algorithm/approach to properly merge those tree branches.
Conceptually, this can be modeled as a tree. In the following graph, let's consider that A, B, C are big data structure (python classes actually) each containing many values to be computed. Changing a value within one of these structure may trigger the need to recompute values in others.The dependency can be captured in this tree:
So, structure A has 3 variables (A1, A2, A3).
- If we change A1, we need to re-compute B1. Similarly, structure B has B1-B2-B3. Changing B1 means we need to recompute B2, B3, and changing B2 or B3 means we need to recompute A3. Likewise for C.
- We can assume there are no circular references (e.g. A3 wouldn't imply the need to compute B1 or something like that).
- The tree is ordered from top to bottom - e.g. computation are ordered and the ones at the top will happen first.
- The (current) algorithm goes depth-first.
As a few example, here are some computation trees. I've highlighted in the same color repeated computations that I want to avoid:
If I change B1, the 1st tree above tells me that B2, B3 will be recomputed. The same tree tells me that B2 will trigger recalculating A3. B3 will trigger the same computation. So we compute A3 twice.
Here we see that we will do C2 --> B1 --> B2 --> A3. Then, we will do B3 --> A3. Then A3. After that, we will do A1 --> B1 --> B2 --> A3. Then, B3 --> A3.
As an example of what should ideally happen, let's say I were to change D1. The depth-first computaiton would look like this:
While the optimized version would be like:
This all takes places within python classes (that contain Panda dataframes, whose columns need to be computed).
Each python class has an (ordered) dictionary that describe which column (A1, A2, etc.) should trigger computation of what other columns. For example:
class B:
dependencies = {"B1":("B2", "B3"), "B2":("A3"), "B3":("A3")}
The current algo is depth-first, because we essentially do:
class B:
....
def compute_dependencies(self, value):
self.compute_values(value) # we update the current value, say B2
for key, deps in self.dependencies.items():
for dep in deps:
obj = get_obj_for_variable(dep) # get obj, in this case of class A
obj.compute_dependencies(dep) # call A to update A3, then its dependencies if any
Which means we're going to solve this depth-first. The above is somewhat pseudo-code.
My question is:
What kind of algorithm could I use in order to:
- fetch all the dependencies from the classes involved (which will be ordered python dictionaries)
- Merge those dictionaries in a way that allows me to remove the duplicated computation (as shown above)
- But still respect the the order of the computation (e.g. both B2, B3 need to be computed before we recompute A3 etc.)
Some python code to merge the dictionaries is fine, or just pseudo code for an algo too.