My team is building an application that has to evaluate many user-defined formulas. It is a replacement for a huge spreadsheet that our customers use. Each formula uses (mostly) simple arithmetic and a few math functions. We are using an expression evaluation library called Parsii to do the actual formula evaluation. However, the formulas have to be evaluated in dependency order: a formula can only be computed after the formulas it refers to. For example:
F1 = a + b
F2 = F1 * 10%
F3 = b / 2
F4 = F2 + F3
In the example above, a and b are values input by users. The system should compute F1 & F3 first, since they depend directly on user input. Then F2 should be computed, since it depends on F1. And finally F4, which depends on F2 and F3.
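To make the expected ordering concrete, here is a minimal sketch of one common way to derive it: a topological sort using Kahn's algorithm. It is written in Python only for brevity and is not tied to Parsii or any graph library. The `DEPENDS_ON` table and the `evaluation_order` function are hypothetical names introduced for illustration.

```python
from collections import deque

# Hypothetical dependency table for the example: formula -> names it reads.
DEPENDS_ON = {
    "F1": ["a", "b"],
    "F2": ["F1"],
    "F3": ["b"],
    "F4": ["F2", "F3"],
}


def evaluation_order(depends_on):
    """Order formulas so each one comes after the formulas it reads."""
    # Count only dependencies that are formulas; a and b are plain inputs.
    pending = {
        f: sum(1 for d in deps if d in depends_on)
        for f, deps in depends_on.items()
    }
    # Reverse edges: formula -> formulas that read it.
    dependents = {f: [] for f in depends_on}
    for f, deps in depends_on.items():
        for d in deps:
            if d in depends_on:
                dependents[d].append(f)

    # Start with formulas that read user input only.
    ready = deque(sorted(f for f, n in pending.items() if n == 0))
    order = []
    while ready:
        f = ready.popleft()
        order.append(f)
        for g in dependents[f]:
            pending[g] -= 1
            if pending[g] == 0:
                ready.append(g)

    # Anything left unprocessed sits on a cycle (a circular reference).
    if len(order) != len(depends_on):
        raise ValueError("circular reference between formulas")
    return order


print(evaluation_order(DEPENDS_ON))  # ['F1', 'F3', 'F2', 'F4']
```

The same pass also detects circular references, which a spreadsheet replacement would presumably have to report to the user.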
My question is: what data structure is recommended to model these formula evaluation dependencies?
We have currently modeled it as a DIRECTED GRAPH. In the example above, F1 & F3 are the root nodes, F2 is connected to F1, and F4 is connected to both F2 and F3, with F4 being the leaf node. We've used the Tinkerpop3 graph implementation to model this.
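For reference, the directed graph for the example can be derived from the formula text itself. The sketch below is an assumption-laden illustration in plain Python, not Tinkerpop3 or Parsii code: it uses a simple regular expression as a stand-in for whatever variable list the expression library reports, and `FORMULAS`, `INPUTS`, and `build_graph` are hypothetical names.

```python
import re

# Hypothetical formula table taken from the example above.
FORMULAS = {
    "F1": "a + b",
    "F2": "F1 * 10%",
    "F3": "b / 2",
    "F4": "F2 + F3",
}
INPUTS = {"a", "b"}

# Assumption: names look like identifiers; a real parser would be more precise.
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def build_graph(formulas, inputs):
    """Return (depends_on, dependents) adjacency maps for the formulas."""
    known = set(formulas) | set(inputs)
    depends_on = {}
    dependents = {name: set() for name in known}
    for name, text in formulas.items():
        # Keep only tokens that name an input or another formula,
        # so function names such as sqrt are ignored.
        refs = {tok for tok in IDENTIFIER.findall(text) if tok in known}
        depends_on[name] = refs
        for ref in refs:
            dependents[ref].add(name)
    return depends_on, dependents


depends_on, dependents = build_graph(FORMULAS, INPUTS)

# Root nodes read user input only; leaf nodes are read by no other formula.
roots = sorted(f for f, refs in depends_on.items() if refs <= INPUTS)
leaves = sorted(f for f in FORMULAS if not dependents[f])
print(roots, leaves)  # ['F1', 'F3'] ['F4']
```

Keeping both directions of each edge (`depends_on` and `dependents`) makes it cheap to walk downstream from a changed input as well as upstream from a formula.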
Any data structure used to model this should have the following characteristics:
- Easy to change the input data of a few top-level root nodes (based on user input).
- Re-calculate only those formulas that depend on the root nodes that changed (we have 100s of formulas in a specific calculation context and have to respond to the GUI layer within 1-2 secs).
- Minimize the amount of code needed to create the data structure, by using existing libraries.
- Be able to look up the root nodes by various keys (name of the formula object, id of the object, year, etc.) and be able to edit those properties.