My team is building an application that has to evaluate many user-defined formulas. It is a replacement for a huge spreadsheet that our customers use. Each formula uses mostly simple arithmetic and a few math functions. We are using an expression evaluation library called Parsii to do the actual formula evaluation. However, the formulas have to be evaluated in the order of their dependencies. For example:

F1 = a + b
F2 = F1 * 10%
F3 = b / 2
F4 = F2 + F3

In the example above, a and b are values input by users. The system should compute F1 and F3 first, since they depend directly on user input. Then F2 should be computed, and finally F4.
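The ordering described above is a topological order of the dependency graph. As a minimal sketch (not part of the original setup), Kahn's algorithm produces it from a map of formula name to the formulas it reads. The class and method names here are hypothetical, and user inputs such as a and b are left out of the map because they are always available.

```java
import java.util.*;

class FormulaOrder {
    // deps maps each formula to the formulas it reads; user inputs are omitted.
    // Assumes every referenced formula is itself a key of deps.
    static List<String> evaluationOrder(Map<String, List<String>> deps) {
        Map<String, Integer> pending = new LinkedHashMap<>();   // unresolved dependency count
        Map<String, List<String>> dependents = new HashMap<>(); // reverse edges
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            pending.put(e.getKey(), e.getValue().size());
            for (String d : e.getValue()) {
                dependents.computeIfAbsent(d, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Formulas with no formula dependencies can be evaluated immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String f = ready.remove();
            order.add(f);
            for (String next : dependents.getOrDefault(f, Collections.emptyList())) {
                // A dependent becomes ready once its last dependency is ordered.
                if (pending.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (order.size() != deps.size()) {
            throw new IllegalStateException("cycle or unknown formula in dependencies");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("F1", Collections.emptyList());
        deps.put("F2", Arrays.asList("F1"));
        deps.put("F3", Collections.emptyList());
        deps.put("F4", Arrays.asList("F2", "F3"));
        System.out.println(evaluationOrder(deps)); // prints [F1, F3, F2, F4]
    }
}
```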

My question is: what data structure is recommended to model these formula evaluation dependencies?

We have currently modeled it as a directed graph. In the example above, F1 and F3 are the root nodes, F2 is connected to F1, and F4 is connected to both F2 and F3, making F4 the leaf node. We've used the Tinkerpop3 graph implementation to model this.

Any data structure used to model this should have the following characteristics:

- It should be easy to change the input data of a few top-level root nodes (based on user input).
- Only the formulas that depend on the changed root nodes should be recalculated, since we have hundreds of formulas in a specific calculation context and have to respond to the GUI layer within 1-2 seconds.
- The amount of code needed to create the data structure should be minimal, ideally by using existing libraries.
- It should be possible to look up the root nodes by various keys (name of the formula object, id of the object, year, etc.) and to edit the properties of those nodes.
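The second requirement, recalculating only what depends on a changed input, can be sketched with a reverse-edge map on top of the directed graph. This is only an illustration under stated assumptions: `IncrementalRecalc`, `define`, and `setInput` are hypothetical names, the lambdas stand in for expressions that a library such as Parsii would evaluate, and 10% is taken to mean a factor of 0.10.

```java
import java.util.*;
import java.util.function.ToDoubleFunction;

class IncrementalRecalc {
    // One node per formula: the names it reads and a stand-in for the parsed expression.
    static final class Formula {
        final List<String> reads;
        final ToDoubleFunction<Map<String, Double>> expr;
        Formula(List<String> reads, ToDoubleFunction<Map<String, Double>> expr) {
            this.reads = reads;
            this.expr = expr;
        }
    }

    final Map<String, Formula> formulas = new LinkedHashMap<>();
    final Map<String, List<String>> dependents = new HashMap<>(); // name -> formulas reading it
    final Map<String, Double> values = new HashMap<>();           // inputs and cached results

    void define(String name, List<String> reads, ToDoubleFunction<Map<String, Double>> expr) {
        formulas.put(name, new Formula(reads, expr));
        for (String r : reads) dependents.computeIfAbsent(r, k -> new ArrayList<>()).add(name);
    }

    // Stores an input, re-evaluates only the formulas downstream of it,
    // and returns those formulas in the order they were evaluated.
    List<String> setInput(String name, double value) {
        values.put(name, value);
        // Collect every formula reachable from the changed input via reverse edges.
        Set<String> dirty = new LinkedHashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(name);
        while (!stack.isEmpty()) {
            for (String d : dependents.getOrDefault(stack.pop(), Collections.emptyList())) {
                if (dirty.add(d)) stack.push(d);
            }
        }
        // A dirty formula is ready once none of the names it reads is still dirty.
        // Repeated passes are quadratic in the worst case, which is fine for a sketch.
        List<String> order = new ArrayList<>();
        while (!dirty.isEmpty()) {
            boolean progressed = false;
            for (Iterator<String> it = dirty.iterator(); it.hasNext(); ) {
                String f = it.next();
                Formula formula = formulas.get(f);
                if (formula.reads.stream().noneMatch(dirty::contains)) {
                    values.put(f, formula.expr.applyAsDouble(values));
                    order.add(f);
                    it.remove();
                    progressed = true;
                }
            }
            if (!progressed) throw new IllegalStateException("cycle in formula dependencies");
        }
        return order;
    }

    public static void main(String[] args) {
        IncrementalRecalc calc = new IncrementalRecalc();
        calc.define("F1", Arrays.asList("a", "b"), v -> v.get("a") + v.get("b"));
        calc.define("F2", Arrays.asList("F1"), v -> v.get("F1") * 0.10);
        calc.define("F3", Arrays.asList("b"), v -> v.get("b") / 2);
        calc.define("F4", Arrays.asList("F2", "F3"), v -> v.get("F2") + v.get("F3"));
        calc.values.put("a", 2.0);                    // seed a so the first pass can read it
        System.out.println(calc.setInput("b", 4.0));  // prints [F1, F3, F2, F4]
        System.out.println(calc.setInput("a", 10.0)); // prints [F1, F2, F4]
    }
}
```

Changing a leaves F3 untouched because F3 is not reachable from a, which is the behaviour the requirement asks for.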

nitkart
  • What is wrong with using a directed graph? That is what I would recommend – Makazau Jun 27 '19 at 18:35
  • To me, this sounds like: "please recommend a library that contains a *better* data structure". Which renders your request off topic. – GhostCat Jun 27 '19 at 19:02
  • @MathiasStrohkirch. There is nothing wrong with it. I was just curious to hear of other alternatives. – nitkart Jun 28 '19 at 08:11
  • @GhostCat, I am not asking for a library specifically, but for a data structure that can model the problem more elegantly. If there are mature libraries for that data structure, all the better. I had read the off-topic guidelines, and they said that I could ask about "a specific programming problem" or "a software algorithm". – nitkart Jun 28 '19 at 08:16
  • @nitkart this is how I do it: [Sign of a symbolic algebraic expression](https://stackoverflow.com/a/20919547/2521214) however there are faster methods I think based on reverse polish notation and trees ... – Spektre Jun 28 '19 at 09:13
  • I hear you, but then: the java standard library only comes with its collection library. Any advanced data structure (that you don't compose out of standard collection classes) ... does come out of some 3rd party library, doesn't it? – GhostCat Jun 28 '19 at 10:59

1 Answer

Do you store this in a flat file currently?

If you wish to have better queryability, and easier modification, then you could store it as a DAG on database tables.

Maybe something like this (I expect the real solution to be somewhat different):

+-----------------------------------------------------------+
|                         FORMULA                           |
+------------+--------------+----------------+--------------+
|   ID (PK)  | FORMULA_NAME | FORMULA_STRING | FORMULA_YEAR |
+============+==============+================+==============+
|     1      |      F1      |     a + b      |              |
+------------+--------------+----------------+--------------+
|     2      |      F2      |    F1 * 10%    |              |
+------------+--------------+----------------+--------------+
|     3      |      F3      |     b / 2      |              |
+------------+--------------+----------------+--------------+
|     4      |      F4      |    F2 + F3     |              |
+------------+--------------+----------------+--------------+


+--------------------------------------+
|         FORMULA_DEPENDENCIES         |
+-----------------+--------------------+
| FORMULA_ID (FK) | DEPENDS_ON_ID (FK) |
+=================+====================+
|        2        |         1          |
+-----------------+--------------------+
|        4        |         2          |
+-----------------+--------------------+
|        4        |         3          |
+-----------------+--------------------+

With this you also get the safety of knowing when a formula depends on a non-existent formula, because that would violate the DEPENDS_ON_ID foreign key. You can also query the dependency table to detect whether any of the formulas form a cycle of dependencies, e.g. where F1 depends on F2, which depends on F3, which depends on F1.
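If the rows of FORMULA_DEPENDENCIES are loaded into memory, the cycle check can also be done in application code. The following is a rough sketch, assuming each row arrives as a pair of ids; `DependencyCycleCheck` and `hasCycle` are hypothetical names, and the check is a plain depth-first search rather than anything the database provides.

```java
import java.util.*;

class DependencyCycleCheck {
    // Each row is {FORMULA_ID, DEPENDS_ON_ID}, mirroring the FORMULA_DEPENDENCIES table.
    static boolean hasCycle(int[][] rows) {
        Map<Integer, List<Integer>> dependsOn = new HashMap<>();
        for (int[] r : rows) {
            dependsOn.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1]);
        }
        Set<Integer> done = new HashSet<>();   // formulas whose dependencies are fully explored
        Set<Integer> onPath = new HashSet<>(); // formulas on the current search path
        for (Integer id : dependsOn.keySet()) {
            if (visit(id, dependsOn, done, onPath)) return true;
        }
        return false;
    }

    private static boolean visit(Integer id, Map<Integer, List<Integer>> dependsOn,
                                 Set<Integer> done, Set<Integer> onPath) {
        if (done.contains(id)) return false;
        if (!onPath.add(id)) return true; // reached a formula already on the current path
        for (Integer dep : dependsOn.getOrDefault(id, Collections.emptyList())) {
            if (visit(dep, dependsOn, done, onPath)) return true;
        }
        onPath.remove(id);
        done.add(id);
        return false;
    }

    public static void main(String[] args) {
        int[][] fromTable = {{2, 1}, {4, 2}, {4, 3}}; // the rows shown above
        int[][] cyclic = {{1, 2}, {2, 3}, {3, 1}};    // F1 -> F2 -> F3 -> F1
        System.out.println(hasCycle(fromTable)); // prints false
        System.out.println(hasCycle(cyclic));    // prints true
    }
}
```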

Additionally you can easily add whatever metadata you wish to the tables and index on whatever you might query on.

xtratic
  • Thanks for your suggestions @xtratic. We are indeed storing the formula definitions in the database, and we do have a formula dependency table exactly as you have modeled it, to preserve formula integrity. My question was about how to model this in memory at the time of formula evaluation, since the evaluation has to be fast. Our initial approach was to query the formula table for the formulas that have no entries in formula-dependencies (i.e. the root nodes) and start from there: a subsequent query discovers the 2nd-level formulas and evaluates them, another query handles the 3rd level, etc. – nitkart Jun 28 '19 at 08:07
  • That approach soon turned out to be non-performant due to the many JDBC lookups. So we switched to fetching all formulas from both tables (via Oracle's CONNECT BY hierarchical queries) and then constructing a directed graph in memory to run the actual formula evaluations. So my question was whether there are better data structures than a directed graph to model this cached formula definition. – nitkart Jun 28 '19 at 08:09