1

I am looking for some algorithms to find similar formulas in a quantitative way.

For example, given three formulas below:

1. test = a + 4 - b
2. test = a - 16 + 2 * b
3. test = a + 5

I can somehow calculate the similarity between them, say:

Similarity(1,2) = 0.5
Similairty(2,3) = 0.1

Is there any standard way to do so? Basically I guess I need to extract some numeric vectors from each formula, representing their features, but I don't how to do that..

Could anyone can give me some help? Thx.

lllllllllllll
  • 8,519
  • 9
  • 45
  • 80

3 Answers3

2

The approach I would take is to generate a parse tree for the expressions and then apply a tree difference metric. There are a lot to choose from (search the web for "tree distance metric", "parse tree distance", or "parse tree similarity") and even more if you restrict yourself to binary trees (no ternary operator such as ?:). The usual approach is to use the tree edit distance. A couple of issues you need to resolve:

  1. Do variable name changes affect similarity?
  2. Does reordering of operands for commutative operators affect similarity? (E.g., a + b*c vs. b*c + a.)

P.S. A nice survey-type article on measuring similarity between tree structures can be found here.

Community
  • 1
  • 1
Ted Hopp
  • 232,168
  • 48
  • 399
  • 521
0

I assume you are looking for a metric to compare/check similarity of these formulae. If each of them is going to contain just the three variables test, a and b, then a very simple metric is to take ratios of coefficients of a, b and the value of the constant.

You can then use a formula like this to judge similarity: similarity = (ratio of constants) * X^2 + 2 * (ratio of coefficients of a) * X + (ratio of coefficients of b). The closer the roots of this quadratic equation in X are to -1, the higher the similarity.

Madhav Datt
  • 1,065
  • 2
  • 10
  • 25
0

We can represent each polynom as a sum of monomials. Then assuming we can compare monomials, we compare each pair of one monomial from the first polynom and one from the second and then we use a matching algorithm to find the matching that gives us the smallest sum of differences in the pairs. If one polynom has more monomials than the other we just add whatever amount is needed of 0 monomials (A+0+0+0+0....). Then all that's left is finding the difference between 2 monomials, which I recommend doing the way Ted Hopp suggested. This way you should get some much more accurate results than just comparing the original polynomials directly.

indjev99
  • 134
  • 1
  • 13