21

Not sure of the terminology here, but this would be difference between eq? and equal? in scheme, or the difference between == and strncmp with C strings; where in each case the first would return false for two different strings that actually have the same content and the second would return true.

I'm looking for the latter operation, for Python's ASTs.

Right now, I'm doing this:

import ast
def AST_eq(a, b):
    return ast.dump(a) == ast.dump(b)

which apparently works but feels like a disaster waiting to happen. Anyone know of a better way?

Edit: unfortunately, when I go to compare the two ASTs' __dict__'s, that comparison defaults to using the individual elements' __eq__ methods. ASTs are implemented as trees of other ASTs, and their __eq__ apparently checks for reference identity. So neither straight == nor the solution in Thomas's link work. (Besides which, I also don't want to subclass every AST node type to insert this custom __eq__.)

Wang
  • 3,247
  • 1
  • 21
  • 33
  • 1
    The term you're looking for is "value equality" (indeed as opposed to "reference equality"). – Thomas Jul 22 '10 at 20:13
  • This may be helpful: http://stackoverflow.com/questions/390250/elegant-ways-to-support-equivalence-equality-in-python-classes – Thomas Jul 22 '10 at 20:14

4 Answers4

7

I ran into the same problem. I tried to go this way: first dumb down AST to some easier representation (a tree of dicts):

def simplify(node):
    if isinstance(node, ast.AST):
        res = vars(node).copy()
        for k in 'lineno', 'col_offset', 'ctx':
            res.pop(k, None)
        for k, v in res.iteritems():
            res[k] = simplify(v)
        res['__type__'] = type(node).__name__
        return res
    elif isinstance(node, list):
        return map(simplify, node)
    else:
        return node

and then you can just compare these representations:

data = open("/usr/lib/python2.7/ast.py").read()
a1 = ast.parse(data)
a2 = ast.parse(data)
print simplify(a1) == simplify(a2)

will give you True

EDIT

Just understood that there's no need to create a dict, so you can do just:

def compare_ast(node1, node2):
    if type(node1) is not type(node2):
        return False
    if isinstance(node1, ast.AST):
        for k, v in vars(node1).iteritems():
            if k in ('lineno', 'col_offset', 'ctx'):
                continue
            if not compare_ast(v, getattr(node2, k)):
                return False
        return True
    elif isinstance(node1, list):
        return all(itertools.starmap(compare_ast, itertools.izip(node1, node2)))
    else:
        return node1 == node2
Yorik.sar
  • 5,269
  • 2
  • 20
  • 20
  • 2
    I'd probably drop itertools and starmap and just check `all(compare_ast(n1, n2) for n1, n2 in zip(node1, node2))`. Also, the lengths need to be checked, as zip and izip just happily finish operation without further notice, when one of the iterators is shorter than the other. – Michael Mar 21 '18 at 17:51
  • 2
    with python3.8, you should include 'end_lineno', 'end_col_offset' to ignored attributes – ikamen Feb 11 '20 at 18:25
4

I modified @Yorik.sar's answer for Python 3.9+:

from itertools import zip_longest
from typing import Union


def compare_ast(node1: Union[ast.expr, list[ast.expr]], node2: Union[ast.expr, list[ast.expr]]) -> bool:
    if type(node1) is not type(node2):
        return False

    if isinstance(node1, ast.AST):
        for k, v in vars(node1).items():
            if k in {"lineno", "end_lineno", "col_offset", "end_col_offset", "ctx"}:
                continue
            if not compare_ast(v, getattr(node2, k)):
                return False
        return True

    elif isinstance(node1, list) and isinstance(node2, list):
        return all(compare_ast(n1, n2) for n1, n2 in zip_longest(node1, node2))
    else:
        return node1 == node2
Seanny123
  • 8,776
  • 13
  • 68
  • 124
  • This seems to be missing a base condition which is causing a RecursionError – sbdchd May 13 '21 at 13:20
  • mypy complains when I try to compare `ast.Module`s, but the comparison works fine. I think, the type of parameters should be `Union[ast.AST, List[ast.AST]]` – Expurple Aug 15 '22 at 08:46
  • @Expurple removed the list comprehension. Not sure about the relation between `ast.expr` and `ast.AST` but if that makes MyPy complain less then I'm happy to change it – Seanny123 Aug 15 '22 at 13:35
  • @Seanny123 `expr` is a subclass of `AST` – Expurple Aug 15 '22 at 14:06
3

The following works with Python 2 or 3 and is faster than using itertools:

EDIT: WARNING:

Apparently this code can hang in some (weird) situations. As a result, I can not recommend it.

def compare_ast(node1, node2):

    if type(node1) != type(node2):
        return False
    elif isinstance(node1, ast.AST):
        for kind, var in vars(node1).items():
            if kind not in ('lineno', 'col_offset', 'ctx'):
                var2 = vars(node2).get(kind)
                if not compare_ast(var, var2):
                    return False
        return True
    elif isinstance(node1, list):
        if len(node1) != len(node2):
            return False
        for i in range(len(node1)):
            if not compare_ast(node1[i], node2[i]):
                return False
        return True
    else:
        return node1 == node2
Edward K. Ream
  • 179
  • 1
  • 8
  • 6
    "Apparently this code can hang in some (weird) situations." - It would be better to give more information. – pfalcon Feb 02 '19 at 19:39
  • As mentioned above since Python 3.8 you need to include a few other attributes https://stackoverflow.com/questions/3312989/elegant-way-to-test-python-asts-for-equality-not-reference-or-object-identity#comment106433598_19598419 – user202729 Dec 22 '22 at 16:47
-1

In Python, object identitiy is compared using the is operator (which, unlike ==, cannot be overloaded). Unless implemented by a moron, == will not compare identity, but rather equality (if possible and implemented, of course). And in case of the built-in string class, this is certainly not the case.

There may be another problem with your implementation, though - as dump produces very precise information (suitable for debugging), two asts with e.g. a variable named differently may be considered !=. This may or may not be what you want.

  • 2
    That sort of precision is actually what I want, since I am working on a domain-specific language whose interpreter rewrites it to standard python. – Wang Jul 22 '10 at 20:27