The non-pedantic answer to this question is yes, they do, since the AST is an intermediate representation used by the compiler; once the AST is generated, that is what's used to generate the bytecode. A simple way to check that the two ASTs are the same is to use the ast.dump
function and then compare the results as strings.
The pedantic answer is that it depends what you mean by "identical" - specifically, what properties of the two ASTs you wish to compare to determine if they are identical.
For example, x = 1; raise ValueError()
and x = 1\nraise ValueError()
compile to "identical" ASTs:
>>> import ast
>>> print(ast.dump(ast.parse('x = 1; raise ValueError()')))
Module(body=[
Assign(targets=[Name(id='x', ctx=Store())], value=Num(n=1)),
Raise(exc=Call(func=Name(id='ValueError', ctx=Load()), args=[], keywords=[]), cause=None)
])
>>> print(ast.dump(ast.parse('x = 1\nraise ValueError()')))
Module(body=[
Assign(targets=[Name(id='x', ctx=Store())], value=Num(n=1)),
Raise(exc=Call(func=Name(id='ValueError', ctx=Load()), args=[], keywords=[]), cause=None)
])
However, the AST also includes metadata about line numbers and positions, so these two ASTs are not exactly identical:
>>> ast.parse('x = 1; raise ValueError()').body[1].lineno
1
>>> ast.parse('x = 1\nraise ValueError()').body[1].lineno
2
Furthermore, these line numbers are available at runtime in error messages; the first one says line 1
and the second says line 2
:
>>> exec('x = 1; raise ValueError()')
Traceback (most recent call last):
File "<pyshell#13>", line 1, in <module>
exec('x = 1; raise ValueError()')
File "<string>", line 1, in <module>
ValueError
>>> exec('x = 1\nraise ValueError()')
Traceback (most recent call last):
File "<pyshell#14>", line 1, in <module>
exec('x = 1\nraise ValueError()')
File "<string>", line 2, in <module>
ValueError
It is also technically possible for code to inspect the line number from the error message and then decide its behaviour based on that. Any such code is an abomination and should be taken out back and shot, but as a certified pedant it's my duty to note that such code can exist.
So technically, your code formatter won't result in truly "identical" ASTs because their line/position metadata may be different - your code formatter has to change that metadata in order to do anything useful. But this is a reasonable caveat for an automated code-formatting tool such as yours to have, because people who write code which breaks when you reformat it should know that their code is too fragile to be reformatted by an automatic tool.
For completeness, if you want to ensure that the compiled bytecode is the same, you can use the dis.get_instructions
function: this is a stricter check than ast.dump
because the bytecode includes the line numbers (but not the positions within lines), but if your formatter shouldn't move code between different lines then you might prefer this way.
>>> import dis
>>> instructions1 = list(dis.get_instructions('x = 1; y = 2'))
>>> instructions2 = list(dis.get_instructions('x = 1\ny = 2'))
>>> instructions1 == instructions2
False