Our DMS Software Reengineering Toolkit with its Python front end can do this.
DMS provides infrastructure for parsing code, parameterized by a language definition (e.g, a Python grammar, etc.) and automatically building ASTs, as well as the ability to inspect/navigate/change those ASTs, and prettyprint the resulting modified trees.
Its AST parsing machinery can handle a variety of special cases:
- Parsing files or strings ("streams") as a (Python) full program.
Syntax errors in a stream are reported, and if repairable by single token insertion or deletion, so repaired.
- Parsing a stream according to an arbitrary language nonterminal.
- Parsing a pattern, corresponding to a named grammar nonterminal with named placeholders for the missing subtrees. A pattern match result can be used to match against concrete ASTs to decide match or not, and if match, to provide bindings for the pattern variables.
- Parsing a valid arbitrary substring. This returns a tree with possible missing left or right children, which define the left and right ends of the substring.
For instance, OP could write the following pattern to handle his example:
pattern if_x_is_5(s: statement):statement
= " if x==5: \s ";
DMS will read that pattern and build the corresponding pattern tree.
The paper that OP references really wants operators and keywords to remain as explicit artifacts in the AST. One way to interpret that is that they really want a concrete syntax tree. DMS actually produces "AST"s which are concrete syntax trees with the constant terminals removed; this has the effect of being very close to what a perfect AST should be, but one can easily determine for any leaf node where constant terminals should be inserted (or one can configure DMS to simply produce the uncompressed CSTs).
Personally, I don't see how the goal of the paper of OP's interest can really succeed in providing useful psuedo-code (in spite of its claims). Understanding an algorithm requires understanding of the corresponding data structures and the abstract and concrete algorithms being applied to those data structures. The paper focuses only on raw language syntax; there is no hint of understanding the more abstract ideas.