I do not think this is a good idea, but here is a literal implementation of your logic:
awk -F= '
/^[^#]/ && NF > 1 && saw_def { print "not global:", $0; next }
NF > 1 { print "possibly global:", $0 }
{ saw_def = /def/ }
' file.py
This tells awk to use `=` as a field separator on three pattern {action} pairs:
- Line starts with non-`#`, has a 2nd field (thus has a `=`), and the `saw_def` variable is set:
  Print the line as not a global assignment; move on to next line (stop pattern matching)
- Line otherwise has a `=`:
  Print the line as possibly a global assignment
- (Any line not matched by #1):
  Set `saw_def` to a boolean representing whether "def" exists on the line
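To make the `NF > 1` test more concrete, here is a small sketch that only prints the field count awk sees when `=` is the separator. The helper name `count_fields` and the sample lines are made up for illustration.

```shell
# Hypothetical helper (the name is illustrative): show how many
# "="-separated fields awk sees on each input line.
count_fields() {
  awk -F= '{ print NF ": " $0 }'
}

# A line with one "=" has 2 fields, a line without "=" has 1,
# and a comparison such as "x == y" has 3.
printf 'a = 4\nprint(a)\nx == y\n' | count_fields
```

Any line containing at least one `=` has more than one field, which is also why a comparison like `x == y` would be treated as an assignment by this test.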
Here is an example that will fail your test:
def function_name():
    int a = 4
    int b = 8
The logic you requested will note that `a` is not global and that `b` is possibly global.
This should be better:
awk '
NF == 0 { next }
{
  indent = match($0, /[^ \t]/) - 1  # count of leading whitespace
  has_assign = /^[^#]*=/             # true if a = appears before any #
}
$1 == "def" && !in_def { in_def = indent; next }
indent > in_def && has_assign { print "not global:", $0; next }
indent <= in_def { in_def = 0 }
has_assign { print "possibly global:", $0 }
' file.py
This attempts to parse the Python code, taking advantage of the language's mandatory indentation to keep track of when you're in a function definition. The six pattern {action} pairs are:
- No fields (blank line):
  Skip to next line
- (Match any other line):
  Set `indent` to the number of leading spaces and tabs before the first non-whitespace character
  Set `has_assign` to whether there is a variable assignment (a `=` exists before any `#`)
- The first field is "def" and we're not already tracking a function:
  Keep track of the current indentation level using `in_def` and stop matching for this line
- We're indented enough to be in a function and we have an assignment:
  Report the non-global variable assignment
- The indentation level is not more than the last function:
  We're not in a function any more, set the function indentation level back to zero
- We have a variable assignment (and no `next` has been triggered yet):
  Report the possibly-global variable assignment
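As a rough check of these rules, the sketch below wraps the same script in a shell function and runs it on a small sample. The function name `classify_assignments`, the temporary file, and the sample contents are all assumptions made for illustration.

```shell
# Hypothetical wrapper (the name is illustrative) around the script above.
classify_assignments() {
  awk '
    NF == 0 { next }
    {
      indent = match($0, /[^ \t]/) - 1   # count of leading whitespace
      has_assign = /^[^#]*=/
    }
    $1 == "def" && !in_def { in_def = indent; next }
    indent > in_def && has_assign { print "not global:", $0; next }
    indent <= in_def { in_def = 0 }
    has_assign { print "possibly global:", $0 }
  ' "$1"
}

# Made-up sample: one assignment before a function, two inside it, one after.
sample=$(mktemp)
cat > "$sample" <<'EOF'
limit = 10

def function_name():
    a = 4
    b = 8

total = limit * 2
EOF
classify_assignments "$sample"
rm -f "$sample"
```

On this sample, `limit` and `total` should be reported as possibly global, and both `a` and `b` as not global.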
Note, I'm not sure all versions of `awk` understand `\t`, so you might need a literal tab in its place. You can enter that on the command line as Ctrl+v, Tab.
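If typing a literal tab is awkward (in a script file, for instance), one possible alternative is to let the shell produce the tab and hand it to awk as a variable. This is only a sketch: the helper name `leading_whitespace` and the variable names `tab` and `re` are mine, and it assumes your shell's `printf` expands `\t`.

```shell
# Hypothetical helper: print the count of leading spaces and tabs for each
# non-blank input line, without using \t inside the awk program itself.
leading_whitespace() {
  awk -v tab="$(printf '\t')" '
    BEGIN { re = "[^ " tab "]" }     # dynamic regex containing a literal tab
    NF == 0 { next }                 # skip blank lines
    { print match($0, re) - 1 }      # position of first non-blank, minus one
  '
}

printf '\t\tx = 1\n    y = 2\nz = 3\n' | leading_whitespace
```

The same `re` variable could stand in for `/[^ \t]/` in the `match` call of the script above.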
I still do not think a home-made Python parser is a good idea. If you really are parsing valid (and known-safe) Python code, this similar question has answers that actually load the code, which would of course be definitive. If you're dealing with potentially unsafe code, or Python-like code, and/or you lack a Python interpreter on the system you're using, my solution might be useful for quick code vetting.