
I need to check a series of Python source files with a bash script that looks for the presence of global variables. There are two criteria I could use to decide whether a variable is global:

read the file line by line
if the line contains an '=' (assignment sign) AND does not begin with '#' (it is not commented out), check:
    is there a 'def' string anywhere in the text above this assignment line, e.g. def function_name()?
         if yes, the variable is inside a function and hence not global
         else, it is a possible global variable

How do I implement this pseudocode using grep, awk, or sed? I am also open to suggestions for better strategies for finding global variables from a bash script.

Example code:

a = 23

def func_name():
    ...  # body

This code fails our test.

Example code 2:

def function_name():
    a = 4
    ...

This code passes the test.

learnerX
  • Added 2 examples to make the question more specific. – learnerX Dec 07 '17 at 20:52
  • Using line-oriented tools to analyze Python source code seems extremely misdirected when there are good Python source code analyzers written in Python. – tripleee Dec 07 '17 at 21:07
  • Possible duplicate of https://stackoverflow.com/questions/33160744/detect-all-global-variables-within-a-python-function – tripleee Dec 07 '17 at 21:08
  • Implementing this pseudocode as stated is trivial: you need a flip-flop that is initially false, is set to true once you encounter a `def`, and is set to false again when you reach the end of the function. Whenever you see an `=` while the flip-flop is false, you treat it as a global variable. The real problem, of course, is already in the pseudocode: it will occasionally report a real global, but it will also give false positives and miss some globals. – user1934428 Dec 08 '17 at 10:47
  • It is interesting to consider this type of thing without using python, but that's also unwise if you're actually parsing known-safe python code on a system that actually has python installed. If you don't want to run the code (perhaps you found it online?) and/or don't have python installed locally, perhaps this is an interesting premise. See my answer. – Adam Katz Dec 08 '17 at 19:01
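The flip-flop described in the comments can be sketched in awk roughly as follows. This is a minimal sketch under one stated assumption: a function ends at the first later non-blank, non-comment line that starts in column 0. The function name flipflop_globals is hypothetical. As the comment warns, it still gives false positives, for example assignments inside methods of a class, because only a top-level def line flips the flag on.

```shell
# Hypothetical helper: flip-flop sketch of the approach from the comments.
# Assumption: a function ends at the first later non-blank, non-comment line
# that starts in column 0.
flipflop_globals() {
  awk '
    /^[ \t]*$/  { next }                # blank lines never flip the flag
    /^[ \t]*#/  { next }                # neither do comment lines
    /^def[ \t]/ { in_func = 1; next }   # flip on at a top-level def
    /^[^ \t]/   { in_func = 0 }         # flip off at any other column-0 line
    !in_func && /^[^#]*=/ { print "possibly global:", $0 }
  ' "$@"
}

# Usage: flipflop_globals file.py
```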

1 Answer


I do not think this is a good idea, but here is a literal implementation of your logic:

awk -F= '
  /^[^#]/ && NF > 1 && saw_def { print "not global:", $0; next }
  /^[^#]/ && NF > 1 { print "possibly global:", $0 }
  /def/ { saw_def = 1 }
' file.py

This tells awk to use = as a field separator and applies three pattern {action} pairs:

  1. Line starts with something other than #, has a 2nd field (thus has a =), and the saw_def variable is set:
    Print the line as not a global assignment; move on to the next line (stop pattern matching)
  2. Line otherwise starts with something other than # and has a =:
    Print the line as possibly a global assignment
  3. Line contains "def" anywhere:
    Set saw_def; it stays set for the rest of the file, so every later assignment counts as having a def above it

Here is an example that will fail your test:

def function_name():
    a = 4

b = 8

The logic you requested will correctly note that a is not global, but it will also report b as not global, even though b is assigned at the top level after the function has ended.


This should be better:

awk '
  NF == 0 { next }
  {
    indent = match($0, /[^ \t]/) - 1  # count of leading whitespace
    has_assign = /^[^#]*=/
  }
  in_def && indent <= def_indent { in_def = 0 }
  $1 == "def" && !in_def { in_def = 1; def_indent = indent; next }
  in_def && has_assign { print "not global:", $0; next }
  has_assign { print "possibly global:", $0 }
' file.py

This attempts to parse the Python code, taking advantage of the language's mandatory indentation to keep track of when you're in a function definition. The six pattern {action} pairs are:

  1. No fields (blank line):
    Skip to the next line
  2. (Match any other line):
    Set indent to the number of leading spaces and tabs before the first non-whitespace character
    Set has_assign to whether there is a variable assignment (an = appears before any #)
  3. We're in a function and this line is indented no deeper than its def line:
    The function has ended, so clear in_def
  4. The first field is "def" and we're not already tracking a function:
    Set in_def, remember the def line's indentation in def_indent, and stop matching for this line (a separate flag is needed because a top-level def has an indentation of 0, which would otherwise look like "not in a function")
  5. We're in a function and we have an assignment:
    Report the non-global variable assignment
  6. We have a variable assignment (and no next has been triggered yet):
    Report the possibly-global variable assignment
Note: I'm not sure all versions of awk understand \t, so you might need a literal tab in its place. You can enter one on the command line with Ctrl+V, Tab.
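If typing a literal tab is awkward, another option is to let the shell produce the tab with printf, pass it to awk with -v, and build the regex as a string (a dynamic regex) so no \t appears inside the awk program. A minimal sketch; the helper name leading_ws is hypothetical and it only reproduces the leading-whitespace count used above.

```shell
# Hypothetical helper: print the leading-whitespace count of each non-blank line
# without writing \t inside the awk program; printf supplies a real tab character.
leading_ws() {
  awk -v tab="$(printf '\t')" '
    NF { print match($0, "[^ " tab "]") - 1 }   # string used as a dynamic regex
  ' "$@"
}

# Usage: leading_ws file.py
```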


⚠ I still do not think a home-made python parser is a good idea. If you really are parsing valid (and known-safe) python code, this similar question has answers that actually load the code, which would of course be definitive. If you're dealing with potentially unsafe code, or python-like code, and/or you lack a python interpreter on the system you're using, my solution might be useful for quick code vetting.

Adam Katz