For a git alias problem, I'd like to select a single Python function from a file by name, e.g.:

  ...
  def notyet():
      wait for it

  def ok_start(x):
      stuff
      stuff
      def dontgettrickednow():
         keep going
  #stuff
      more stuff

  def ok_stop_now():

In algorithmic terms, the following would be close enough:

  1. Start filtering when you find a line that matches /^(\s*)def $1[^a-zA-Z0-9]/
  2. Keep matching until you find a line that matches neither ^\s*# nor ^\1\s (that is, keep going while each line is either a possibly indented comment or is indented further than the def line)

(I don't really care if decorators before the following function are picked up. The result is for human reading.)

I was trying to do this with Awk (which I barely know) but it's a bit harder than I thought. For starters, I'd need a way of storing the length of the indent before the original def.

Steve Bennett

2 Answers

Why not just let Python do it? The inspect module can print out the source of a function, so you could import the module, select the function and inspect it. Hang on. Banging away at a solution for you...

OK. It turns out the inspect.getsource function doesn't work for stuff defined interactively:

>>> import inspect
>>> def test(f):
...     print 'arg:', f
...
>>> test(1)
arg: 1
>>> inspect.getsource(test)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\inspect.py", line 699, in getsource
    lines, lnum = getsourcelines(object)
  File "C:\Python27\lib\inspect.py", line 688, in getsourcelines
    lines, lnum = findsource(object)
  File "C:\Python27\lib\inspect.py", line 529, in findsource
    raise IOError('source code not available')
IOError: source code not available
>>>

But it will work for your use case, because your modules are saved to disk. Take, for instance, my test.py file:

def test(f):
    print 'arg:', f

def other(f):
    print 'other:', f

And compare with this interactive session:

>>> import inspect
>>> import test
>>> inspect.getsource(test.test)
"def test(f):\n    print 'arg:', f\n"
>>> inspect.getsource(test.other)
"def other(f):\n    print 'other:', f\n"
>>>

So... You need to write a simple Python script that accepts the name of a Python source file and a function/object name as arguments. It should then import the module, inspect the function and print its source to stdout.
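
A minimal sketch of such a script, assuming Python 3 and a source file ending in .py. The names `function_source` and `main` are made up for illustration. Note that importing runs the file's top-level code, so the file's own imports must resolve.

```python
import importlib.util
import inspect
import sys


def function_source(path, name):
    """Import the file at `path` as a module and return the source of `name`."""
    # "_inspected_module" is an arbitrary placeholder module name.
    spec = importlib.util.spec_from_file_location("_inspected_module", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the file's top-level code
    return inspect.getsource(getattr(module, name))


def main(argv):
    """Expect two arguments: the source file and the function/object name."""
    if len(argv) != 2:
        print("usage: funcsrc.py FILE NAME", file=sys.stderr)
        return 2
    sys.stdout.write(function_source(argv[0], argv[1]))
    return 0
```

To use it from the command line, end the file with `sys.exit(main(sys.argv[1:]))`.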

Daren Thomas
  • I like your thinking, but it looks like Python would have to be able to parse the whole project - which would mean checking everything out, not just a single file. (Otherwise it can't import, because the dependencies are broken) – Steve Bennett May 09 '12 at 08:48
  • Oh dear. Well, in that case, the Python compiler package can help: http://docs.python.org/library/compiler.html - but this will be a lot more work! – Daren Thomas May 09 '12 at 08:51
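
As a rough sketch of the parse-without-importing idea from the comment above: this uses the standard-library `ast` module rather than the `compiler` package (my substitution, since `compiler` does not exist in Python 3). It assumes Python 3.8+ for `end_lineno` and a file that parses as valid Python 3. The name `functions_named` is hypothetical.

```python
import ast


def functions_named(source, name):
    """Return the source text of every function called `name`, without importing."""
    tree = ast.parse(source)
    lines = source.splitlines()
    # ast.walk does not guarantee source order, so sort matches by line number.
    nodes = sorted(
        (
            node
            for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
            and node.name == name
        ),
        key=lambda node: node.lineno,
    )
    # lineno is 1-based and points at the def line; end_lineno is inclusive.
    return ["\n".join(lines[node.lineno - 1 : node.end_lineno]) for node in nodes]
```

Because nothing is imported or executed, broken dependencies elsewhere in the project do not matter. Functions with the same name in different classes are all returned.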

One way using awk. Code is well commented, so I hope it's easy to understand.

Content of infile:

  ...
  def notyet():
      wait for it

  def ok_start(x):
      stuff
      stuff
      def dontgettrickednow():
         keep going
  #stuff
      more stuff

  def ok_stop_now():

Content of script.awk:

BEGIN {
        ## 'f' variable is the function to search for; build a regexp from it.
        ## The class includes '_' so that f="ok" does not match "ok_start".
        f_regex = "^" f "[^a-zA-Z0-9_]"

        ## When set, print line. Otherwise omit line.
        ## It is set when the searched function is found.
        ## It is unset at the first non-comment line that is indented no
        ## deeper than the function definition.
        in_func = 0
}

## Found function.
$1 == "def" && $2 ~ f_regex {

        ## Get position of first 'd' in the line.
        i = index( $0, "d" )

        ## Sanity check. This should never trigger because the condition was
        ## checked before.
        if ( i == 0 ) {
                next
        }

        ## Get the characters before the matched index, check that all of
        ## them are whitespace, and get their length. substr() positions
        ## start at 1; starting at 0 can drop a character in some awks.
        indent = substr( $0, 1, i - 1 )
        if ( indent ~ /^[[:space:]]*$/ ) {
                num_spaces = length( indent )
        }

        ## Set variable, print line and read next one.
        in_func = 1
        print
        next
}

## When we are inside the function, line doesn't begin with '#' and
## it's not a blank line (only spaces).
in_func == 1 && $1 ~ /^[^#]/ && $0 ~ /[^[:space:]]/ {

        ## Get how many characters there are until first non-space. The result
        ## is the position of first non-blank, so subtract one to get the number
        ## of spaces.
        spaces = match( $0, /[^[:space:]]/ )
        spaces -= 1

        ## If current indent is less than or equal to the indent of the function
        ## definition, then end of function found, so stop printing.
        if ( spaces <= num_spaces ) {
                in_func = 0
        }
}

## Self-explanatory.
in_func == 1 { 
        print
}

Run it like:

awk -f script.awk -v f="ok_start" infile

With the following output:

  def ok_start(x):
      stuff
      stuff
      def dontgettrickednow():
         keep going
  #stuff
      more stuff
Birei
  • Wow, awesome :) I can confirm it works on my real world functions. An interesting aspect is if there are two functions with the same name (can happen if they're in different classes), it returns both of them, end to end. Not sure if that was intended (or indeed what the correct behaviour should be) - seems like a good result. – Steve Bennett May 09 '12 at 10:31
  • @SteveBennett: I was just editing to modify that behaviour. Now it should only process first function found. – Birei May 09 '12 at 10:34
  • I kind of like the "select all functions with the same name" version actually - otherwise I'm not sure how you'd get access to a second func with the same name. – Steve Bennett May 11 '12 at 00:29
  • @SteveBennett: Ok. Fixed to old behavior. – Birei May 11 '12 at 07:13