I want to extract all Python functions/methods with their signatures from a Python project. I've tried:
$ grep -r ^def *
but this doesn't show full signatures when parameters span several lines. Any suggestions?
You can tokenize the file and use that to print function definitions:
import token
from tokenize import generate_tokens

def find_definitions(filename):
    with open(filename) as f:
        gen = generate_tokens(f.readline)
        for tok in gen:
            if tok[0] == token.NAME and tok[1] == 'def':
                # function definition, read until next colon.
                definition, last_line = [tok[-1]], tok[3][0]
                while not (tok[0] == token.OP and tok[1] == ':'):
                    if last_line != tok[3][0]:
                        # more than one line, append, track line number
                        definition.append(tok[-1])
                        last_line = tok[3][0]
                    tok = next(gen)
                if last_line != tok[3][0]:
                    definition.append(tok[-1])
                yield ''.join(definition)
This works regardless of how many lines a function definition uses.
Demo:
>>> import textwrap
>>> gen = find_definitions(textwrap.__file__.rstrip('c'))
>>> for definition in gen:
... print(definition.rstrip())
...
def __init__(self,
             width=70,
             initial_indent="",
             subsequent_indent="",
             expand_tabs=True,
             replace_whitespace=True,
             fix_sentence_endings=False,
             break_long_words=True,
             drop_whitespace=True,
             break_on_hyphens=True):
def _munge_whitespace(self, text):
def _split(self, text):
def _fix_sentence_endings(self, chunks):
def _handle_long_word(self, reversed_chunks, cur_line, cur_len, width):
def _wrap_chunks(self, chunks):
def wrap(self, text):
def fill(self, text):
def wrap(text, width=70, **kwargs):
def fill(text, width=70, **kwargs):
def dedent(text):
The above uses the textwrap module to demonstrate how it can handle multi-line definitions.
If you need to support Python 3 code with annotations, you'll need to be a little bit cleverer and track opening and closing parentheses too; a colon within the parentheses doesn't count. On the other hand, Python 3's tokenize.tokenize() produces named tuples, which make the function below a little easier to read:
import token
from tokenize import tokenize

def find_definitions(filename):
    with open(filename, 'rb') as f:
        gen = tokenize(f.readline)
        for tok in gen:
            if tok.type == token.NAME and tok.string == 'def':
                # function definition, read until next colon outside
                # parentheses.
                definition, last_line = [tok.line], tok.end[0]
                parens = 0
                while tok.exact_type != token.COLON or parens > 0:
                    if last_line != tok.end[0]:
                        definition.append(tok.line)
                        last_line = tok.end[0]
                    if tok.exact_type == token.LPAR:
                        parens += 1
                    elif tok.exact_type == token.RPAR:
                        parens -= 1
                    tok = next(gen)
                if last_line != tok.end[0]:
                    definition.append(tok.line)
                yield ''.join(definition)
In Python 3 you'd preferably open source files in binary mode and let the tokenizer figure out the right encoding. Also, the above Python 3 version can tokenize Python 2 code without issue.
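As a quick self-contained check of the paren tracking, the sketch below writes a small annotated, multi-line definition to a temporary file and runs the Python 3 extractor over it; the colons inside the parentheses are skipped because the paren counter is nonzero. The sample function name (scale) and the temp-file setup are only for illustration, and the extractor from above is repeated so the snippet runs on its own.

```python
import os
import tempfile
import token
from tokenize import tokenize

def find_definitions(filename):
    # Same tokenizer-based extractor as above, repeated for a
    # self-contained demo.
    with open(filename, 'rb') as f:
        gen = tokenize(f.readline)
        for tok in gen:
            if tok.type == token.NAME and tok.string == 'def':
                definition, last_line = [tok.line], tok.end[0]
                parens = 0
                while tok.exact_type != token.COLON or parens > 0:
                    if last_line != tok.end[0]:
                        definition.append(tok.line)
                        last_line = tok.end[0]
                    if tok.exact_type == token.LPAR:
                        parens += 1
                    elif tok.exact_type == token.RPAR:
                        parens -= 1
                    tok = next(gen)
                if last_line != tok.end[0]:
                    definition.append(tok.line)
                yield ''.join(definition)

# An annotated definition spread over two lines; note the colons
# inside the parentheses.
source = (b"def scale(values: list,\n"
          b"          factor: int = 2) -> list:\n"
          b"    return [v * factor for v in values]\n")

with tempfile.NamedTemporaryFile(suffix='.py', delete=False) as f:
    f.write(source)
try:
    found = list(find_definitions(f.name))
finally:
    os.unlink(f.name)

for definition in found:
    print(definition, end='')
```

The single yielded definition spans both physical lines and ends at the colon after the return annotation, not at the annotation colons inside the parentheses.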
You can parse the source using the ast module. It lets you see exactly the same code structure the interpreter sees. You just need to traverse it and dump out any function definitions you find.
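A minimal sketch of that approach (assuming Python 3.9+ for ast.unparse; the sample source and the helper name find_signatures are illustrative):

```python
import ast

def find_signatures(source):
    """Yield 'name(args)' for every function definition in the source."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # ast.unparse (Python 3.9+) renders the arguments node,
            # defaults and annotations included, on a single line, so
            # multi-line definitions are handled for free.
            yield '%s(%s)' % (node.name, ast.unparse(node.args))

sample = '''
def my_func(a,
            b=None):
    pass

class TextWrapper:
    def wrap(self, text):
        return [text]
'''

for sig in find_signatures(sample):
    print(sig)
```

Because ast.walk visits nested nodes, this also picks up methods defined inside classes.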
If you want to handle edge cases like multi-line declarations, bash/grep is not enough. This isn't a place to use regex, in my opinion, unless you accept that you'll potentially miss many edge cases. Instead, I'd suggest you use inspect and funcsigs (funcsigs is a backport of the changes made to Python 3's inspect module; it includes the signature-parsing functions).
Here's the file we'll parse (inspect_me.py):
import sys

def my_func(a, b=None):
    print a, b

def another_func(c):
    """
    doc comment
    """
    return c + 1
And here's the code that will parse it for us:
import inspect
from funcsigs import signature

import inspect_me

if __name__ == "__main__":
    # get all the "members" of our module:
    members = inspect.getmembers(inspect_me)
    for k, v in members:
        # we're only interested in functions for now (classes, vars,
        # etc. may come later in a very similar fashion):
        if inspect.isfunction(v):
            # the name of our function:
            print k
            # the function signature as a string:
            sig = signature(v)
            print str(sig)
            # let's strip out the doc string too:
            if inspect.getdoc(v):
                print "FOUND DOC COMMENT: %s" % inspect.getdoc(v)
inspect is the way to go for introspection in Python. token and ast could both do the job, but they're much more low-level and complex than what you actually need here.
The output of running the above:
another_func
(c)
FOUND DOC COMMENT: doc comment
my_func
(a, b=None)
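On Python 3 the funcsigs backport isn't needed, since inspect.signature() is built in. A self-contained sketch of the same idea, inspecting the current module instead of a separate inspect_me.py so nothing extra is required:

```python
import inspect
import sys

def my_func(a, b=None):
    print(a, b)

def another_func(c):
    """
    doc comment
    """
    return c + 1

if __name__ == "__main__":
    # getmembers() with a predicate returns only the functions,
    # already sorted by name.
    module = sys.modules[__name__]
    for name, value in inspect.getmembers(module, inspect.isfunction):
        print(name)
        # the function signature as a string, e.g. "(a, b=None)"
        print(str(inspect.signature(value)))
        if inspect.getdoc(value):
            print("FOUND DOC COMMENT: %s" % inspect.getdoc(value))
```

Note that, like funcsigs, this only works on modules you can import; the tokenize and ast answers above work on source files without executing them.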