This isn't possible in a general sense because Python's syntax is not a regular language: function definitions can take on forms that can't be captured with regular expression syntax, especially in the context of OTHER Python structures. But take heart, there is a lot to learn from your question!
The fascinating thing is that although you said you're trying to learn regular expressions, you accidentally stumbled into the very heart of computer science itself, compiler theory!
I'm going to address less than a fraction of the tip of the iceberg in this post to help get you started, and then suggest a few free and paid resources to help you continue.
A Python function by itself may take on several forms:
    def foo(x):
        "docstring"
        <body>

    def foo1(x):
        """doc
        string"""
        <body>

    def foo2(x):
        <body>
Additionally, what comes before and after the function may not be another function!
This is what makes it impossible to autogenerate documentation using a regex by itself (well, not possible for me. I'm not smart enough to write a single regular expression that can account for the entire Python language!).
What you need to look into is parsing. (By the way, I'm using the term parsing very loosely to cover parsing, tokenizing, and lexing just to keep things "simple".) Regular expressions are typically a very important part of parsing.
The general strategy would be to parse the file into syntactic constructs. Identify which of those constructs are functions. Isolate the text of that function. THEN you can use a regular expression to parse the text of that construct. OR you can parse one level further and break up the function into distinct syntactic constructions -- function name, parameter declaration, doc string, body, etc... at which point your problem will be solved.
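As a minimal sketch of that strategy, Python's standard-library ast module can do the "parse into syntactic constructs" step for you. The sample source and the helper name describe_functions are my own assumptions for illustration; ast.parse, ast.walk, ast.FunctionDef, and ast.get_docstring are real standard-library names.

```python
import ast

# Hypothetical sample input: three function forms plus a non-function statement.
SAMPLE = '''
x = 1

def foo(x):
    "docstring"
    return x

def foo1(x, y=2):
    """doc
    string"""
    return x + y

def foo2(x):
    return x
'''

def describe_functions(source):
    """Return (name, parameter names, docstring) for each function in source."""
    tree = ast.parse(source)  # parse the text into syntactic constructs
    found = []
    for node in ast.walk(tree):
        # Keep only the constructs that are function definitions.
        if isinstance(node, ast.FunctionDef):
            params = [arg.arg for arg in node.args.args]
            found.append((node.name, params, ast.get_docstring(node)))
    return found

for name, params, doc in describe_functions(SAMPLE):
    print(name, params, repr(doc))
```

Here the parser has already split each function into name, parameters, and docstring, so no regular expression is needed at all. Writing that parser yourself is where the learning is, though.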
I was attempting to write a regular expression for a standard function definition (without parsing) like foo or foo1, but I was struggling to do so even having written a few languages.
So just to be clear, the point at which I would reach for parsing instead of a simple regex is any time your input spans multiple lines. Regex is most effective on single lines.
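To see why the line boundary matters, here is a small sketch. The HEADER pattern and the header_names helper are hypothetical names of my own, and the pattern is only one of many ways to match a one-line function header.

```python
import re

# Hypothetical pattern for a one-line "def name(params):" header.
HEADER = re.compile(r'^\s*def\s+([A-Za-z_][A-Za-z0-9_]*)\s*\((.*)\)\s*:')

one_line = "def foo(x, y):"
multi_line = "def foo(\n    x,\n    y,\n):"  # same function, header split over lines

def header_names(text):
    """Apply HEADER to each line separately and collect the function names found."""
    names = []
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            names.append(m.group(1))
    return names

print(header_names(one_line))    # ['foo']
print(header_names(multi_line))  # []
```

Both inputs define the same function, but the line-by-line regex only sees the first one, because no single line of the second input contains the whole header.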
A parsing function looks like this:
    def parse_fn_definition(definition):
        def parse_name(lines):
            <code>
        def parse_params(lines):
            <code>
        def parse_doc(lines):
            <code>
        def parse_body(lines):
            <code>
        ...
Now here's the real trick: each of these parse functions returns two things:
0) The chunk of text that the regex parsed
1) The REST of the lines
so for instance,
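    import re

    def parse_name(lines):
        # Match "def <name>" and capture whatever follows the name on that line.
        pattern = r'\s*def\s+([a-zA-Z_][a-zA-Z0-9_]*)(.*)'
        line = lines[0]
        m = re.match(pattern, line)
        if m:
            res, rest = m.groups()
            # Hand back the name plus everything that is still unparsed.
            return res, [rest] + lines[1:]
        else:
            raise Exception("Line cannot be parsed by parse_name: {}".format(line))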
So, once you've isolated the function text (that's a whole other set of tricks to do, usually involves creating something called a "grammar" -- don't worry, I set you up with some resources down below), you can parse the function text with the following technique:
    def parse_fn(lines_of_text):
        name, rest = parse_name(lines_of_text)
        params, rest = parse_params(rest)
        doc_string, rest = parse_doc(rest)
        body, rest = parse_body(rest)
        function = [name, params, doc_string, body]
        res = function, rest
        return res
This function would return some data structure that represents the function (I just used a simple list for illustration) and the rest of the lines of text. That would get passed on to something that will appropriately catalog your function data and then classify and process the rest of the text!
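Putting the pieces together, here is a runnable sketch of the whole technique. The bodies of parse_params, parse_doc, and parse_body, all of the regex patterns, and the SOURCE sample are my own assumptions for illustration. It deliberately handles only the easy case: a top-level function with a one-line header and a one-line docstring.

```python
import re

def parse_name(lines):
    m = re.match(r'\s*def\s+([A-Za-z_][A-Za-z0-9_]*)(.*)', lines[0])
    if not m:
        raise ValueError("not a function definition: {!r}".format(lines[0]))
    name, rest = m.groups()
    return name, [rest] + lines[1:]

def parse_params(lines):
    # Assumes the whole parameter list sits on one line, e.g. "(x, y):".
    m = re.match(r'\s*\((.*)\)\s*:\s*$', lines[0])
    if not m:
        raise ValueError("cannot parse parameters: {!r}".format(lines[0]))
    params = [p.strip() for p in m.group(1).split(',') if p.strip()]
    return params, lines[1:]

def parse_doc(lines):
    # Assumes a one-line docstring; anything else is treated as "no docstring".
    m = re.match(r'\s*("{3}|\'{3}|"|\')(.*?)\1\s*$', lines[0]) if lines else None
    if m:
        return m.group(2), lines[1:]
    return None, lines

def parse_body(lines):
    # Assumes a top-level function: the body ends at the first unindented line.
    body = []
    for i, line in enumerate(lines):
        if line.strip() and not line[0].isspace():
            return body, lines[i:]
        body.append(line)
    return body, []

def parse_fn(lines_of_text):
    name, rest = parse_name(lines_of_text)
    params, rest = parse_params(rest)
    doc_string, rest = parse_doc(rest)
    body, rest = parse_body(rest)
    return [name, params, doc_string, body], rest

# Hypothetical sample input.
SOURCE = '''def foo(x, y):
    "Add two numbers."
    total = x + y
    return total
print(foo(1, 2))
'''

function, rest = parse_fn(SOURCE.splitlines())
print(function)
print(rest)
```

Each parse step consumes what it understands and passes the remainder along, so the steps compose in a fixed order. Multi-line headers, multi-line docstrings, and nested functions all break this sketch, which is exactly where a real grammar earns its keep.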
Anyway, if this is something that interests you, don't give up! I would offer a few humble suggestions:
1) Start with an EASIER language to parse, like Scheme/LISP. These languages were designed to be easy to parse and manipulate! Then work your way up to more irregular languages.
2a) Peter Norvig has done some amazing and very accessible work on this. Check out Lispy!
2b) Peter Norvig's class CS212 (specifically unit 3 code) is very challenging but does an excellent job introducing fundamental language design concepts. Every job I've ever gotten, and my love for programming, is because of that course.
3) If you want to advance yourself even further and you can afford it, I would strongly recommend checking out Dave Beazley's workshops on compilers or interpreters. I've taken two courses from Dave, and while I can't promise this for everyone, my salary has literally doubled after each course, so I think it's a worthwhile investment.
4) Absolutely check out Structure and Interpretation of Computer Programs (the wizard book) and Compilers (the dragon book). They'll change your life.
5) DON'T GIVE UP! YOU GOT THIS!! Good luck to you!