We use Napoleon-style docstrings for our Python modules. We now need to auto-populate two additional attributes in each docstring, Data Owner and DAL Owner, so that a given function looks like this:

def func(self, arg1=None, arg2=None):
    """
    Returns the timeseries for the specified arg1 and arg2.
    Args:
        arg1: argument 1
        arg2: argument 2
    Returns:
        DataFrame containing timeseries of arg1 for arg2.

    DAL Owner: Team IT
    Data Owner: Team A
    """

These additional attributes and their values for a given function are provided in a separate CSV file. My plan was to have a script (awk, sed?) that will

  • extract all the function names in a given Python file. I can easily do this in Python.
  • for those function names, check whether the owners exist in the CSV file and, if so, create a mapping from the function name to its owners. Doable.
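
The two steps above could be sketched in Python roughly as follows. This is only a minimal sketch: the CSV column names (`function`, `dal_owner`, `data_owner`) and the helper names are assumptions made for illustration, since the actual layout of the CSV file is not given.

```python
import ast
import csv
import io


def function_names(source):
    """Return the names of all functions and methods defined in the source text."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]


def load_owners(csv_file):
    """Map function name -> (DAL owner, data owner) from an open CSV file.

    The column names used here are hypothetical.
    """
    reader = csv.DictReader(csv_file)
    return {row["function"]: (row["dal_owner"], row["data_owner"]) for row in reader}


def owners_for(source, csv_file):
    """Keep only the functions that actually have an entry in the CSV file."""
    owners = load_owners(csv_file)
    return {name: owners[name] for name in function_names(source) if name in owners}


if __name__ == "__main__":
    source = "def func(self, arg1=None, arg2=None):\n    pass\n\ndef other():\n    pass\n"
    csv_text = "function,dal_owner,data_owner\nfunc,Team IT,Team A\n"
    print(owners_for(source, io.StringIO(csv_text)))
```

Note that matching on the bare function name treats methods with the same name in different classes as the same function; a real mapping may need a qualified name.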

Now, this is the part I haven't figured out, and I don't know the best way forward. For a given function name and its owners, I need to go back into the Python file and add the owners to the docstring if it exists. I am thinking of some sort of awk script, but I am not quite sure how to:

  • Find the function that matches the pattern
  • For that function, see if a docstring exists, i.e. triple quotation marks after the closing parenthesis
  • If a docstring exists, add the two additional lines for the owners before the closing triple quotation marks
  • If a docstring does not exist, insert the two lines for the owners between triple quotation marks on the line after the function declaration.

I know this is a lot of steps, but can anyone provide insight on the previous 4 bullet points, i.e. how to insert the additional attributes into the docstring given the function, the attributes and the Python file? Would a Linux utility like sed or awk be more useful, or should I go the Python route? Is there some other option that's easier to implement?

  • You could use the ast package to parse the source, insert/amend docstrings and rewrite the code. However this approach could cause large diffs in your version control system, unless you auto-format your code. – snakecharmerb Dec 01 '18 at 09:46
  • I am OK with diffs in version control as long as the code remains the same, including comments. From what I have read online, ast ignores comments. Moreover, while I have been able to visit the nodes and get the docstring, I still haven't been able to assign that docstring back to the function. Could you please tell me how I can assign a docstring to a function in AST? – Fizi Dec 01 '18 at 19:26
  • https://www.sphinx-doc.org/en/master/ looks like a better alternative – alper Apr 28 '23 at 13:15

1 Answer

The process for assigning a new docstring in an ast is:

  1. Get the existing docstring using ast.get_docstring
  2. Create a new ast node with the amended content
  3. If the existing docstring is None, insert the new node at the start of the parent node's body
  4. If there was an existing docstring, replace its node with the new node
  5. Use the unparse* tool from CPython Tools to generate the new source (you may need to download this from GitHub; ensure you get the version that matches your Python version)

Here's some example code:

$  cat fixdocstrings.py
import ast
import io
from unparse import Unparser


class DocstringWriter(ast.NodeTransformer):
    def visit_FunctionDef(self, node):
        docstring = ast.get_docstring(node)
        new_docstring_node = make_docstring_node(docstring)
        # Compare against None so that an empty docstring is replaced
        # rather than gaining a second string node in front of it.
        if docstring is not None:
            # The existing docstring is the first node
            # in the function body.
            node.body[0] = new_docstring_node
        else:
            node.body.insert(0, new_docstring_node)
        return node


def make_docstring_node(docstring):
    if docstring is None:
        content = "A new docstring"
    else:
        content = docstring + " -- amended"
    # ast.Str is deprecated since Python 3.8 and removed in 3.12;
    # ast.Constant is its replacement.
    s = ast.Constant(value=content)
    return ast.Expr(value=s)


if __name__ == "__main__":
    with open("docstringtest.py") as f:
        tree = ast.parse(f.read())
    transformer = DocstringWriter()
    new_tree = transformer.visit(tree)
    ast.fix_missing_locations(new_tree)
    buf = io.StringIO()
    Unparser(new_tree, buf)
    buf.seek(0)
    print(buf.read())

$  cat docstringtest.py 
def foo():
    pass


def bar():
    """A docstring."""

$  python fixdocstrings.py 


def foo():
    'A new docstring'
    pass

def bar():
    'A docstring. -- amended'

(I answered something similar for myself for python2.7, here)

* As of Python 3.9, the ast module provides an unparse function that can be used instead of the unparse tool: src = ast.unparse(new_tree)
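
Applied to the owner attributes from the question, the same approach might look like the sketch below. It assumes Python 3.9+ (for ast.unparse); the `OWNERS` mapping, the class name and the `add_owners` helper are hypothetical names introduced here for illustration. As with the example above, comments in the source are lost, and the layout of the regenerated code and docstrings may differ from the original.

```python
import ast

# Hypothetical mapping: function name -> (DAL owner, data owner).
OWNERS = {"func": ("Team IT", "Team A")}


class OwnerDocstringWriter(ast.NodeTransformer):
    def __init__(self, owners):
        self.owners = owners

    def visit_FunctionDef(self, node):
        # Visit children first so nested functions are handled too.
        self.generic_visit(node)
        if node.name not in self.owners:
            return node
        dal_owner, data_owner = self.owners[node.name]
        owner_lines = f"DAL Owner: {dal_owner}\nData Owner: {data_owner}"
        docstring = ast.get_docstring(node)
        if docstring is None:
            # No docstring yet: insert a new one at the start of the body.
            new_node = ast.Expr(value=ast.Constant(value=owner_lines))
            node.body.insert(0, new_node)
        else:
            # Existing docstring is the first node: replace it.
            content = docstring + "\n\n" + owner_lines
            node.body[0] = ast.Expr(value=ast.Constant(value=content))
        return node


def add_owners(source, owners):
    """Return new source text with owner lines added to matching functions."""
    tree = OwnerDocstringWriter(owners).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    print(add_owners('def func():\n    """A docstring."""\n', OWNERS))
```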

  • thank you. just one question -- is it always guaranteed that docstring is the first node in the function body. – Fizi Dec 02 '18 at 21:49
  • @Fizi a docstring is defined as the first expression in a class, module or function, so yes. ast.get_docstring looks at the first node, and if a "docstring" appears later in an object's code it won't be assigned to `__doc__`. – snakecharmerb Dec 03 '18 at 06:50
  • I have run into another major problem, which is that the comments aren't preserved. Any way to do that? – Fizi Dec 03 '18 at 22:41
  • I had to move away from AST and use regex instead – Fizi Dec 06 '18 at 17:31
  • There's no way to preserve comments using the core ast module, but something like [red baron](https://pypi.org/project/redbaron/) might work (I've never used it). Alternatively you could either use sed to wrap your comments in quotes and then unwrap them after docstring processing, or [delete the comment parts from the diff after processing](https://kennyballou.com/blog/2015/10/art-manually-edit-hunks/). Though I've not tried either of these approaches. – snakecharmerb Dec 07 '18 at 13:02
  • Yeah I made a home brewed solution using regex substitution. It seems to do the job. Thanks for all your help. – Fizi Dec 07 '18 at 13:24
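
One possible way to keep comments intact, sketched here under several assumptions, is to use ast only to locate each docstring and then edit the original source lines directly, so that nothing else in the file is rewritten. This is not the regex solution mentioned in the comments above (which was not posted). The sketch assumes Python 3.8+ (for end_lineno and end_col_offset), triple-quoted docstrings, ASCII text on the closing docstring line, and function bodies that start on their own line; the function name `add_owner_lines` and the `owners` mapping are hypothetical.

```python
import ast


def add_owner_lines(source, owners):
    """Insert owner lines into docstrings by editing the source text in place.

    owners maps function name -> (DAL owner, data owner).
    """
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)
    funcs = [
        node for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.name in owners
    ]
    # Work bottom-up so earlier line numbers stay valid after each insertion.
    for node in sorted(funcs, key=lambda n: n.lineno, reverse=True):
        first = node.body[0]
        if first.lineno == node.lineno:
            continue  # body on the same line as "def": not handled here
        dal_owner, data_owner = owners[node.name]
        indent = " " * first.col_offset
        owner_text = [
            f"{indent}DAL Owner: {dal_owner}\n",
            f"{indent}Data Owner: {data_owner}\n",
        ]
        if ast.get_docstring(node) is None:
            # No docstring: add one on the lines before the first statement.
            start = first.lineno - 1
            lines[start:start] = [f'{indent}"""\n', *owner_text, f'{indent}"""\n']
            continue
        end_idx = first.end_lineno - 1
        last = lines[end_idx]
        close_at = first.end_col_offset - 3  # start of the closing quotes
        if last[close_at:close_at + 3] not in ('"""', "'''"):
            continue  # not triple-quoted: left untouched in this sketch
        before, after = last[:close_at], last[close_at:]
        if before.strip():
            # Text shares a line with the closing quotes: split that line.
            lines[end_idx:end_idx + 1] = [
                before.rstrip() + "\n", "\n", *owner_text, indent + after,
            ]
        else:
            # Closing quotes are on their own line: insert just before them.
            lines[end_idx:end_idx] = ["\n", *owner_text]
    return "".join(lines)
```

Because only the docstring lines are touched, the resulting diff is limited to the inserted lines, and comments elsewhere in the file are left as they were.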