Please leave a comment if this explanation is long-winded or unclear. The ultimate goal of my project is to create a directed acyclic graph using networkx in which an arrow is drawn from every caller function to each callee function in its body (something similar to what's described in this post, but for Python rather than C#). My aim isn't to view real-time function calls as a graph, but to see a static view of the connections across all the projects on a particular server.

Essentially, I'm trying to explore the structure of a codebase I inherited. On the server where all the source code is stored, there are many unrelated projects spread out across the filesystem.

In this example filesystem,

/
├── My_Graph_Script/
│   └── digraph.py
│
├── Project_1/
│   ├── A.py
│   └── B.py
│
├── Project_2/
│   ├── C.py
│   └── module_2/
│       ├── D.py
│       └── E.py
│
└── Some_Directory/
    └── Project_3/
        ├── F.py
        ├── G.py
        └── module_3/
            ├── H.py
            └── I.py

I might want to see the caller-callee pairs between

  • A.py and B.py
  • C.py, D.py, and E.py
  • F.py, G.py, H.py, and I.py

More specifically, I would like to produce a nested dictionary of strings (and lists of strings) with the structure shown below, which I'll use as input to construct graphs in networkx. (The example is for A.py and B.py.)

function_call_dict = {
    "A.py": {
        "function_name_1": [...],  # functions called in body_1 and defined in A or B
        "function_name_2": [...],  # functions called in body_2 and defined in A or B
    },
    "B.py": {
        "function_name_3": [...],  # functions called in body_3 and defined in A or B
        "function_name_4": [...],  # functions called in body_4 and defined in A or B
    },
}
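As a rough illustration of how a dictionary of this shape could feed a graph, the sketch below flattens it into (caller, callee) pairs using only the standard library. The helper name `edges_from_call_dict` and the callee lists in `example` are hypothetical placeholders, not names from any real project.

```python
def edges_from_call_dict(call_dict):
    """Flatten {file: {caller: [callees]}} into a list of (caller, callee) pairs."""
    edges = []
    for functions in call_dict.values():
        for caller, callees in functions.items():
            for callee in callees:
                edges.append((caller, callee))
    return edges


# Hypothetical data in the same shape as function_call_dict.
example = {
    "A.py": {
        "function_name_1": ["function_name_2", "function_name_3"],
        "function_name_2": [],
    },
    "B.py": {
        "function_name_3": ["function_name_4"],
        "function_name_4": [],
    },
}

edges = edges_from_call_dict(example)
```

A list like `edges` can be handed to networkx's `DiGraph.add_edges_from`. Functions that neither call nor are called by anything would not show up as nodes this way, so they would need to be added separately.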

However, before I can construct such a dictionary, I have to be able to access function bodies and definitions from Python files that aren't modules in the My_Graph_Script project directory.

My initial thought was to apply the approach seen here, where many different modules are imported in a for loop, except that I would walk the directory tree with os.walk(root_path) to import all the necessary modules. After that, I could use inspect to access Python functions as objects, as suggested here.

Since any script that uses inspect has to first import the module to access any of its functions, is it possible to import modules that are in completely different project folders or nested somewhere deep in the filesystem, possibly without an __init__.py file?
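To make the question concrete, here is a minimal sketch of loading a file by its path with the standard library's importlib and then listing its functions with inspect. The helper names `load_module_from_path` and `functions_defined_in` are made up for illustration. The sketch assumes each file can be executed on its own.

```python
import importlib.util
import inspect
from pathlib import Path


def load_module_from_path(path):
    """Import a .py file by its filesystem location; no __init__.py is required."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the file's top-level code
    return module


def functions_defined_in(module):
    """Return {name: function} for functions defined in the module itself,
    skipping functions it merely imported from elsewhere."""
    return {
        name: obj
        for name, obj in inspect.getmembers(module, inspect.isfunction)
        if obj.__module__ == module.__name__
    }
```

Loading a file this way runs its top-level code, so any side effects in the file happen too. A file that imports a sibling (say, A.py importing B) would still need its directory on sys.path for that import to resolve.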

Also, if my approach is totally wrongheaded, or if there are already developer tools that create directed acyclic graphs from Python code, I'd love to know of them.

David

1 Answer

[...], or if there are already developer tools that create directed acyclic graphs from Python code, I'd love to know of them.

There are a couple of options for determining the module-level dependency graph, for example snakefood and findimports. IIRC, snakefood just parses the source and does not load the module, so it should work even in the absence of __init__.py files.

IIRC, they both export to graphviz's dot file format, so you can render them using graphviz or import them into networkx and use its functionality to plot the output. If the networkx (py)graphviz layout does not produce good enough results (e.g. due to node label overlaps), there is grandalf, which implements a Sugiyama-style hierarchical layout (similar to graphviz's dot) with this kind of use case in mind.

Paul Brodersen
  • Thanks! These tools look really great, but they don't quite meet the needs of my project, so I'm going to take it in a completely different direction: I'm going to rely on the [ast](https://docs.python.org/3/library/ast.html) module, which is best documented by [green tree snakes](https://greentreesnakes.readthedocs.io/en/latest/tofrom.html). – David Jul 10 '19 at 16:55
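
For reference, the ast direction mentioned in the comment above might look roughly like the sketch below. It parses source text without importing it and maps each top-level function to the names it calls directly. The function name `calls_by_function` and its `known_names` parameter are hypothetical. Attribute calls such as `obj.method()` are ignored here, so this is only a starting point.

```python
import ast


def calls_by_function(source, known_names=None):
    """Map each top-level function in `source` to a sorted list of names it calls.

    Only direct calls such as `foo()` are recorded. If `known_names` is given,
    calls to any other name (builtins, third-party functions) are dropped.
    """
    tree = ast.parse(source)
    result = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            called = set()
            for child in ast.walk(node):
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    name = child.func.id
                    if known_names is None or name in known_names:
                        called.add(name)
            result[node.name] = sorted(called)
    return result
```

Running this over the text of each file found by os.walk, with `known_names` set to the functions defined across that project's files, would give one inner dictionary per file in the shape the question asks for.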