I am building a dependency diagram of all scripts on a machine, and their dependencies, using Python and Graphviz. I recently finished connecting Python dependencies to the scripts that use them and generating the diagram.
Here's a super zoomed out image of what it's looking like so far:
Green - Non-Library Script; Nothing depends on it.
Blue - Library Script
Crimson - Non-Library Script; At least one script depends on it.
So now it's time to move on to a new language and do the same. I decided to start with AWK.
TL;DR entry point:
How can you determine what script dependencies are used in AWK?
Even a starting point would be amazing, as I have never worked with AWK before. I already have a script that parses out paths and extensions. All I need is the name of each dependency pulled from the script that uses it, so that I can set up some key:value pairs.
Example data would be something like:
awk_dict = {
    'awk_script_1': ['dependency1', 'dependency2', '...'],
    'awk_script_2': ['dependency1', 'dependency2', '...'],
    ...
    'awk_script_n': ['dependency1', 'dependency2', '...']
}
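To make the target shape concrete, here is a rough sketch of how one entry of such a dictionary might be filled. It rests on an assumption that may not hold for every script: that dependencies appear as gawk-style `@include "file"` directives (a gawk extension, not POSIX awk). Files pulled in through `-f` on the command line or from a shell wrapper would not be seen by this. The names `INCLUDE_RE`, `awk_includes` and the sample file names are made up for illustration.

```python
import re

# Assumption: dependencies are declared with gawk's @include directive,
# one per line, with the file name in double quotes.
INCLUDE_RE = re.compile(r'^[ \t]*@include[ \t]+"([^"]+)"', re.MULTILINE)


def awk_includes(source):
    """Return the file names named in @include lines of an AWK source string."""
    return INCLUDE_RE.findall(source)


# Hypothetical AWK source used only to exercise the function.
sample_source = (
    '@include "lib_strings.awk"\n'
    '  @include "lib_dates.awk"\n'
    '# @include "commented_out.awk"\n'
    'BEGIN { print "hi" }\n'
)

awk_dict = {'awk_script_1': awk_includes(sample_source)}
print(awk_dict)
```

The commented-out line is skipped because the pattern only allows spaces or tabs before `@include`.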
Edit: I was asked to show how I am parsing the Python scripts.
def main():
    """ Generate a diagram of the server's scripts and their relations. """
    directory_of_execs = get_executables()
    generate_graphviz_diagram(parse_script(directory_of_execs))
This will work for all executables. I later sift out only the ones that were called for in the arguments. It actually works pretty fast, so I've been holding off on moving the sifting methods here.
import fnmatch
import os


def get_executables():
    """
    Generate a list of the locations of all active executables on the server.
    This will only grab the executables that currently have executable rights.
    If you do not have sudo access, some executables may be missed -
    as your permission to the directory and its contents will be denied.
    Returns:
        A list containing every accessible, active executable's location that is
        not an OS file.
    """
    # Allowed executable file extensions
    allowed_executables = ('.awk', '.c', '.csh', '.inc', '.ln', '.orig', '.pl',
                           '.pm', '.save', '.sh', '.template', '.py')
    applicable_dirs = ('/home/', '/Rusr/', '/usr/')
    exec_info = []
    # Directories listed here are mostly system files or non-important generated files.
    # These have been decided to be ignored. This list is incomplete.
    black_listed = ['redhat', 'RHEL', 'local', 'kernel?', 'lib*', 'python*', '__*__', '*system*']
    for cleared_dirs in applicable_dirs:
        for path, dirs, files in os.walk(cleared_dirs, topdown=True, followlinks=False):
            # Modify current applicable directories in-place with black_listed filters.
            dirs[:] = [
                d for d in list(dirs) if not any(fnmatch.fnmatch(d, pattern)
                                                 for pattern in black_listed) if not d.startswith('.')
            ]
            # Final sift of parsing out desired executables only.
            for executable in files:
                if executable.endswith(allowed_executables):
                    exec_info.append((path, executable))
    return exec_info
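To show what the in-place directory filter does on its own, here is a small sketch that applies the same `fnmatch` patterns to a made-up list of directory names. The helper name `prune_dirs` and the sample names are hypothetical; only the pattern list comes from the function above.

```python
import fnmatch

# Same patterns as black_listed in get_executables().
BLACK_LISTED = ['redhat', 'RHEL', 'local', 'kernel?', 'lib*', 'python*', '__*__', '*system*']


def prune_dirs(dirs, patterns=BLACK_LISTED):
    """Return the directory names that os.walk should still descend into."""
    return [
        d for d in dirs
        if not d.startswith('.')
        and not any(fnmatch.fnmatch(d, pattern) for pattern in patterns)
    ]


# Hypothetical directory names, as os.walk would hand them over in `dirs`.
sample_dirs = ['bin', 'lib64', '.git', 'python3.6', 'scripts',
               '__pycache__', 'kernels', 'kernel']
kept = prune_dirs(sample_dirs)
print(kept)  # ['bin', 'scripts', 'kernel']
```

Note that `kernel?` needs exactly one extra character, so `kernels` is dropped while `kernel` is kept. Assigning the result back with `dirs[:] = ...` is what makes `os.walk` skip the pruned directories when `topdown=True`.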
This is where I actually pull out the import modules' information from each script and sift out any that cannot compile.
import dis
from collections import defaultdict


def parse_script(executables):
    """
    Retrieve parameters from the entered script.
    Attributes:
        script_path: A string of the path to a script.
        module_str: A string of only the script's name and extension.
    Returns:
        module_container: A dictionary of key-values for making a diagram.
    """
    module_container = dict()
    error_scripts = []  # Scripts that cannot be disassembled due to errors within.
    called_scripts = []  # Whitelisted script extensions to add to diagram.
    ## NOTE: There will be more of these soon. Only python is supported right now.
    # ARGS holds the parsed command-line arguments and is defined elsewhere.
    if ARGS.python:
        called_scripts.append('.py')
    for script_path, module_str in executables:
        # Build a dictionary with the script's info.
        script_values = dict()
        script_values['name'] = module_str[:module_str.rfind('.')].replace('"', '')
        script_values['extension'] = module_str[module_str.rfind('.'):]
        script_values['path'] = f'{script_path}/{script_values["name"]}{script_values["extension"]}'
        # Disassemble the script and compile.
        if script_values['extension'] in called_scripts:
            with open(script_values['path']) as file_pointer:
                # Read the whole script into one string.
                statements = file_pointer.read()
                try:
                    # Use dis to pull information on individual scripts.
                    cat_mod = dis.get_instructions(statements)
                except Exception as error:
                    # If there is an error in the program here it is not caused by
                    # this script but the script that is being disassembled.
                    # Log the bad script and the error it pops.
                    error_scripts.append(f'SCRIPT :: {script_values["name"]}\nPATH '
                                         f':: {script_values["path"]}\n\t'
                                         f'ERROR INFO ::\n\t{error}')
                else:
                    # Sift for information only on imports.
                    imports = [module for module in cat_mod if 'IMPORT' in module.opname]
                    grouped = defaultdict(list)
                    for imp in imports:
                        grouped[imp.opname].append(imp.argval)
                    script_values['imports'] = grouped
                    # Check for script in module_container
                    if script_values['name'] not in module_container:
                        module_container[script_values['name']] = script_values
    return module_container
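The import-sifting step can be tried in isolation. The sketch below runs the same `dis.get_instructions` filter over a source string instead of a file; the function name `top_level_imports` and the sample source are made up for illustration. One caveat worth stating: `dis.get_instructions` does not descend into nested code objects, so imports written inside functions or classes would not show up this way.

```python
import dis
from collections import defaultdict


def top_level_imports(source):
    """Group the IMPORT_* instructions of a source string by opcode name."""
    grouped = defaultdict(list)
    # dis.get_instructions() compiles the string and yields Instruction tuples.
    for instruction in dis.get_instructions(source):
        if 'IMPORT' in instruction.opname:
            grouped[instruction.opname].append(instruction.argval)
    return dict(grouped)


# Hypothetical script contents.
sample_source = "import os\nfrom collections import defaultdict\n"
result = top_level_imports(sample_source)
print(result)
```

Here `IMPORT_NAME` collects the modules (`os`, `collections`) and `IMPORT_FROM` collects the names pulled out of them (`defaultdict`).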
I expect I will have to write a separate function to parse out dependency information for each language. I would like to create one super awesome function that could parse all the languages, but that seems a little out of reach, and pylint would probably tell me my function is too big. :(
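One way the per-language idea might be kept small is a lookup table from extension to parser function, so each language gets its own short function. This is only a sketch: `parse_python`, `parse_awk`, `PARSERS` and `parse_dependencies` are hypothetical names, and the two parsers are empty placeholders rather than working implementations.

```python
import os


def parse_python(path):
    """Placeholder: would return the Python dependencies of the script at path."""
    return []


def parse_awk(path):
    """Placeholder: would return the AWK dependencies of the script at path."""
    return []


# Map each supported extension to the function that handles it.
PARSERS = {
    '.py': parse_python,
    '.awk': parse_awk,
}


def parse_dependencies(path):
    """Dispatch to the parser for this file's extension, or None if unsupported."""
    extension = os.path.splitext(path)[1]
    parser = PARSERS.get(extension)
    if parser is None:
        return None
    return parser(path)


print(parse_dependencies('/home/user/report.awk'))  # [] from the placeholder
print(parse_dependencies('/home/user/notes.txt'))   # None, no parser registered
```

Supporting another language then means adding one function and one dictionary entry, which keeps every function short enough for pylint.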