Your question is about Source Code Modularization in Software Engineering. It is a relatively new subject in software, and there are few references about it. Source Code Modularization recasts clustering concepts for source code.
Reference 1 describes it as follows:
The aim of the software modularization process is to partition a
software system into subsystems to provide an abstract view of the
architecture of the software system, where a subsystem is made up of a set of software artifacts which collaborate with each other to
implement a high-level attribute or provide a high-level service for
the rest of the software system.
However, for large and complex software systems, the software
modularization cannot be done manually, owing to the large
number of interactions between different artifacts, and the large size
of the source code. Hence, a fully automated or semiautomated tool is
needed to perform software modularization.
There are many techniques (algorithms) for Source Code Modularization (see reference 1):
Hierarchical Techniques:
- Single Linkage, Complete Linkage, Average Linkage
- Ward Method, Median Method, Centroid Method
- Combined and Weighted Combined Methods
Search-Based Techniques:
- Hill Climbing, Multiple Hill Climbing (HC)
- Simulated Annealing (SA)
- Genetic Algorithm (GA)
Note that you can find general clustering techniques under these same names. Modularization is a little different, though: the algorithms are recast to work on source code.
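To make the search-based idea concrete, here is a minimal hill-climbing sketch. It is not the algorithm from reference 1. It assumes the dependency graph is given as a list of directed edges between artifacts, and it scores a partition with a TurboMQ-style measure (each cluster contributes intra / (intra + 0.5 * inter)), which is one common choice in the modularization literature. The names `turbo_mq` and `hill_climb` are hypothetical and used only for illustration.

```python
import random


def turbo_mq(edges, assignment):
    """TurboMQ-style score: sum over clusters of intra / (intra + 0.5 * inter).

    edges: list of (source, target) dependencies between artifacts.
    assignment: dict mapping each artifact to a cluster label.
    Clusters with no internal edge contribute 0.
    """
    intra, inter = {}, {}
    for src, dst in edges:
        a, b = assignment[src], assignment[dst]
        if a == b:
            intra[a] = intra.get(a, 0) + 1
        else:
            # An edge crossing clusters counts against both of them.
            inter[a] = inter.get(a, 0) + 1
            inter[b] = inter.get(b, 0) + 1
    return sum(mu / (mu + 0.5 * inter.get(c, 0)) for c, mu in intra.items())


def hill_climb(nodes, edges, seed=0):
    """Move one artifact at a time to another cluster while the score improves."""
    rng = random.Random(seed)
    labels = list(range(len(nodes)))  # at most one cluster per artifact
    assignment = {n: rng.choice(labels) for n in nodes}  # random start (assumption)
    best = turbo_mq(edges, assignment)
    improved = True
    while improved:
        improved = False
        for n in nodes:
            keep = assignment[n]
            for label in labels:
                if label == keep:
                    continue
                assignment[n] = label
                score = turbo_mq(edges, assignment)
                if score > best + 1e-12:  # accept strict improvements only
                    best, keep, improved = score, label, True
            assignment[n] = keep  # settle on the best label found for n
    return assignment, best


if __name__ == "__main__":
    # Toy graph: two 3-cycles joined by a single dependency.
    nodes = ["a", "b", "c", "d", "e", "f"]
    edges = [("a", "b"), ("b", "c"), ("c", "a"),
             ("d", "e"), ("e", "f"), ("f", "d"), ("c", "d")]
    clusters, score = hill_climb(nodes, edges, seed=1)
    print(clusters, round(score, 3))
```

Plain hill climbing can stop at a local optimum, which is why the list above also includes multiple hill climbing, simulated annealing, and genetic algorithms. Restarting this sketch with different seeds and keeping the best result is a simple form of the "multiple" variant.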
The overall Source Code Modularization Process is shown below:

There are many tools you can use in the Modularization Process:
- Static Source Code Analysis Tools (to produce the ADG format, etc.); see the reference here (for example, Understand and NDepend)
- Visualization Tools (Graph Visualization); see the list here (for example, Tom Sawyer Visualization)
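As a rough illustration of what the static analysis step produces, the sketch below builds a small dependency edge list from Python source text using the standard `ast` module. It is only a toy stand-in: it does not reflect how Understand or NDepend work, it handles only Python `import` statements, and the function name `import_edges` and the in-memory `sources` dictionary are assumptions made for this example.

```python
import ast


def import_edges(sources):
    """Return sorted (importer, imported) pairs among the given modules.

    sources: dict mapping a module name to its source text.
    Only absolute imports of modules present in `sources` are kept;
    relative imports such as `from . import x` are skipped.
    """
    edges = set()
    for name, text in sources.items():
        for node in ast.walk(ast.parse(text)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                targets = [node.module]
            else:
                continue
            for target in targets:
                if target in sources and target != name:
                    edges.add((name, target))
    return sorted(edges)


if __name__ == "__main__":
    # Hypothetical three-module project held in memory.
    sources = {
        "ui": "import core\nfrom util import helper\nimport os\n",
        "core": "import util\n",
        "util": "",
    }
    for importer, imported in import_edges(sources):
        print(importer, "->", imported)
```

An edge list like this is the kind of input a modularization algorithm clusters, and a graph visualization tool can draw it before and after clustering.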
As an example with a small project, if your project structure (generated from the source code with Static Analysis Tools) looks like this:

the result (after applying the Modularization Process) can look like this:
