How to find header dependencies for large-scale projects on Linux

Question

I'm working on a very large-scale project where compilation takes a very long time. What tools (preferably open source) can I use on Linux to find the most heavily included files so that I can optimize their usage? To be clearer: I need a tool that, given the dependencies, shows me which headers are included the most. By the way, we do use distributed compiling.

Just to be clearer, I need a tool which will, given the dependencies, show me which headers are the most included. By the way, we do use distributed compiling — user12371, Sep 17 '08 at 07:59
Perhaps you should edit your question to include this information rather than have it as a comment? — Dominik Grabiec, Sep 17 '08 at 08:18

score 4 · Answer 1 · answered Sep 17 '08 at 08:00

Check out makedepend.

answered Sep 17 '08 at 08:00

INS

This gives me the dependencies for each file. I need something that, given this, will find the most-included files. – user12371 Sep 17 '08 at 08:04

score 4 · Answer 2 · edited May 23 '17 at 10:33

The answers here will give you tools which track #include dependencies. But there's no mention of optimization and such.

Aside: The book "Large-Scale C++ Software Design" by John Lakos should help.

edited May 23 '17 at 10:33

Community

answered Sep 17 '08 at 08:12

Agnel Kurian

Dominik Grabiec · Answer 3 · 2008-09-17T08:16:47.743

3

Following the Unix philosophy of gluing together many small tools, I'd suggest writing a short script that calls gcc with the -M (or -MM) and -MF (OUTFILE) options (see the GCC documentation for these options). -M lists every header a source file depends on, -MM does the same but leaves out system headers, and -MF writes the list to OUTFILE. That generates dependency lists in the format the make tool expects, which are much easier to parse than the source files themselves, and from which you can extract the required information.
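As a rough sketch of the "short script" idea, the function below tallies how many dependency files mention each header. The function name count_headers_from_deps and the list of header extensions are assumptions made for illustration; they are not part of gcc or make. It also assumes the usual make-style layout ("target: prerequisites", with backslash line continuations) and paths without spaces or colons.

```shell
# count_headers_from_deps: hypothetical helper (not part of gcc or make).
# Reads make-style dependency files, e.g. those written by `gcc -MM -MF foo.d foo.c`,
# and prints "count header" lines, most-included header first.
count_headers_from_deps() {
    cat "$@" |
        sed -e 's/^[^:]*://' -e 's/\\$//' |  # drop the "target:" prefix and trailing backslashes
        tr -s ' \t' '\n' |                    # put one path on each line
        grep -E '\.(h|hh|hpp|hxx)$' |         # keep header-like names (assumed extensions)
        sort | uniq -c | sort -k1,1nr -k2     # count, then order by count descending
}

# Example usage (assumes gcc and sources under ./src; add your real -I/-D flags):
#   for f in src/*.c; do gcc -MM -MF "${f%.c}.d" "$f"; done
#   count_headers_from_deps src/*.d | head -20
```

Because each dependency file lists a header at most once, the count is the number of translation units that pull the header in, directly or indirectly. The same header reached through two different relative paths would be counted separately.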

edited Sep 17 '08 at 08:16

answered Sep 17 '08 at 08:09

Dominik Grabiec

OMG! Where have I been for decades of hacking? Right from the authoritative source. A small amount of quality time with these options (as detailed in Daemin's solid reply) and some routine yet pertinent hackery, and there it is. Thanks Daemin. – davernator Nov 29 '19 at 23:46

score 2 · Answer 4 · answered Sep 17 '08 at 08:01

Tools like doxygen (used with the graphviz options) can generate dependency graphs for include files... I don't know if they'd provide enough overview for what you're trying to do, but it could be worth trying.

answered Sep 17 '08 at 08:01

slicedlime

score 2 · Answer 5 · answered Sep 17 '08 at 08:18

From the root of the source tree, run the following:

find . -type f -exec grep -h '^[[:space:]]*#include[[:space:]]*["<][^">]*[">]' {} ';' \
    | sed 's/^[[:space:]]*#include[[:space:]]*["<]//' \
    | sed 's/[">].*$//' \
    | sort \
    | uniq -c \
    | sort -r -k1 -n

Line 1 gets all the include lines. Line 2 strips off everything before the actual filename. Line 3 strips off the end of the line, leaving only the filename. Lines 4 and 5 count each unique line. Line 6 sorts by count in descending order.

answered Sep 17 '08 at 08:18

paxdiablo

You need [^">]* rather than [^">] in the grep. – Douglas Leeder Sep 17 '08 at 08:41
This also doesn't track includes that are generated downstream. Parsing the output of "gcc -E -dI" will be a lot better for a more complex project. – Joe Hildebrand Sep 17 '08 at 09:05
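A minimal sketch of the "gcc -E -dI" suggestion: with -dI, the preprocessor output keeps the #include directives it actually processed, so includes inside inactive #if branches drop out. The helper name count_includes_from_cpp is made up for this example, and it assumes the directives appear in the output as ordinary lines beginning with #include.

```shell
# count_includes_from_cpp: hypothetical helper (not a gcc feature).
# Reads preprocessor output on stdin, e.g. from `gcc -E -dI file.c`, and counts
# the #include directives found in it, most frequent first.
count_includes_from_cpp() {
    grep -E '^[[:space:]]*#[[:space:]]*include[[:space:]]*["<]' |  # directives only, not "# 1 ..." linemarkers
        sed -E 's/^[^"<]*["<]([^">]*)[">].*$/\1/' |                 # keep just the name between the delimiters
        sort | uniq -c | sort -k1,1nr -k2                           # count, then order by count descending
}

# Example usage (assumes gcc and sources under ./src; add your real -I/-D flags):
#   for f in src/*.c; do gcc -E -dI "$f"; done | count_includes_from_cpp | head -20
```

Names are counted as written in the directive, so the same header spelled two ways is still counted twice.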

score 1 · Answer 6 · 2008-09-17T09:38:37.360

If you wish to know which files are included most often, use this bash command:

find . -name '*.cpp' -exec grep -E '^[[:space:]]*#include[[:space:]]+["<][[:alpha:][:digit:]_./]+[">]' {} \; \
| sort | uniq -c | sort -k 1rn,1 \
| head -20

It displays the top 20 files ranked by the number of times they were included.

Explanation: The 1st line finds all *.cpp files and extracts the lines containing an "#include" directive from them. The 2nd line counts how many times each file was included, and the 3rd line keeps the 20 most-included files.

edited Sep 17 '08 at 09:38

answered Sep 17 '08 at 08:09

Haven't checked this out, but your solution won't work if the same file is included using two different paths. I.e. #include and #include <./dev/blah.h> will be considered different include files. – Dominik Grabiec Sep 17 '08 at 08:13
Basically a sound idea though. – Jonathan Sep 19 '08 at 13:21

Joe Hildebrand · Answer 7 · 2008-09-17T09:03:19.630

1

Use ccache. It hashes the inputs to each compilation and caches the result, so recompiling unchanged inputs becomes drastically faster.

If you wanted to detect the multiple includes, so that you could remove them, you could use makedepend as Iulian Șerbănoiu suggests:

makedepend -m *.c  -f - > /dev/null

will give a warning for each multiple include.

edited Sep 17 '08 at 09:03

answered Sep 17 '08 at 08:18

Joe Hildebrand

score 1 · Answer 8 · answered Sep 17 '08 at 09:11

The Bash scripts on this page are not a good solution; they only work on simple projects. Large projects like the one described in the question often make heavy use of preprocessor conditionals (#if, #else, ...), so which headers actually get included depends on the build configuration. Only more sophisticated tools, such as makedepend or scons, give reliable information. gcc -E can help, but on a large project analyzing its output is time-consuming.

score 0 · Answer 9 · answered Sep 17 '08 at 07:53

IIRC, gcc can generate dependency files (with its -M family of options).

answered Sep 17 '08 at 07:53

EricSchaefer

score 0 · Answer 10 · answered Sep 17 '08 at 07:56

You might want to look into distributed compiling; see, for example, distcc.

answered Sep 17 '08 at 07:56

Toni Ruža

score 0 · Answer 11 · answered Sep 17 '08 at 08:06

This is not exactly what you are searching for, and it might not be easy to set up, but maybe you could have a look at LXR: lxr.linux.no is a browsable kernel tree.

If you enter a filename in the search box, it shows you where that file is included. But this is still guesswork, and it does not track chained dependencies.

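Maybe trace which files the build actually opens and then filter for headers (-f follows the compiler processes that make spawns; openat is included because current C libraries typically use it instead of open):

strace -f -e trace=open,openat -o outfile make
grep -E '\.(h|hpp)"' outfile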
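To turn an strace log like outfile into a ranking, something along these lines might work. The function name count_headers_from_strace and the header extensions are assumptions for the example, and it expects the usual strace line shape, such as 123 openat(AT_FDCWD, "path", flags) = 3, with failed calls reported as = -1.

```shell
# count_headers_from_strace: hypothetical helper (not part of strace).
# Takes the path of an strace log and counts successful open/openat calls on
# header-like files, most frequently opened first.
count_headers_from_strace() {
    grep -E '^([0-9]+ +)?open(at)?\(' "$1" |                         # open/openat lines, with or without a PID prefix
        grep -v ' = -1 ' |                                           # skip failed lookups (ENOENT and friends)
        sed -nE 's/^[^"]*"([^"]*\.(h|hh|hpp|hxx))".*$/\1/p' |        # print only paths with an assumed header extension
        sort | uniq -c | sort -k1,1nr -k2                            # count, then order by count descending
}

# Example usage (assumes the log was written with `strace -f -o outfile ...`):
#   count_headers_from_strace outfile | head -20
```

This counts open calls rather than #include directives, so treat the numbers as an approximation of how often each header is read during the build.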
