
I have been given a medium-sized but complex C project (about 200,000 lines in total) which contains around 100 .h files and nearly as many .c files.

Many of the .h files correspond to equivalent .c files, but there is one .h file in particular, let's call it project_common.h, that #includes many of the other .h files as well as containing about 2000 lines of mostly struct and enum definitions etc. Many of the structs are heavily nested so that their order very much matters.

The structure of the file is roughly:

#include guard

#include <assert.h>
#include <stdint.h>
/* etc */



#include "project_aaa.h"
#include "project_bbb.h"
/* Then another 30 or so lines like this. These are in alphabetical
order and a certain amount of effort has been made that they can be
included in any order. */


/* Then about 2000 lines of struct, enums, function definitions etc. */

I have been tasked with moving most or all of the 2000 lines and either creating new .h files for them or pasting them into one of the existing .h files. One rule is that each header must be able to be included independently of all others. In other words, each header must not need other headers to be included before it. After some effort, I've realised that, even for a senior software engineer with 25+ years' experience, this is not an easy task at all because of the very complex and hierarchical nature of the struct definitions.

My problems in particular are:

  1. Including all those headers in project_common.h is bad practice and defeats the point of splitting it all up.
  2. It's really REALLY hard to split all this up in such a way that the resulting headers can be included from a given C file in any combination.

So what I am asking is, are there any tools out there that can help with refactoring all the .h files into a more optimal configuration, and/or is there a recognised method that's better than trial and error?

So far I've tried moving struct definitions around, but progress is very slow and tedious, and although I have nearly halved the size of project_common.h, the new headers I have created only work if they are included in the right order.

Wilseus
  • See [Should I use `#include` in headers?](https://stackoverflow.com/q/1804486/15168) for a discussion of how headers should be organized. Each header can (and should) be made self-contained, idempotent and minimal. All three attributes are important. – Jonathan Leffler Jul 28 '23 at 03:47
  • Yep, I get this, but it's **how** to apply this to an existing codebase that I am struggling with, not what it ultimately needs to look like. – Wilseus Jul 28 '23 at 12:36

1 Answer


> I have been tasked with moving most or all of the 2000 lines and either creating new .h files for them or pasting them into one of the existing .h files.

I don't know about refactoring tools for this purpose, but as a salient matter of code style, I hold that every header and regular source file, X, should itself #include every other header that directly declares any function or variable that X defines or directly references, and every header defining a macro that X directly uses, and only those. That applies to #include-ing system and third-party headers just as much as to your project's internal headers.

You may have already come a long way in that direction in support of the goal of making it possible to #include any header individually. However, it seems clear that you cannot be fully adhering to that principle when you say

> the new headers I have created only work if they are included in the right order.

If each header #includes all the other headers providing declarations that it needs itself, and also provides effective guards against multiple inclusion, then the only way to have #include-order problems is if you have a dependency cycle. If you did not already have a cycle when you started then there is no particular reason why your refactoring should produce one. If your refactoring does produce one then that implies that some or all of the contents of the headers in that cycle should be merged into the same header.
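To make that concrete, here is a minimal sketch of what the preprocessor effectively sees when a .c file includes two guarded, self-including headers in the "wrong" order. The names user.c, project_rect.h, project_point.h, struct point and struct rect are hypothetical and not taken from your project; the header bodies are pasted inline so that the example compiles as a single file.

```c
/* ---- user.c (hypothetical) ---- */
#include <assert.h>   /* user.c calls assert() itself, so it includes this */
#include <stddef.h>   /* ... and it uses NULL */

/* user.c: #include "project_rect.h"   (body pasted inline below) */
#ifndef PROJECT_RECT_H
#define PROJECT_RECT_H

/* project_rect.h: #include "project_point.h", because struct rect
 * embeds struct point (body pasted inline below) */
#ifndef PROJECT_POINT_H
#define PROJECT_POINT_H
#include <stdint.h>   /* project_point.h uses int32_t */
struct point {
    int32_t x;
    int32_t y;
};
#endif /* PROJECT_POINT_H */

struct rect {
    struct point top_left;
    struct point bottom_right;
};

static inline int32_t rect_width(const struct rect *r)
{
    return r->bottom_right.x - r->top_left.x;
}
#endif /* PROJECT_RECT_H */

/* user.c: #include "project_point.h"   (second inclusion: the guard is
 * already defined, so everything up to the #endif is skipped) */
#ifndef PROJECT_POINT_H
#define PROJECT_POINT_H
#include <stdint.h>
struct point {
    int32_t x;
    int32_t y;
};
#endif /* PROJECT_POINT_H */

/* user.c's own code */
int32_t checked_width(const struct rect *r)
{
    assert(r != NULL);
    return rect_width(r);
}
```

Swapping the two includes in user.c gives the same result: project_point.h is then processed first, and the nested inclusion from project_rect.h is the one the guard skips.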

Also, and this may be obvious, in choosing where to move the existing declarations, I would recommend focusing on semantic relationships rather than on simple code dependency relationships. Things that are usually used together are a likely choice for cohabitation in the same header, but not so much things that just happen to be in some of the same dependency chains.

Now, I suppose it's possible that when you say ...

> One rule is that each header must be able to be included independently of all others.

... you mean not just that one can pick and choose headers to include without concern about order and dependencies, but also that no header is permitted to include any of the others. If so, then that's an artificial and difficult-to-sustain provision. It implies, for instance, that wherever you have a structure or union type that embeds (not just points to) an object of one of the project's other internal types, those two types must be declared in the same header. If you happen to be saddled with something like that, then your current task provides a good context for pushing back against it.
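The distinction between embedding and pointing to is also useful when deciding which headers really have to include which. A minimal sketch, with every type and function name made up for illustration: a header that only holds a pointer to a type can get by with a forward declaration, whereas a header that embeds the type needs its full definition in scope.

```c
#include <assert.h>
#include <stddef.h>

/* A forward declaration is all a header needs in order to *point to* a type. */
struct dma_ring;

struct channel {
    struct dma_ring *ring;   /* pointer member: an incomplete type is fine */
    unsigned id;
};

/* Embedding needs the complete type to be visible first. */
struct irq_state {
    unsigned mask;
    unsigned pending;
};

struct device {
    struct irq_state irq;    /* embedded member: needs the definition above */
    struct channel chan;
};

/* The full definition can live in whichever header owns the type. */
struct dma_ring {
    size_t head;
    size_t tail;
};

/* Code that dereferences the pointer needs the full definition in scope. */
size_t channel_pending(const struct channel *c)
{
    assert(c != NULL && c->ring != NULL);
    return c->ring->head - c->ring->tail;
}
```

In this sketch the header defining struct channel would not need to include the one defining struct dma_ring, while the header defining struct device would have to include the ones defining struct irq_state and struct channel.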

Finally, as a practical matter, I would start at the top of the file and work downward from there. This way you will work first with the declarations that have the fewest dependencies. You may even find it useful to think of this and work on it as a series of many small refactorings instead of one huge one.
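One possible way to check each small step, offered as a suggestion and an assumption about your workflow rather than an established procedure: for every header, generate a translation unit that includes only that header, and confirm that it compiles on its own. The helper below, make_stub, is a hypothetical name; it just builds the text of such a one-line file.

```c
#include <assert.h>
#include <stdio.h>

/* Builds, in buf, the text of a translation unit that includes only
 * `header`. Returns 0 on success, or -1 if buf is too small or
 * formatting fails. */
int make_stub(char *buf, size_t size, const char *header)
{
    assert(buf != NULL && header != NULL);

    int n = snprintf(buf, size, "#include \"%s\"\n", header);

    /* snprintf returns the length it wanted to write; a value >= size
     * means the output was truncated. */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Writing each stub to its own .c file and compiling it with your usual include paths (with GCC or Clang, `-fsyntax-only` is enough) should fail exactly for those headers that still rely on something being included before them.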

John Bollinger
  • I agree with everything you have said, but just for clarification, I haven't adhered to the principle I stipulated simply because that is as far as I have got. Secondly, I did not mean that no header could include any others, just that I want to avoid one header including **all** others. I realise there needs to be a hierarchy, but due to the complexity of the code, some structs are members of other structs many deep. FWIW, the code is an open source driver for a very complex piece of hardware. – Wilseus Jul 28 '23 at 12:25