78

While reading an article about filtering, I came across a strange use of a .h file: using it to fill an array of coefficients:

#define N 100 // filter order
float h[N] = { #include "f1.h" }; //insert coefficients of filter
float x[N];
float y[N];

short my_FIR(short sample_data)
{
  float result = 0;

  for ( int i = N - 2 ; i >= 0 ; i-- )
  {
    x[i + 1] = x[i];
    y[i + 1] = y[i];
  }

  x[0] = (float)sample_data;

  for (int k = 0; k < N; k++)
  {
    result = result + x[k]*h[k];
  }
  y[0] = result;

  return ((short)result);
}

So, is it normal practice to use float h[N] = { #include "f1.h" }; this way?

mattias
artsin
  • 25
    It's uncommon. Which is to say it's used, rarely: `f1.h` might be generated by an external tool and used as an input to your (compiled) program. It's more common that the external tools generate a full blown header file though, e.g. `float h[N] = { ... }` would be inside `f1.h`. [Example here](http://stackoverflow.com/a/25709500/242520). – ta.speot.is Nov 10 '14 at 12:15
  • 8
    It should not compile, see the syntax in C11 6.10.2. `# include "q-char-sequence" new-line`. Where's the new line? – Lundin Nov 10 '14 at 12:21
  • 2
    So... how do all these people saying this is valid code manage to get it to compile? I can't manage using gcc, I get `error: "stray #"` no matter which standard or options I pass. – Lundin Nov 10 '14 at 12:44
  • 2
    @Lundin: Some compilers allow preprocessor bits anywhere, and in the rest you can simply insert newlines. Doesn't break this particular example. – Mooing Duck Nov 10 '14 at 22:34
  • There's a semi-famous example of this in the Quake source code: http://stackoverflow.com/questions/17770571/what-could-be-the-purpose-of-a-header-file-with-raw-data – GoBusto Nov 11 '14 at 12:36
  • @user657267 I think I might actually try it since I see no harm in doing it this way, :P. – John Odom Nov 11 '14 at 14:40
  • Filter coefficients are often generated by some external tool, so this type of structure actually isn't that uncommon in DSP applications. However, as @ta.speot.is indicated, I would make the external tool generate the entire declaration (with the `float h[N] ...` included). Then, if you decide to change the number of filter coefficients in the future, the definition for `N` automatically stays in sync with the filter coefficient array, making your code less brittle. Currently, if you change the number of coefficients in `f1.h`, you need to make sure to update `N` in your source file also. – Jason R Nov 11 '14 at 16:04
  • 2
    @user657267: It __is__ common practice for __seldom__ cases. I had about three or four occasions in my career when this saved my day. As Jason R wrote, coefficients are thankfully included this way (sans the syntax error in the example). Or generally, imported data you want to have inside your application. What counts regarding code is readability, and having exotic parsers or grammars just to produce a neat C or C++ file each time your data changes would be far less readable. – Sebastian Mach Nov 12 '14 at 11:48
  • 1
    You can do anything as long as it works. – bodacydo Nov 13 '14 at 16:46

13 Answers

134

Preprocessor directives like #include just perform textual substitution (see the documentation of GNU cpp inside GCC). They can occur at any place (outside of comments and string literals).

However, a #include should have its # as the first non-blank character of its line. So you would write:

float h[N] = {
  #include "f1.h"
};

The original question did not have the #include on its own line, so the code was invalid.

It is not normal practice, but it is permitted practice. In that case, I would suggest using some extension other than .h, e.g. #include "f1.def" or #include "f1.data" ...

Ask your compiler to show you the preprocessed form. With GCC, compile with gcc -C -E -Wall yoursource.c > yoursource.i and inspect the generated yoursource.i with an editor or a pager.

I actually prefer to have such data in its own source file. So I would instead suggest generating a self-contained h-data.c file, e.g. with some tool like GNU awk (so file h-data.c would start with const float h[345] = { and end with };). Also, if it is constant data, better declare it const float h[] so it can sit in a read-only segment like .rodata on Linux. And if the embedded data is big, the compiler might take time to (uselessly) optimize it (then you could compile your h-data.c quickly without optimizations).
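A minimal sketch of what such a generated h-data.c might look like (the coefficient values and the h_len name are placeholders invented for illustration; a real filter would have many more entries):

```c
#include <stddef.h>

/* Hypothetical output of a generator script: it emits the whole
   definition, not just the contents of the braces. */
const float h[] = {
    0.25f, 0.50f, 0.25f  /* ...generated coefficients... */
};

/* The element count is derived from the array itself, so no
   separate N macro can fall out of sync with the data. */
const size_t h_len = sizeof h / sizeof h[0];
```

The matching header would then just declare `extern const float h[];` and `extern const size_t h_len;` for the rest of the program.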

ajay
Basile Starynkevitch
11

As already explained in previous answers, it is not normal practice but it's a valid one.

Here is an alternative solution:

File f1.h:

#ifndef F1_H
#define F1_H

#define F1_ARRAY                   \
{                                  \
     0, 1, 2, 3, 4, 5, 6, 7, 8, 9, \
    10,11,12,13,14,15,16,17,18,19, \
    20,21,22,23,24,25,26,27,28,29, \
    30,31,32,33,34,35,36,37,38,39, \
    40,41,42,43,44,45,46,47,48,49, \
    50,51,52,53,54,55,56,57,58,59, \
    60,61,62,63,64,65,66,67,68,69, \
    70,71,72,73,74,75,76,77,78,79, \
    80,81,82,83,84,85,86,87,88,89, \
    90,91,92,93,94,95,96,97,98,99  \
}

// Values above used as an example

#endif

File f1.c:

#include "f1.h"

float h[] = F1_ARRAY;

#define N (sizeof(h)/sizeof(*h))

...
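Condensed into one self-contained snippet (the macro is shortened to ten entries for illustration), the point is that N is computed from the initializer, so it cannot drift out of sync with the data:

```c
/* Shortened stand-in for the F1_ARRAY macro from f1.h */
#define F1_ARRAY { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }

float h[] = F1_ARRAY;

/* N is derived from the array, not hard-coded */
#define N (sizeof(h) / sizeof(*h))
```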
barak manos
  • 7
    And this approach "burns" the symbols `F1_ARRAY` and `F1_H`. When you have a generated file it might be useful to avoid the use of preprocessor symbols like these. For human created header file your solution is better, but not for a bunch of generated data files. – harper Nov 11 '14 at 08:13
  • 4
    I fail to see why this is better than the existing solution. Now it is harder to autogenerate that datafile, and it pollutes symbol space. As harper already mentions, for human-generated data this may be better, but it's still worse than letting the humans generate it inside the real source file. If it's too complex, human generation is probably wrong anyway. – Sebastian Mach Nov 12 '14 at 14:10
  • @phresnel: Why is it harder to autogenerate that datafile? It merely includes 6 more lines of constant text (4 preprocessor directives and 2 curly brackets). – barak manos Nov 12 '14 at 14:19
  • @harper: Thank you for your comment. For a bunch of generated data files, since each file has a unique name, you can easily create a couple of unique preprocessor symbols per file. Generating those constant text lines by script shouldn't be too difficult either. – barak manos Nov 12 '14 at 14:23
  • @barakmanos: Exactly; now you need to print those extra lines somehow. You have to fit your existing Python or Haskell program to not only output comma-separated values, but C-code. If your coefficients are from a proprietary vendor, e.g. from a laboratory or from some light reflectance computation, you will now have to invoke additional `cat` et al., and have to make sure your `cat` and scripts are actually run. – Sebastian Mach Nov 12 '14 at 15:09
10

So, is it normal practice to use float h[N] = { #include "f1.h" }; this way?

It is not normal, but it is valid (will be accepted by the compiler).

Advantages of using this: it spares you the small amount of effort that would be required to think of a better solution.

Disadvantages:

  • it increases WTF/SLOC ratio of your code.
  • it introduces unusual syntax, both in client code and in the included code.
  • in order to understand what f1.h does, you would have to look at how it is used (this means either you need to add extra docs to your project to explain this beast, or people will have to read the code to see what it means - neither solution is acceptable)

This is one of those cases where an extra 20 minutes spent thinking before writing the code, can spare you a few tens of hours of cursing code and developers during the lifetime of the project.

utnapistim
  • 4
    Missing newlines, the `#include` should be on a separate line. – Basile Starynkevitch Nov 10 '14 at 15:33
  • As Starynkevitch notes this C code will compile with the include directive surrounded by newlines. See ISO/IEC 9899 (C11) §6.10 2 "A preprocessing directive consists of a sequence of preprocessing tokens that begins with a # preprocessing token" ... "or that follows white space containing at least one new-line character, and is ended by the next new-line character." "...". What f1.h provides is apparent from context. It would be a comma separated list of expressions compatible with float, N sized. I'd anticipate those coefficients came from a separate standard or document anyway. –  Nov 10 '14 at 19:32
  • 1
    What is "normal"? I consider it good practice when it gives better readability for imported data than having exotic and complex triggers which in turn create neat source files. The syntax is unusual, but for the experienced C or C++ programmer who knows what the preprocessor does, it should be a surprise, but not a big WTF. – Sebastian Mach Nov 12 '14 at 11:51
  • @BasileStarynkevitch, I know it should be on a separate line. I was more interested in answering the question asked by OP than in correcting the code. – utnapistim Nov 12 '14 at 13:12
  • @phresnel, in my professional experience (more than 8 years of C++) I've only seen this once in production code, and that was when Stephan Lavavej was explaining how they generated std::function specializations with variable number of parameters, by including the parameters into boilerplate declarations (I think it was before C++11). This is why I consider it atypical ("not normal"). Neat source files and good readability for imported data are not mutually exclusive (it is a false choice in this regard). – utnapistim Nov 12 '14 at 13:19
  • @utnapistim: Your arguments _"more than 8 years C++"_ and _"production code"_ are no-ops, unfortunately. There is not only the kind of production code _you_ have seen, and "8 years C++" is not that much at all. Also, you are on the borderline of _argumenting by authority_, instead of using your wetware. This seems to be driving your answer invalid. And indeed, looking at it again, I don't see any argumentation. You just rant, and provide not even a better alternative. – Sebastian Mach Nov 12 '14 at 13:54
  • 1/2) `Neat source files and good readability for imported data are not mutually exclussive` --> Right, and this is nothing I doubted. However, if you have a coefficient-producing generator written in Python or Haskell, or a closed-source whatever, which outputs a comma-seperated lists of those, or if you bought some comma-separated data from some lab, why would you prefer copying it verbatim into your source-file, making future changes difficult, hosing the VCS-history of that file, making that file itself [...] – Sebastian Mach Nov 12 '14 at 13:55
  • 2/2) [...] largely unreadable, and making it almost impossible to automate that task, when just having an `#include` that includes the data solves all those problems? And in no case I see why having an #include on comma separated data yields 10s of hours of cursing; the opposite would be the case. – Sebastian Mach Nov 12 '14 at 13:56
  • Btw, would you mind posting a link to the article? From what you say (boilerplate et al), I can only fear that it has nothing to do with "for filling array of coefficients", as was asked. – Sebastian Mach Nov 12 '14 at 14:00
  • @phresnel, I am saying that these kinds of includes are "not normal" in most code bases I have seen (and yes, YMMV). This is not argumenting from authority, because it is not argumenting - it is explaining why I do not consider such code "normal" (which actually was the OP's question). The OP was not asking for alternative solutions, but "is it normal practice to use [...]" – utnapistim Nov 12 '14 at 14:09
  • ... yet you enrich your post with several speculations and claims, without any justification. No, that's not just saying "not normal", that's ranting and blind belief. – Sebastian Mach Nov 12 '14 at 14:12
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/64802/discussion-between-utnapistim-and-phresnel). – utnapistim Nov 12 '14 at 17:15
  • Could you explain the acronym "WTF/SLOC"? – 0x5C91 Nov 13 '14 at 08:23
  • @Numbers, the best way we know of insuring code quality is through peer review. The "WTF/SLOC" loosely translates to "How many times would your coleagues say "What the F***!" (WTF) when looking at n source lines of code (SLOC). The term is also a reference to [this](http://i1.wp.com/simpleprogrammer.com/wp-content/uploads/2010/06/wtfs_per_minute_thumb.jpg?resize=484%2C438) – utnapistim Nov 13 '14 at 08:30
  • I think it's a fun term, but I couldn't really find it anywhere on Google besides your posts on SO... wouldn't it be better to write something more technical/serious? Nothing personal, and no interest to discuss in Meta, but IMHO "it reduces the clarity of your code" would cut it (or reduce the WTF/StackOverflowAnswer ratio if you prefer). – 0x5C91 Nov 14 '14 at 08:40
9

No, it is not normal practice.

There is little to no advantage in using such a format directly, instead the data could be generated in a separate source file, or at least a complete definition could be formed in this case.


There is however a "pattern" which involves including a file in such unusual places: X-Macros.

The purpose of an X-macro is to define a collection once and use it in various places, the single definition ensuring the coherence of the whole. As a trivial example, consider:

// def.inc
MYPROJECT_DEF_MACRO(Error,   Red,    0xff0000)
MYPROJECT_DEF_MACRO(Warning, Orange, 0xffa500)
MYPROJECT_DEF_MACRO(Correct, Green,  0x7fff00)

which can now be used in multiple ways:

// MessageCategory.hpp
#ifndef MYPROJECT_MESSAGE_CATEGORY_HPP_INCLUDED
#define MYPROJECT_MESSAGE_CATEGORY_HPP_INCLUDED

namespace myproject {

    enum class MessageCategory {
#   define MYPROJECT_DEF_MACRO(Name_, dummy0_, dummy1_) Name_,
#   include "def.inc"
#   undef MYPROJECT_DEF_MACRO
    NumberOfMessageCategories
    }; // enum class MessageCategory

    enum class MessageColor {
#   define MYPROJECT_DEF_MACRO(dumm0_, Color_, dummy1_) Color_,
#   include "def.inc"
#   undef MYPROJECT_DEF_MACRO
    NumberOfMessageColors
    }; // enum class MessageColor

    MessageColor getAssociatedColorName(MessageCategory category);

    RGBColor getAssociatedColorCode(MessageCategory category);

} // namespace myproject

#endif // MYPROJECT_MESSAGE_CATEGORY_HPP_INCLUDED
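To make the mechanism concrete, here is a self-contained C sketch of the same idea; the list lives in an in-file macro instead of a separate def.inc so the example compiles on its own, and all names are invented for illustration:

```c
#include <string.h>

/* The list is defined once... */
#define COLOR_LIST(X) \
    X(Error,   Red)    \
    X(Warning, Orange) \
    X(Correct, Green)

/* ...and expanded twice with different definitions of X. */
enum message_category {
#define X(name, color) CAT_##name,
    COLOR_LIST(X)
#undef X
    CATEGORY_COUNT
};

/* Second expansion: map each category to its color name. */
static const char *category_color(enum message_category c) {
    switch (c) {
#define X(name, color) case CAT_##name: return #color;
    COLOR_LIST(X)
#undef X
    default: return "unknown";
    }
}
```

Adding an entry to COLOR_LIST updates the enum and the lookup function in one step, which is exactly the coherence property the answer describes.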
Matthieu M.
7

A long time ago people overused the preprocessor. See for instance the XPM file format which was designed so that people could:

#include "myimage.xpm"

in their C code.

It's not considered good any more.

The OP's code looks like C, so I will talk about C.

Why is it overuse of the preprocessor ?

The preprocessor #include directive is intended to include source code. In this case and in the OP's case it is not real source code but data.

Why is it considered bad ?

Because it is very inflexible. You cannot change the image without recompiling the whole application. You cannot even include two images with the same name because it will produce non-compilable code. In the OP's case, he cannot change the data without recompiling the application.

Another issue is that it creates tight coupling between the data and the source code; for instance, the data file must contain at least the number of values specified by the N macro defined in the source code file.

The tight coupling also imposes a format on your data; for instance, if you want to store the values of a 10x10 matrix, you can choose either a single-dimension array or a two-dimensional array in your source code. Switching from one format to the other will impose a change in your data file.

This problem of loading data is easily solved by using the standard I/O functions. If you really need to include some default images you can give a default path to the images in your source code. This will at least allow the user to change this value (through a #define or -D option at compile time), or to update the image file without need to recompile.
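The standard-I/O approach described above can be sketched as follows (the helper name and the whitespace-separated text format are assumptions for illustration):

```c
#include <stdio.h>
#include <stddef.h>

/* Read up to max whitespace-separated floats from a text file.
   Returns the number actually read, or 0 on error. */
static size_t load_coeffs(const char *path, float *out, size_t max)
{
    FILE *fp = fopen(path, "r");
    size_t n = 0;

    if (fp == NULL)
        return 0;
    while (n < max && fscanf(fp, "%f", &out[n]) == 1)
        n++;
    fclose(fp);
    return n;
}
```

With this, the filter can pick up new coefficients without recompiling; only the data file changes.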

In the OP's case, the code would be more reusable if the FIR coefficients and the x and y vectors were passed as arguments. You could create a struct to hold these values together. The code would not be less efficient, and it would become reusable even with other coefficients. The coefficients could be loaded at startup from a default file unless the user passes a command-line parameter overriding the file path. This would remove the need for any global variables and make the intentions of the programmer explicit. You could even use the same FIR function in two threads, provided each thread has its own struct.
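A sketch of that refactoring, with invented names and a tiny filter order to keep the example short; the point is that all state travels in the struct, so nothing is global and the function is re-entrant. (The OP's y array is never read back in a FIR filter, so it is omitted here.)

```c
#define ORDER 4  /* illustrative; the OP used 100 */

/* All filter state in one place: coefficients plus delay line. */
struct fir {
    float h[ORDER];  /* coefficients, loaded from wherever you like */
    float x[ORDER];  /* delay line, zero-initialized */
};

static float fir_step(struct fir *f, short sample)
{
    float result = 0.0f;

    /* shift the delay line */
    for (int i = ORDER - 1; i > 0; i--)
        f->x[i] = f->x[i - 1];
    f->x[0] = (float)sample;

    /* convolve with the coefficients */
    for (int k = 0; k < ORDER; k++)
        result += f->x[k] * f->h[k];
    return result;
}
```

Two threads can now each own a struct fir and call fir_step concurrently without interfering.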

When is it acceptable ?

When you cannot do dynamic loading of data. In this case you have to load your data statically and you are forced to use such techniques.

We should note that not having access to files means you're programming for a very limited platform, and as such you have to make trade-offs. This will be the case if your code runs on a micro-controller, for instance.

But even in that case I would prefer to create a real C source file instead of including floating point values from a semi-formatted file.

For instance, provide a real C function returning the coefficients, rather than a semi-formatted data file. This C function could then be defined in two different files, one using I/O for development purposes, and another returning static data for the release build. You would compile the correct source file conditionally.

fjardon
  • @YvesDaoust For one thing, it's a huge security hole. Since XPMs are image files, you often would get them from other sources rather than make them yourself. A malicious user could easily modify the XPM file to end the string, and thus easily inject code. – trlkly Nov 12 '14 at 11:03
  • @trlkly: And how does the malicious user get access to the source code and assets version control system(s), as well as access to the production build machine? For one thing, you don't know C or C++ well. – Sebastian Mach Nov 12 '14 at 14:03
  • "The preprocessor #include directive is intended to include source code": the preprocessor does not care, it builds a complete file from chunks, and the compiler doesn't know. (It can make a difference for precompiled headers, but these do not belong to the standard.) I use them when it is cleaner to do so. –  Nov 12 '14 at 14:35
  • @YvesDaoust You can certainly use the C preprocessor to do whatever macro substitution you want on any file type you want. But my response was in the context of this question. The C preprocessor is named **C** preprocessor for a reason: it is intended to preprocess **C** files. And C files are intended to be source code in the end. So I think it is true to say that: a directive intended to include files, intended to used by a preprocessor, itself intended to preprocess source code, is a directive intended to include source code. :) – fjardon Nov 12 '14 at 14:58
  • @phresnel What are you talking about? They have direct access to the source code because the rogue XPM file has just been `#include`d. That's the scenario being discussed in the question. The XPM has access to the build environment because that's where the file is being included. Granted, if you're smart at all, you'll check the XPM to make sure it is valid before `#include`ing it, but it's still a major security flaw. – trlkly Nov 13 '14 at 18:28
  • @trlkly: E.g., in my team, only team members can change XPMs. Users never see them. They only see the compiled binary. I am really not sure if you understood how #include works. #include is a compile time construct; #includes happen in the very first phase of compilation. Once XPM is included in the binary, tweaking it externally will not tweak any execution path within the binary. // I realize you have some Javascript-tags. Note that C++' and C's #include is completely different from require.js and the like, which are runtime constructs. In C & C++, the XPM is part of the source, not the dist – Sebastian Mach Nov 13 '14 at 21:22
5

There are at times situations which require either using external tools to generate .C files based upon other files that contain source code, having external tools generate C files with an inordinate amount of code hard-wired into the generating tools, or having code use the #include directive in various "unusual" ways. Of these approaches, I would suggest that the latter--though icky--may often be the least evil.

I would suggest avoiding the use of the .h suffix for files which do not abide by the normal conventions associated with header files (e.g. by including method definitions, allocating space, requiring an unusual inclusion context (e.g. in the middle of a method), requiring multiple inclusion with different macros defined, etc.). I also generally avoid using .c or .cpp for files which are incorporated into other files via #include unless those files are primarily used standalone. (I might in some cases, e.g., have a file fooDebug.c containing `#define SPECIAL_FOO_DEBUG_VERSION` followed by `#include "foo.c"` on the next line, if I wish to have two object files with different names generated from the same source, and one of them is affirmatively the "normal" version.)

My normal practice is to use .i as the suffix for either human-generated or machine-generated files that are designed to be included, but in usual ways, from other C or C++ source files; if files are machine-generated, I will generally have the generation tool include as the first line a comment identifying the tool used to create it.

BTW, one trick where I've used this was when I wanted to allow a program to be built using just a batch file, without any third-party tools, but wanted to count how many times it was built. In my batch file, I included echo +1 >> vercount.i; then in file vercount.c, if I recall correctly:

const int build_count = 0
#include "vercount.i"
;

The net effect is that I get a value which increments on every build without having to rely upon any third-party tools to produce it.
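To see why this works: vercount.i accumulates one `+1` line per build, so after preprocessing the three source lines collapse into a single declaration. A sketch of what the compiler ends up seeing, assuming two recorded builds:

```c
/* vercount.i would contain one "+1" line per build; after the
   #include is expanded, the declaration reads: */
const int build_count = 0
+1
+1
;
```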

supercat
3

When the preprocessor finds the #include directive, it simply opens the specified file and inserts its content, as though the content of the file had been written at the location of the directive.

Laura Maftei
3

As already said in the comments this is not normal practice. If I see such code, I try to refactor it.

For example f1.h could look like this

#ifndef _f1_h_
#define _f1_h_

#ifdef N
float h[N] = {
    // content ...
};

#endif // N

#endif // _f1_h_

And the .c file:

#define N 100 // filter order
#include "f1.h"

float x[N];
float y[N];
// ...

This seems a bit more normal to me - although the above code could be improved further still (eliminating globals for example).

TobiMcNamobi
3

Adding to what everyone else said - the contents of f1.h must be like this:

20.0f, 40.2f,
100.0f, 12.40f,
-122,
0

Because the text in f1.h is going to initialize the array in question!

Yes, it may have comments, function or macro usages, expressions, etc.

Ajay
3

It is normal practice for me.

The preprocessor allows you to split a source file into as many chunks as you like, which are assembled by #include directives.

It makes a lot of sense when you don't want to clutter the code with lengthy/not-to-be-read sections such as data initializations. As it turns out, my record "array initialization" file is 11000 lines long.

I also use them when some parts of the code are automatically generated by some external tool: it is very convenient to have the tool just generate its chunks and include them in the rest of the hand-written code.

I have a few such inclusions for some functions that have several alternative implementations depending on the processor, some of them using inline assembly. The inclusions make the code more manageable.

By tradition, the #include directive has been used to include header files, i.e. sets of declarations that expose an API. But nothing mandates that.

2

I have read that people want to refactor this and say that it is evil. Still, I have used it in some cases. As others have said, this is a preprocessor directive, so it simply includes the content of the file. Here's a case where I used it: generating random numbers. I build random numbers, and I don't want to regenerate them each time I compile, nor at run time. So another program (usually a script) just fills a file with the generated numbers, which are then included. This avoids copying by hand, and it allows easily changing the numbers, the algorithm that generates them, and other niceties. You cannot easily blame the practice; in this case it is simply the right way.

Juan Chô
2

I used the OP's technique of putting an include file in the data initialization portion of a variable declaration for quite some time. Just like the OP's, my included file was generated.

I isolated the generated .h files into a separate folder so they could be easily identified:

#include "gensrc/myfile.h"

This scheme fell apart when I started to use Eclipse. Eclipse syntax checking was not sophisticated enough to handle this. It would react by reporting syntax errors where there were none.

I reported samples to Eclipse mailing list, but there did not seem to be much interest in "fixing" the syntax checking.

I changed my code generator to take additional arguments, so it could generate the entire variable declaration, not just the data. Now it generates syntactically correct include files.

Even if I were not using Eclipse, I think it is a better solution.

Be Kind To New Users
1

In the Linux kernel, I found an example that is, IMO, beautiful. If you look at the cgroup.h header file

http://lxr.free-electrons.com/source/include/linux/cgroup.h

you can find the directive #include <linux/cgroup_subsys.h> used twice, after different definitions of the macro SUBSYS(_x); this macro is used inside cgroup_subsys.h to declare the names of several Linux cgroups (if you're not familiar with cgroups, they are user-friendly interfaces that Linux offers, which must be initialized at system boot).

In the code snippet

#define SUBSYS(_x) _x ## _cgrp_id,
enum cgroup_subsys_id {
#include <linux/cgroup_subsys.h>
   CGROUP_SUBSYS_COUNT,
};
#undef SUBSYS

each SUBSYS(_x) declared in cgroup_subsys.h becomes an element of the type enum cgroup_subsys_id, while in the code snippet

#define SUBSYS(_x) extern struct cgroup_subsys _x ## _cgrp_subsys;
#include <linux/cgroup_subsys.h>
#undef SUBSYS

each SUBSYS(_x) becomes the declaration of a variable of type struct cgroup_subsys.

In this way, kernel programmers can add cgroups by modifying only cgroup_subsys.h, while the pre-processor will automatically add the related enumeration values/declarations in the initialization files.

user1466329
  • +1 I agree. This is related to [*X macros*](https://en.wikipedia.org/wiki/X_Macro). I view it as a way to program the preprocessor to write part of my code, making the code maintainable. There are people who don't agree. – Mike Dunlavey Oct 31 '16 at 00:22