How to embed data into an array at compile time in C++?

Question

I want to load a database into an array at compile time like:

//a.dat
1 2 3 4 5

int main(){ 
    unsigned int a[5]=f("a.dat");
}

But I can't find a simple solution on Stack Overflow. Can anyone suggest a way to do it? I can think of two ways:

1. Use a program to convert the database into hard-coded source code:

unsigned int a[] = {1, 2, 3, 4, 5};

2. Use a function to read the data (but that happens at runtime), like:

a[5]=f("a.dat");

It's suggested as a future feature: [`std::embed`](http://open-std.org/JTC1/SC22/WG21/docs/papers/2020/p1040r6.html). Until that time you can look for some sorcery generators. — JHBonarius, Apr 30 '22 at 07:43
Oh, awful. This means I can only use the first way to solve it. — Learning Lin, Apr 30 '22 at 07:47
see also https://stackoverflow.com/questions/4158900/embedding-resources-in-executable-using-gcc — Klaus, Apr 30 '22 at 07:48
File access happens at runtime. With a relative path there's no way to change this, even if the compiler vendor were willing to add non-standard language features for it: the real path depends on the working directory, which is not known at compile time. If you want to hardcode the data at compile time, you need to embed the data into the binary somehow, and a simple solution would be to generate a header containing `constexpr const unsigned int a[] = { ... };` — fabian, Apr 30 '22 at 07:55
@fabian why would you need both `constexpr` and `const`, I thought `constexpr` implies `const`? — codeling, Apr 30 '22 at 07:59
Lol, autocorrect. I meant "source generators". I.e. some code (likely in another language) that converts your data to a C or C++ header/ source file — JHBonarius, Apr 30 '22 at 08:13
Also see the sister proposal for the preprocessor, `#embed`: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1967r5.html — Sebastian, Apr 30 '22 at 09:06

codeling · Answer 1 · 2022-04-30T08:08:11.023

A third option, though also not ideal, would be to #include your data file.

This solution requires you to adapt your data to include a comma between the values though:

a1.dat:

1, 2, 3, 4, 5

cpp1:

#include <iostream>

int main()
{
    constexpr int N = 5;
    unsigned int a[N] = {
        #include "a1.dat"
    };

    for (size_t i=0; i<N; ++i)
    {
        std::cout << a[i] << " ";
    }
    std::cout << "\n";
}

But of course, separating N (the number of values) from the values themselves is not good design: N needs to be updated whenever the number of values in a1.dat changes, which could easily be forgotten.
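One way to keep N in sync is to let the compiler count the initializers and derive N from the array. The following is a minimal sketch, assuming the included file holds only the comma-separated values. The literal values stand in for `#include "a1.dat"` so that the snippet compiles on its own, and `sum_of_a` is a hypothetical function added only to show the data being used.

```cpp
#include <cstddef>

// The initializer list stands in for the contents of a1.dat. In the real
// program the line inside the braces would be:  #include "a1.dat"
constexpr unsigned int a[] = {
    1, 2, 3, 4, 5
};

// Derived from the array, so it cannot drift out of sync with the data.
// (With C++17, std::size(a) from <iterator> yields the same value.)
constexpr std::size_t N = sizeof(a) / sizeof(a[0]);

// Optional guard. Assumption: the rest of the program expects 5 values.
static_assert(N == 5, "a1.dat must contain exactly 5 values");

// Hypothetical example use: the data is usable in constant expressions.
constexpr unsigned int sum_of_a() {
    unsigned int total = 0;
    for (std::size_t i = 0; i < N; ++i) {
        total += a[i];
    }
    return total;
}
```

If the static_assert is kept, a change in the number of values in the data file becomes a compile error instead of a silent mismatch.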

Alternatively, you could move the whole array declaration into the included file:

a2.dat:

    unsigned int a[] = {
        1, 2, 3, 4, 5
    };

cpp2:

#include <iostream>

int main()
{
    constexpr int N = 5;
    #include "a2.dat"
    for (size_t i=0; i<N; ++i)
    {
        std::cout << a[i] << " ";
    }
    std::cout << "\n";
}

This does not remove the need to specify the number of elements somewhere, though. Note also that leaving N out of the declaration of a means the compiler can no longer enforce that a actually contains N elements!

To circumvent this, you might want to use a std::vector instead:

a3.dat:

    std::vector<unsigned int> a = {
        1, 2, 3, 4, 5
    };

cpp3:

#include <iostream>
#include <vector>

int main()
{
    #include "a3.dat"
    for (auto anum: a)
    {
        std::cout << anum << " ";
    }
    std::cout << "\n";
}
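Note that a std::vector is filled at run time, even though its values come from the source code. If the values should stay usable at compile time, one possible sketch uses std::array together with a small helper that deduces the element count. `make_array` is a hypothetical helper, not a standard function (C++20 offers std::to_array for a similar purpose), and the literal values again stand in for `#include "a1.dat"`.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: builds a std::array<T, N>, where N is the number of
// arguments, converting each argument to T.
template <typename T, typename... Values>
constexpr std::array<T, sizeof...(Values)> make_array(Values... values) {
    return {{static_cast<T>(values)...}};
}

// In the real program the argument list would be:  #include "a1.dat"
constexpr auto a = make_array<unsigned int>(
    1, 2, 3, 4, 5
);

// The element count is part of the type, so nothing has to be kept in sync.
constexpr std::size_t N = a.size();
```

Iterating with `for (auto anum : a)` works the same way as with the vector.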

The std::embed proposal mentioned above also has some comments on this "solution":

Of course, #include expects the format of the data to be source code, and thusly the program fails with spectacular lexer errors

... which works only if data is already in C-style. OP has data without ',' I believe. — Klaus, Apr 30 '22 at 07:49
True, though that doesn't seem to be a real issue here, as option 1 mentioned in the question requires changing the representation of the data even more. — codeling, Apr 30 '22 at 07:54

score 0 · Answer 2 · answered Apr 30 '22 at 08:39

Use xxd, which has a "C-style" output mode (-i), to convert your input into a form suitable for embedding:

$ xxd -i a.bin
unsigned char a_bin[] = {
  0xd0, 0xe5, 0x46, 0x82, 0x0d, 0xda, 0x72, 0xe8, 0x0f, 0x3f, 0x00, 0x66,
  0xf0, 0xdd, 0x67, 0xd5
};
unsigned int a_bin_len = 16;

The variable names are based on the input filename, so name your source files wisely.
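To use the generated array from a C++ translation unit, the two symbols have to be declared there. The following is a minimal sketch, assuming the generated a.c is compiled as C (hence `extern "C"`) and that the names are the ones shown above. The definitions are repeated only so that the snippet links on its own, and `embedded_at` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// What the C++ side would declare in order to see the generated symbols.
// Assumption: a.c is compiled as C; drop extern "C" if it is built as C++.
extern "C" {
    extern unsigned char a_bin[];
    extern unsigned int a_bin_len;
}

// Stand-in for the generated a.c so that this sketch is self-contained.
// In the real build these definitions come from `xxd -i a.bin`.
extern "C" {
    unsigned char a_bin[] = {
        0xd0, 0xe5, 0x46, 0x82, 0x0d, 0xda, 0x72, 0xe8,
        0x0f, 0x3f, 0x00, 0x66, 0xf0, 0xdd, 0x67, 0xd5
    };
    unsigned int a_bin_len = 16;
}

// Hypothetical helper: bounds-checked access to the embedded bytes.
unsigned char embedded_at(std::size_t index) {
    assert(index < a_bin_len);
    return a_bin[index];
}
```

Because the xxd output is a plain (non-constexpr) array, the bytes are embedded in the binary but are not usable in constant expressions.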

This approach becomes convenient once you have a "build pipeline", which can be as minimal as a Makefile:

a.c: a.bin
    xxd -i $< $@

main: main.o a.o
    $(CC) -o $@ $^

Don't forget to add the generated a.c to your .gitignore or similar.
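xxd embeds raw bytes. For the whitespace-separated text file from the question, a small generator program could take xxd's place in the same pipeline. Below is a minimal sketch of its core. `make_array_source` is a hypothetical function, and the output format (a `constexpr unsigned int` array plus a `_len` constant) is an assumption about what the consuming code wants.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical generator core: reads whitespace-separated unsigned integers
// from `in` and returns C++ source text that defines an array called `name`.
// No error handling: reading stops at the first token that is not a number,
// and empty input would produce an (invalid) empty array definition.
std::string make_array_source(std::istream& in, const std::string& name) {
    assert(!name.empty());
    std::ostringstream out;
    out << "constexpr unsigned int " << name << "[] = {";
    unsigned int value = 0;
    std::size_t count = 0;
    while (in >> value) {
        out << (count == 0 ? "" : ", ") << value;
        ++count;
    }
    out << "};\n";
    out << "constexpr unsigned int " << name << "_len = " << count << ";\n";
    return out.str();
}
```

A `main` that opens a.dat, calls this function, and writes the result to a header could be hooked into the Makefile in the same way as the xxd rule.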
