
A new preprocessor directive is available in the upcoming C23 Standard: #embed

Here is a simple example:

// Placing a small image resource.

#include <stddef.h>

void show_icon(const unsigned char *, size_t);

int main (int, char*[]) {
    static const unsigned char icon_data[] = {
#embed "black_sheep.ico"
    };
    show_icon(icon_data, sizeof(icon_data));
    return 0;
}

Here is a more elaborate one, initializing non-array objects from binary data (whatever that means):

int main() {
    /* Braces may be kept or elided as per normal initialization rules */
    int i = {
#embed "i.dat"
    }; /* valid if i.dat produces 1 value, i value is [0, 2^(embed element width)) */
    int i2 =
#embed "i.dat"
    ; /* valid if i.dat produces 1 value, i2 value is [0, 2^(embed element width)) */
    struct s {
        double a, b, c;
        struct { double e, f, g; };
        double h, i, j;
    };
    struct s x = {
        /* initializes each element in order according to 
           initialization rules with comma-separated list
           of integer constant expressions inside of braces
         */
#embed "s.dat"
    };
    return 0;
}
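What "initializing non-arrays" amounts to: #embed expands to a comma-separated list of integer constant expressions, one per byte of the resource, and that list then initializes the object under the ordinary initialization rules, including brace elision. The sketch below writes out by hand what the preprocessor would produce, assuming hypothetical data files: i.dat holding the single byte 0x2A and s.dat holding the nine bytes 1 through 9. The objects are placed at file scope only so their values can be checked.

```c
/* Hand-expanded equivalent of the example above, ASSUMING a hypothetical
   i.dat containing the single byte 0x2A and a hypothetical s.dat
   containing the nine bytes 1, 2, ..., 9. */

struct s {
    double a, b, c;
    struct { double e, f, g; };   /* anonymous struct member (C11) */
    double h, i, j;
};

/* int i = { #embed "i.dat" };  expands to: */
int i = { 42 };

/* int i2 = #embed "i.dat" ;  expands to: */
int i2 = 42;

/* struct s x = { #embed "s.dat" };  expands to a plain list; each byte
   value is converted to double and assigned in declaration order,
   with brace elision descending into the anonymous struct. */
struct s x = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
```

With a different s.dat, any members left without a byte would be zero-initialized, exactly as with a hand-written initializer list that is too short.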

What is the purpose of adding this to the C language?

chqrlie

1 Answer


#embed allows easy inclusion of binary data in a program's executable image, as arrays of unsigned char or other types, without the need for an external script run from a Makefile. Most compilers are very inefficient at parsing such generated arrays, with a notable exception: tcc.
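For comparison, here is a minimal sketch of the kind of generator a Makefile would run before C23: it reads a binary stream and writes it out as a C array definition plus a length constant. The function name emit_c_array and its output layout are hypothetical, loosely modeled on what xxd -i produces; this is not the interface of any existing tool.

```c
#include <stdio.h>

/* Hypothetical generator: writes the bytes read from `in` as a C array
   definition named `name` (12 values per line), followed by a length
   constant named `<name>_len`. Returns the number of bytes emitted,
   or -1 if reading from `in` failed. */
long emit_c_array(FILE *in, FILE *out, const char *name)
{
    long count = 0;
    int c;

    fprintf(out, "const unsigned char %s[] = {", name);
    while ((c = getc(in)) != EOF) {
        if (count % 12 == 0)
            fputs("\n   ", out);           /* start a new row of values */
        fprintf(out, " 0x%02x,", (unsigned)c);
        count++;
    }
    fprintf(out, "\n};\nconst unsigned long %s_len = %ld;\n", name, count);
    return ferror(in) ? -1 : count;
}
```

A Makefile rule would typically pipe the resource through a small main calling emit_c_array(stdin, stdout, "icon_data") and save the result as a header. Note that an empty input yields an array with no elements, which standard C does not allow, so a real tool would have to special-case it; this is the same corner case the if_empty parameter of #embed addresses.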

Embedding binary or even textual data offers benefits over reading from files:

  • there might not be a file system
  • the path to the files might be non obvious
  • the files could be missing or inaccessible

The main reason for adding this to the C language seems to be the new urge to dump upon C every trendy C++ feature in a vain attempt to converge C toward a common subset of both languages. The C++ committee was strongly in favor of vendor extensions in the parameter specification, whereas the C committee was less thrilled by the parameter specification itself.

Read the details in: https://thephd.dev/_vendor/future_cxx/papers/C%20-%20embed.html

It took 30 years for strdup() to make it into the Standard library, and all of a sudden C23 gladly extends the language by 50% in all directions with no remorse.

The rationale for making this a preprocessor kludge is highly questionable and the last reason speaks for itself:

Finally, Microsoft has an ABI problem with its maximum string literal size that cannot be solved using string literals or anything treated like string literals

The specification for #embed is full of quirks and shortcomings. Reluctance to write proper scripts leads to abominations such as:

const unsigned char null_terminated_file_data[] = {
    #embed "might_be_empty.txt" \
        prefix(0xEF, 0xBB, 0xBF, ) /* UTF-8 BOM */ \
        suffix(,)
    0 // always null-terminated
};
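To see what these parameters actually do: the prefix and suffix tokens are emitted only when the resource is non-empty, while the trailing 0 written after the directive is always present. Written out by hand for two hypothetical contents of might_be_empty.txt (empty, and the two bytes "hi"), the initializer becomes the following; the array names are made up for illustration.

```c
/* Hand-expanded equivalents of null_terminated_file_data for two
   hypothetical inputs (array names are illustrative only). */

/* might_be_empty.txt is empty: prefix() and suffix() are dropped,
   leaving only the terminating 0. */
const unsigned char when_empty[] = {
    0
};

/* might_be_empty.txt contains "hi": prefix() supplies the BOM and a
   trailing comma, #embed supplies the byte values of 'h' and 'i', and
   suffix() supplies the comma before the terminating 0. */
const unsigned char when_hi[] = {
    0xEF, 0xBB, 0xBF, 104, 105, 0
};
```

So the BOM only shows up when the file has data, and the empty case silently yields a one-byte array.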

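Or worse (limit(0) makes the resource empty, so nothing is read from /dev/urandom and if_empty(0) supplies the value: main simply returns 0):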

int main () {
#define SOME_CONSTANT 0
    return
#embed </dev/urandom> if_empty(0) limit(SOME_CONSTANT)
    ;
}

A simple data description and manipulation language to assemble binary files into linkable objects and resources would have been less intrusive, easier to integrate into existing build systems for all languages and, more importantly, usable with all existing compilers.

The paper enumerates interesting examples where #embed may come in handy, but a more general solution seems possible.

chqrlie

  • TIL strdup is now standard C as opposed to just POSIX. One of my faves – pm100 Nov 30 '22 at 00:48
  • chqrlie, Interesting insights. – chux - Reinstate Monica Nov 30 '22 at 00:59
  • I came across `#embed` via an article on Medium: [Three hottest C23 features approved on July 2022](https://tomaszs2.medium.com/three-hottest-c23-features-approved-on-july-2022-91922a9f2359), which links to [finally. `#embed`](https://thephd.dev/finally-embed-in-c23#), which in turn links to a committee paper on the topic [N3017](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#appendix). There's also the September 2022 working draft of C23 as [N3054](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3054.pdf). – Jonathan Leffler Nov 30 '22 at 03:34
  • Oooh, I just looked at N3054 §7.26.6.2 **The `memset_explicit` function**: `void *memset_explicit(void *s, int c, size_t n);`: _The memset_explicit function copies the value of `c` (converted to an `unsigned char`) into each of the first `n` characters of the object pointed to by `s`. The purpose of this function is to make sensitive information stored in the object inaccessible(380)._ and footnote 380 says: _The intention is that the memory store is always performed (i.e., never elided), regardless of optimizations. This is in contrast to calls to the `memset` function (7.26.6.1)._ Hooray! – Jonathan Leffler Nov 30 '22 at 03:41
  • It was already questionable style to `#include` a source file in the middle of an initializer list. Anyway, this would be a better answer without the rant. I think most people agree that the C committee should focus on solving all the well-known problems of the language instead of introducing new, useless features. (I applaud the removal of everything that's not 2's complement for example, this will fix numerous problems with the language and is probably the best thing that's happened to C since the removal of implicit int.) – Lundin Nov 30 '22 at 08:03
  • Yeah, it's so useless that people are literally sending snail-mail letters to ISO thanking for the feature: https://thephd.dev/finally-embed-in-c23#actual-real-touchable-non-electronic-mail – MiKom Jun 16 '23 at 12:24
  • While I appreciate the extent of your C experience, I've downvoted this answer because most of it is your personal opinion and in no way reflects a consensus among C programmers. I know that many don't like the amount of growth, but I've talked to many others who are happy with the changes like I am. I'm very glad that the committee is standardizing things which had to be done in inferior or implementation-specific ways. In the case of `#embed`, this involves either writing toolchain-specific linker commands, or converting the binaries to headers and potentially tanking compilation times. – Pkkm Jul 08 '23 at 23:27
  • "The C++ committee was strongly in favor on this extension whereas the C committee was less thrilled." I'm going to have to call 'citation needed' on this. If the C++ committee were really so thrilled at the idea and the C committee really weren't, how the hell did C23 end up with `#embed` while in C++23 this feature is nowhere to be seen? Are you sure you haven't got some wires crossed somewhere along the line? – Pharap Sep 01 '23 at 22:36
  • @Pharap: The answer indeed needs rephrasing: the C++ Committee was strongly in favor of vendor extensions in the parameter specification whereas the C Committee was less thrilled by the parameter specification itself. – chqrlie Sep 02 '23 at 09:56