
I have been writing C for a decent amount of time and am obviously aware that C has no explicit support for private and public fields within structs. However, I believe I have found a relatively clean way of implementing them without any macros or voodoo, and I am looking for more insight into possible issues I may have overlooked.

The folder structure isn't all that important here, but I'll list it anyway because it clarifies the include paths (and it's also what CLion generates for me).

- example-project
  - cmake-build-debug
  - example-lib-name
    - include
      - example-lib-name
        - example-header-file.h
    - src
      - example-lib-name
        - example-source-file.c
    - CMakeLists.txt
  - CMakeLists.txt
  - main.c

Let's say that example-header-file.h contains:

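```c
#ifndef EXAMPLE_HEADER_FILE_H
#define EXAMPLE_HEADER_FILE_H

typedef struct ExampleStruct {
    int data;
} ExampleStruct;

ExampleStruct* new_example_struct(int, double);
double get_val(ExampleStruct*);

#endif
```

which just contains a definition for a struct, a function that returns a pointer to an ExampleStruct, and a getter, get_val, that I'll come back to below.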

Obviously, if I now include this header in another file, such as main.c, I can create an ExampleStruct and get a pointer to it by calling `ExampleStruct* new_struct = new_example_struct(<int>, <double>);`, and I can access the data member with `new_struct->data`.

However, what if I also want private properties in this struct? For example, if I am creating a data structure, I don't want it to be easy to modify its internals. I.e. if I've implemented a vector struct with a length property that describes the current number of elements in the vector, I wouldn't want people to be able to change that value easily.

So, back to our example struct, let's assume we also want a double field in the struct that describes some internal state we want to make 'private'.

In our implementation file (example-source-file.c), let's say we have the following code:

```c
#include <stdlib.h>

typedef struct ExampleStruct {
    int data;
    double val;
} ExampleStruct;

ExampleStruct* new_example_struct(int data, double val) {
    ExampleStruct* example_struct = malloc(sizeof(ExampleStruct));
    if (example_struct == NULL) {
        return NULL; // allocation failed
    }
    example_struct->data = data;
    example_struct->val = val;
    return example_struct;
}

double get_val(ExampleStruct* e) {
    return e->val;
}
```

This file simply implements the constructor function, declared in the header file, that returns a pointer to a new ExampleStruct. However, this file also defines its own version of ExampleStruct with a member that is not present in the header file's definition, `double val`, as well as a getter that returns that value. Now, if I include the same header file in main.c, which contains:

```c
#include <stdio.h>
#include <stdlib.h>
#include "example-lib-name/example-header-file.h"

int main(void) {
    printf("Hello, World!\n");
    ExampleStruct* test = new_example_struct(6, 7.2);
    if (test == NULL) {
        return 1;
    }
    printf("%d\n", test->data); // <-- THIS WORKS
    double x = get_val(test);   // <-- THIS AND THE LINE BELOW ALSO WORK
    printf("%f\n", x);
    // printf("%f\n", test->val); <-- WOULD FAIL TO COMPILE: no member named `val`
    free(test);
    return 0;
}
```

I tested this a couple of times with some different fields and came to the conclusion that modifying this 'private' field, val, or even accessing it without the getter, would be very difficult without pointer-arithmetic dark magic, and that is the whole point.

Some things I see that may be cause for concern:

  • This may make code less readable in the eyes of some, but my IDE has arrow buttons that take me to and from the definition and the implementation, and even without that, a one-line comment would be more than enough documentation to point someone to where the full definition lives.

Questions I'd like answers on:

  1. Are there significant performance penalties I may suffer as a result of writing code this way?
  2. Am I overlooking something that may make this whole ordeal pointless? I.e. is there a simpler way to do this, or is this explicitly discouraged, and if so, what are the objective reasons behind it?

Aside: I am not trying to make C into C++, and generally favor the way C does things, but sometimes I really want some encapsulation of data.

Anish Sinha
  • How do you guarantee that the compiler representation in memory of both the structs will match? – littleadv Jan 10 '22 at 22:22
  • I think this will work as long as you only use pointers to `ExampleStruct`. If you try to make an array of `ExampleStruct`, or use `sizeof(ExampleStruct)`, the result will be different in the client and implementation code. – Barmar Jan 10 '22 at 22:22
  • Also, what happens when you pass a wrong instance of ExampleStruct to get_val? – littleadv Jan 10 '22 at 22:24
  • @littleadv how would that happen? Unless I explicitly created one without calling its constructor which is declared in the header. – Anish Sinha Jan 10 '22 at 22:28
  • And what prevents you from doing that? – littleadv Jan 10 '22 at 22:29
  • @Barmar That's a good point actually. Do you know of any ways I could mitigate this by struct padding or something? – Anish Sinha Jan 10 '22 at 22:29
  • @littleadv Nothing, but that's the same for most things in C. You have a valid point though, and one that requires some thought to mitigate – Anish Sinha Jan 10 '22 at 22:32
  • The more common solution is to make the entire structure opaque, rather than having public and private fields. E.g. the `FILE` structure of stdio. – Barmar Jan 10 '22 at 22:32
  • @Barmar Yeah I know, that's what I'd been doing for a while but it got annoyingly clumsy writing getters for *everything*, which is why I even thought of this in the first place. – Anish Sinha Jan 10 '22 at 22:33
  • I think you are wasting time solving problems that don't exist. APIs exist to provide a convenient way to interact with data. Why bother if someone wants to manipulate data in the same memory space of the process they spawn? If you actually need to protect internals, simply don't offer a library API and instead communicate through actual boundaries, such as sockets. – Cheatah Jan 10 '22 at 22:34
  • I guess the answer to that is 'why use private or protected variables at all' – Anish Sinha Jan 10 '22 at 22:36
  • @Barmar The `FILE` structure of stdio is all public. It can be viewed in `stdio.h` and every field can be directly accessed. Users are discouraged from messing with the contents directly, but nothing prevents users from shooting themselves in the foot. – user3386109 Jan 10 '22 at 22:38
  • @user3386109 C does not specify that `FILE` must be public. I worked with systems that only expose `FILE *`. – chux - Reinstate Monica Jan 10 '22 at 23:06

4 Answers

  1. Am I overlooking something that may make this whole ordeal pointless, i.e. is there a simpler way to do this or is this explicitly discouraged, and if so, what are the objective reasons behind it.

Yes: your approach produces undefined behavior.

C requires that

All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined.

(C17 6.2.7/2)

and that

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

  • a type compatible with the effective type of the object,
  • a qualified version of a type compatible with the effective type of the object,

[...]

  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.

(C17 6.5/7, a.k.a. the "Strict Aliasing Rule")

Your two definitions of struct ExampleStruct define incompatible types because they specify different numbers of members (see C17 6.2.7/1 for more details on structure type compatibility). You will definitely have problems if you pass instances by value between functions relying on different ones of these incompatible definitions. You will have trouble if you construct arrays of them, whether dynamically, automatically, or statically, and attempt to use those across boundaries between translation units (TUs) using one definition and those using another. You may have problems even if you do none of the above, because the compiler may behave unexpectedly, especially when optimizing. DO NOT DO THIS.
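To make the array and by-value point concrete, here is a minimal single-file sketch (not from the answer) that puts the two layouts side by side. Because both definitions cannot share one tag in a single translation unit, they are renamed to the hypothetical `struct PublicView` and `struct PrivateView`, and the helper functions are likewise illustrative. On any mainstream ABI the implementation's layout is larger, so `sizeof`, array indexing and by-value copies computed in main.c disagree with those computed in example-source-file.c.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* size_t */

/* What main.c sees: the header's definition (renamed so both fit in one file). */
struct PublicView {
    int data;
};

/* What example-source-file.c sees: its own, longer definition (also renamed). */
struct PrivateView {
    int data;
    double val;
};

/* The sizes differ, so sizeof, arrays and by-value copies disagree across TUs. */
static_assert(sizeof(struct PublicView) != sizeof(struct PrivateView),
              "the two definitions have different sizes");

/* Byte offset of element i in an array, as each translation unit would compute it. */
size_t public_array_offset(size_t i) { return i * sizeof(struct PublicView); }
size_t private_array_offset(size_t i) { return i * sizeof(struct PrivateView); }
```

Element 0 lines up, but from element 1 onward the two translation units would be reading different bytes, before even considering what the optimizer may assume.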


Other alternatives:

  1. Opaque pointers. This means you do not provide any definition of struct ExampleStruct in those TUs where you want to hide any of its members. That does not prevent declaring and using pointers to such a structure, but it does prevent accessing any members, declaring new instances, or passing or receiving instances by value. Where member access is needed from TUs that do not have the structure definition, it would need to be mediated by accessor functions.

  2. Just don't access the "private" members. Do not document them in the public documentation, and if you like, explicitly mark them (in code comments, for example) as reserved. This approach will be familiar to many C programmers, as it is used a lot for structures declared in POSIX system headers.
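A minimal sketch of alternative 1, applied to the question's ExampleStruct and assuming C99 or later: the header part declares only an incomplete type, and every member access goes through a function. Both halves are shown in one file for brevity; in a real project they would be split as the comments indicate. `free_example_struct`, `get_data` and `set_data` are hypothetical names added for illustration.

```c
/* Includes used by the implementation half below. */
#include <assert.h>
#include <stdlib.h>

/* ---- example-header-file.h (public): an incomplete type only ---- */
typedef struct ExampleStruct ExampleStruct;

ExampleStruct *new_example_struct(int data, double val);
void free_example_struct(ExampleStruct *e);
int get_data(const ExampleStruct *e);
void set_data(ExampleStruct *e, int data);
double get_val(const ExampleStruct *e);   /* read-only: no set_val is exported */

/* ---- example-source-file.c (private): the single, complete definition ---- */
struct ExampleStruct {
    int data;
    double val;
};

ExampleStruct *new_example_struct(int data, double val) {
    ExampleStruct *e = malloc(sizeof *e);
    if (e == NULL) return NULL;           /* allocation failed */
    e->data = data;
    e->val = val;
    return e;
}

void free_example_struct(ExampleStruct *e) { free(e); }

int get_data(const ExampleStruct *e) { assert(e != NULL); return e->data; }
void set_data(ExampleStruct *e, int data) { assert(e != NULL); e->data = data; }
double get_val(const ExampleStruct *e) { assert(e != NULL); return e->val; }
```

Clients can still declare `ExampleStruct *p;`, but `p->data`, `sizeof(ExampleStruct)` and `ExampleStruct s;` all fail to compile outside the implementation file, which is exactly the restriction described above.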

John Bollinger
  • A way to prevent inadvertent access to "private" members is to append the `__LINE__` to the member name in public mode but not private mode - creates headaches for debuggers. A lot of rigmarole though versus simply commenting users to "don't do that". Nice answer. – chux - Reinstate Monica Jan 11 '22 at 01:00

Are there significant performance penalties I may suffer as a result of writing code this way?

Probably:

  • Heap allocation is expensive, and - today - usually not optimized away even when that is theoretically possible.
  • Dereferencing a pointer for member access is expensive; although this might get optimized away with link-time-optimization... if you're lucky.

i.e. is there a simpler way to do this

Well, you could use a slack array of the same size as your private fields, and then you wouldn't need to go through pointers all the time:

```c
#include <stddef.h> // max_align_t (C11)

#define EXAMPLE_STRUCT_PRIVATE_DATA_SIZE sizeof(double)

typedef struct ExampleStruct {
    int data;
    _Alignas(max_align_t) unsigned char private_data[EXAMPLE_STRUCT_PRIVATE_DATA_SIZE];
} ExampleStruct;
```

This is basically a type-erasure of the private data without hiding the fact that it exists. Now, it's true that someone can overwrite the contents of this array, but it's kind of useless to do it intentionally when you "don't know" what the data means. Also, the private data in the "real" definition will need to have the same, maximal, `_Alignas()` as well (if you want the private data not to need `_Alignas()`, you will need to use the real alignment quantum for the type-erased version).

The above is C11. You can sort of do about the same thing by typedef'ing max_align_t yourself, then using an array of max_align_t elements for private data, with an appropriate length to cover the actual size of the private data.
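A sketch of the slack-array idea, assuming C11 (`_Alignas`, `max_align_t`, `static_assert`). One deliberate substitution: instead of giving the implementation a second, "real" definition of the struct, it describes the private fields with a separate hypothetical `struct example_private` and moves bytes in and out of `private_data` with `memcpy`, which sidesteps the incompatible-type and aliasing concerns raised in the first answer. `example_init` and `example_get_val` are illustrative names.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* max_align_t */
#include <string.h>  /* memcpy */

#define EXAMPLE_STRUCT_PRIVATE_DATA_SIZE sizeof(double)

/* Public definition: clients see that storage exists, not what it means. */
typedef struct ExampleStruct {
    int data;
    _Alignas(max_align_t) unsigned char private_data[EXAMPLE_STRUCT_PRIVATE_DATA_SIZE];
} ExampleStruct;

/* Implementation-only description of what lives in private_data (hypothetical). */
struct example_private {
    double val;
};

static_assert(sizeof(struct example_private) <= EXAMPLE_STRUCT_PRIVATE_DATA_SIZE,
              "private_data is too small for the private fields");

void example_init(ExampleStruct *e, int data, double val) {
    struct example_private p = { .val = val };
    e->data = data;
    memcpy(e->private_data, &p, sizeof p);  /* copy bytes in: no type punning */
}

double example_get_val(const ExampleStruct *e) {
    struct example_private p;
    memcpy(&p, e->private_data, sizeof p);  /* copy bytes out */
    return p.val;
}
```

Because the public type is complete, instances can live on the stack or in arrays and be copied by value, so no heap allocation or extra indirection is needed; the `static_assert` catches the case where the private fields outgrow the reserved space.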


An example of the use of such an approach can be found in CUDA's driver API:

The first structure has a pair of reserved void* fields, hiding the fact that it's really the second structure. They could have used an unsigned char array, but it so happens that the private fields are pointer-sized, and void* is also kind of opaque.

einpoklum

As long as the public has a complete definition for ExampleStruct, it can make code like:

 ExampleStruct a = *new_example_struct(42, 1.234);

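Then the call below will certainly fail, because `a` holds a copy of only the public `data` member, so `get_val(&a)` reads a `val` that was never copied: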

 printf("%g\n", get_val(&a));

I recommend instead creating an opaque pointer type and providing public access functions for `.data` and `.val`.

Think of how we use FILE. FILE *f = fopen(...) and then fread(..., f), fseek(f, ...), ftell(f) and eventually fclose(f). I suggest this model instead. (Even if in some implementations FILE* is not opaque.)
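Applied to the vector example from the question, the FILE model might look like the sketch below: a handle type whose `length` clients can read through a function but never write. Everything here (`Vec`, `vec_create`, `vec_push`, `vec_length`, `vec_get`, `vec_destroy`) is a hypothetical illustration shown in one file; in practice the `struct Vec` definition would live only in the .c file.

```c
#include <assert.h>
#include <stdlib.h>

/* Public part (would live in vec.h): an incomplete type, like FILE. */
typedef struct Vec Vec;

/* Private part (would live in vec.c). */
struct Vec {
    size_t length;     /* readable via vec_length(), never writable by clients */
    size_t capacity;
    int *items;
};

Vec *vec_create(void) {                 /* like fopen */
    Vec *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->length = 0;
    v->capacity = 0;
    v->items = NULL;
    return v;
}

/* Appends value; returns 0 on success, -1 if the storage could not grow. */
int vec_push(Vec *v, int value) {
    assert(v != NULL);
    if (v->length == v->capacity) {
        size_t new_cap = v->capacity ? v->capacity * 2 : 4;
        int *grown = realloc(v->items, new_cap * sizeof *grown);
        if (grown == NULL) return -1;   /* old contents stay valid */
        v->items = grown;
        v->capacity = new_cap;
    }
    v->items[v->length++] = value;
    return 0;
}

size_t vec_length(const Vec *v) { assert(v != NULL); return v->length; }

int vec_get(const Vec *v, size_t i) {
    assert(v != NULL && i < v->length);
    return v->items[i];
}

void vec_destroy(Vec *v) {              /* like fclose */
    if (v == NULL) return;
    free(v->items);
    free(v);
}
```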

chux - Reinstate Monica
  • Do you mean create an opaque data type where everything is essentially private and provide getters/setters? Because I agree, that would work, and is what I've been doing so far. In tree structures this does get tedious though, because you can't just say 'e->left' you have to say something like get_left(e); – Anish Sinha Jan 10 '22 at 22:41
  • @AnishSinha True, but with `get_left(e)` I can only read the `.left`, not set and read it like with `e->left`. – chux - Reinstate Monica Jan 10 '22 at 22:44
  • Good point, I didn't think of that. I suppose opaque types and getters/setters for everything is the way to go, thank you – Anish Sinha Jan 10 '22 at 22:52
  • @AnishSinha Points include: 1) not all _setters_ need to be public, 2) some _setters_ could be paired like `set_size_string(this, size, ptr)` to require setting the `size/ptr` pair together, 3) if internal data was volatile, a _getter_ like `get_size_string(this, size_t*size, char **string)` can solve getting a coherent pair. – chux - Reinstate Monica Jan 10 '22 at 22:58
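The paired setter/getter suggestion in the last comment can be sketched as follows, reusing the function names and parameter lists from the comment with a hypothetical opaque `SizedString` type. The setter replaces size and contents together (leaving the old pair untouched if allocation fails), and the getter returns both in one call. Whether the string is owned by the object, as assumed here, is a design choice the comment leaves open.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct SizedString SizedString;   /* opaque in the public header */

struct SizedString {      /* private: size and string must always agree */
    size_t size;
    char *string;
};

SizedString *new_sized_string(void) {
    SizedString *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->size = 0;
    s->string = NULL;
    return s;
}

/* Paired setter: size and contents change together or not at all. */
int set_size_string(SizedString *this, size_t size, const char *ptr) {
    assert(this != NULL && (ptr != NULL || size == 0));
    char *copy = malloc(size + 1);
    if (copy == NULL) return -1;          /* old pair left untouched */
    if (size > 0) memcpy(copy, ptr, size);
    copy[size] = '\0';
    free(this->string);
    this->string = copy;
    this->size = size;
    return 0;
}

/* Paired getter: one call hands back a coherent size/string pair. */
void get_size_string(const SizedString *this, size_t *size, char **string) {
    assert(this != NULL && size != NULL && string != NULL);
    *size = this->size;
    *string = this->string;
}

void free_sized_string(SizedString *s) {
    if (s == NULL) return;
    free(s->string);
    free(s);
}
```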

This causes undefined behaviour, as detailed in the other answers. The usual way around this is to make a nested struct.

In example.h, one defines the public-facing elements. `struct example` is not meant to be instantiated directly; in a sense, it is abstract. Only pointers obtained from one of its constructors (in this case, the only one) are valid.

```c
struct example { int data; };

struct example *new_example(int, double);
double example_val(struct example *e);
```

and in example.c, instead of re-defining struct example, one has a nested struct private_example. (Such that they are related by composite aggregation.)

```c
#include <stddef.h> // offsetof
#include <stdlib.h>
#include "example.h"

struct private_example {
    struct example public;
    double val;
};

struct example *new_example(int data, double val) {
    struct private_example *const example = malloc(sizeof *example);
    if(!example) return 0;
    example->public.data = data;
    example->val = val;
    return &example->public;
}

/** This is a poor version of `container_of`. */
static struct private_example *example_upcast(struct example *example) {
    return (struct private_example *)(void *)
        ((char *)example - offsetof(struct private_example, public));
}

double example_val(struct example *e) {
    return example_upcast(e)->val;
}
```

Then one can use the object as in main.c. This pattern is used frequently in Linux kernel code for container abstraction. Note that `offsetof(struct private_example, public)` is zero, so `example_upcast` does nothing and a cast is sufficient: `((struct private_example *)e)->val`. If one builds structures in a way that always allows casting, one is limited to single inheritance.
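Putting the pieces of this answer together, here is a single-file sketch of how main.c would use the object, assuming C99 or later. Two additions are not in the answer: a hypothetical `example_free`, and a portable `CONTAINER_OF` macro (hypothetical name) modelled on the kernel's `container_of` but without its GCC type-checking extensions.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof */
#include <stdlib.h>

/* ---- example.h ---- */
struct example { int data; };
struct example *new_example(int, double);
double example_val(struct example *e);
void example_free(struct example *e);   /* hypothetical: not in the answer */

/* Recover the enclosing object from a pointer to one of its members. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)(void *)((char *)(ptr) - offsetof(type, member)))

/* ---- example.c ---- */
struct private_example {
    struct example public;
    double val;
};

struct example *new_example(int data, double val) {
    struct private_example *const p = malloc(sizeof *p);
    if (!p) return NULL;
    p->public.data = data;
    p->val = val;
    return &p->public;
}

double example_val(struct example *e) {
    return CONTAINER_OF(e, struct private_example, public)->val;
}

void example_free(struct example *e) {
    if (e) free(CONTAINER_OF(e, struct private_example, public));
}
```

As noted in the comments, this is only valid for `struct example` pointers that really came from `new_example`; a `struct example` declared by a client has no `val` behind it.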

Neil
  • This is pretty much what I was looking for. +1. Two questions though: what is `container_of` and what does `offsetof` do? – Anish Sinha Jan 11 '22 at 00:14
  • You can use `container_of` if you know that the pointer is an offset into a larger `struct`. Defined in the Kernel, it uses GCC extensions to improve type-safety. Note that `offsetof` returns 0 in this case; `public` is the first member of `private_example`, so unneeded here. See [analyze container_of and offsetof](https://bitboom.github.io/analyze-containerof) and [understanding container_of](https://stackoverflow.com/q/15832301/2472827). – Neil Jan 11 '22 at 00:46
  • _Viz_ you could just cast it to `((struct private_example *)e)->val` and it would work fine, but only because `public` is the first member of `private_example`. – Neil Jan 11 '22 at 00:54
  • Note that although this avoids UB arising from incompatible definitions of `struct example`, it creates a risk of code assuming that the `struct example` to which a given pointer points is a member of a `struct private_example`, when in fact it is not. That doesn't mean you can't do this -- and I have done similar myself -- but I don't really recommend it. – John Bollinger Jan 11 '22 at 14:12
  • Note also that as long as `public` is the first member of `struct private_example`, the given `example_upcast` function is way overkill. A simple typecast would do, and would be more idiomatic. – John Bollinger Jan 11 '22 at 14:15
  • @JohnBollinger those are good points; also, one loses the ability to apply `sizeof` accurately. In a sense, this makes `example` abstract with a single `private_example` instantiation. – Neil Jan 11 '22 at 19:57