How to make two otherwise identical pointer types incompatible

Question

On certain architectures it may be necessary to have different pointer types for otherwise identical objects. Particularly for a Harvard architecture CPU, you may need something like:

uint8_t const ram* data1;
uint8_t const rom* data2;

This is how pointers to ROM and RAM were declared in MPLAB C18 (now discontinued) for PICs. It could even express things like:

char const rom* ram* ram strdptr;

This declares a pointer in RAM to pointers in RAM that point to strings in ROM (writing ram is not necessary, since this compiler places things in RAM by default; it is added here for clarity).

The good thing about this syntax is that the compiler can alert you when you try to assign in an incompatible manner, such as the address of a ROM location to a pointer to RAM (so something like data1 = data2;, or passing a ROM pointer to a function taking a RAM pointer, would generate an error).

In contrast, avr-gcc for the AVR-8 offers no such type safety; it instead provides functions to access ROM data. There is no way to distinguish a pointer to RAM from a pointer to ROM.

There are situations where this kind of type safety would be very beneficial to catch programming errors.

Is there some way to add similar modifiers to pointers in some manner (such as by preprocessor, expanding to something which could mimic this behavior) to serve this purpose? Or even something which warns on improper access? (in case of avr-gcc, trying to fetch values without using the ROM access functions)

In C++ you could use std::unique_ptr which would make your pointer types fundamentally different. Nice. — Bathsheba, Apr 09 '18 at 12:40
@Bathsheba Isn't that a C++11 smart pointer? Anyone using heap allocation on an AVR needs to get fired from their job asap. — Lundin, Apr 09 '18 at 12:52
@Bathsheba: Doesn't that add overhead? And I don't see how this will emit different instructions for the access. After all, that's exactly what many true (actually many 32 bit CPUs like ARMv7 are internally Harvard, too) Harvard CPUs need. — too honest for this site, Apr 09 '18 at 14:53
@Olaf: It was a thoroughly terrible idea, but I didn't want to invalidate the comment thread. — Bathsheba, Apr 09 '18 at 14:53
@Bathsheba: One way might be to provide setters/getters and AFAIK some AVR libs do it exactly that way. Seems like there is nothing too bloated not to be used by some folks. — too honest for this site, Apr 09 '18 at 15:05
@curiousguy And how is that different from an ordinary pointer, since the whole purpose of smart pointers is to free dynamically allocated memory? — Lundin, Dec 03 '19 at 07:25

score 7 · Answer 1 · answered Apr 09 '18 at 12:59

One trick is to wrap the pointers in a struct. Pointers to struct have better type safety than pointers to the primitive data types.

typedef struct
{
  uint8_t ptr;
} a_t;

typedef struct
{
  uint8_t ptr;
} b_t;

const volatile a_t* a = (const volatile a_t*)0x1234;
const volatile b_t* b = (const volatile b_t*)0x5678;

a = b; // compiler error
b = a; // compiler error

score 4 · Answer 2 · answered Apr 09 '18 at 12:57

You could encapsulate the pointer in different structs for RAM and ROM, making the types incompatible while still containing the same type of value.

struct romPtr {
    void *addr;
};

struct ramPtr {
    void *addr;
};

int main(int argc, char **argv) {
    struct romPtr data1 = {NULL};
    struct romPtr data3 = data1;
    struct ramPtr data2 = data1; // <-- gcc would throw a compilation error here
}

During compilation:

$ cc struct_test.c
struct_test.c: In function ‘main’:
struct_test.c:12:24: error: invalid initializer
  struct ramPtr data2 = data1;
                    ^~~~~

You could of course typedef the structs for brevity.

answered Apr 09 '18 at 12:57

dvhh


I tested both solutions with gcc, from the perspective of performance they seem to be equally fine (gcc can generate identical code to using plain pointers for the AVR for both). I lean towards this solution though as with this you can still use the member pointer for functions accepting one (for example standard library stuff if you happen to work with C strings). With the other proposed solution it seems like there is no clean way to do this. – Jubatian Apr 10 '18 at 05:18
Just to note: This is still not the answer I feel being the most appropriate solution. In my tests, I am experimenting with using appropriately typed pointers encapsulated in structs, they work well for the intended goal (type safety) across interfaces while the pointer itself can also be used normally where needed (such as when interfacing with standard library stuff or things you can't refactor). – Jubatian Apr 10 '18 at 08:30
Neither version is perfect; they are workarounds to implement some kind of type safety around pointers that have the same base type. I feel that, short of customizing your compiler, there is little way to implement a *perfect* solution. – dvhh Apr 10 '18 at 09:15
That's a really bad idea. C provides data types for a reason. Using generic pointers and casting has been outdated since the 1970s. And with gcc it is not even necessary. – too honest for this site Apr 10 '18 at 11:10
@Olaf What if the pointer is not a generic one (to void), rather an appropriately typed one? I am currently leaning towards that if the problem had to be solved this way (with some sort of struct encapsulation), however it still has a huge problem that you can not add qualifiers (such as if you have a function which is meant to not modify the contents, seemingly there is no way to make it taking a pointer to const). – Jubatian Apr 10 '18 at 12:32
It would result in a bunch of `struct`s, less readable code and eventually you had to cast the pointer to the address space for every access. Please read my answer carefully, I try to explain the background in detail. – too honest for this site Apr 10 '18 at 13:20
@Olaf I was using a `void *` pointer for a more generic answer that could be adapted to suit @Jubatian's needs (by typing the encapsulated pointer). And about making the code less readable using `struct`s, I feel that is a rather subjective view. – dvhh Apr 11 '18 at 01:40
How does that help anyway? You not only have to cast for each usage of the pointer. Not only the actual type, but also the address space. There is nothing won, but all lost. – too honest for this site Apr 11 '18 at 10:22

Jubatian · Accepted Answer · 2018-05-02T09:51:25.883

Since I received several answers which offer different compromises on providing a solution, I decided to merge them into one, outlining the benefits and drawbacks of each, so you can choose the most appropriate for your particular situation.

Named Address Spaces

For this particular case, and only this case, of distinguishing ROM and RAM pointers on an AVR-8 micro, this is the most appropriate solution.

This was a proposal for C11 which didn't make it into the final standard, however there are C compilers which support it, including avr-gcc used for 8 bit AVRs.

The related documentation is part of the online GCC manual (which also covers other architectures using this extension). It is preferable to other solutions (such as the function-like macros in pgmspace.h for the AVR-8) because the compiler can make the appropriate checks, while access to the pointed-to data remains clear and simple.

In particular, if you have a similar problem of porting something from a compiler which offered some sort of named address spaces, like MPLAB C18, this is likely the fastest and cleanest way to do it.

The ported pointers from above would look as follows:

uint8_t const* data1;
uint8_t const __flash* data2;
char const __flash** strdptr;

(If possible, one could simplify the process using appropriate preprocessor definitions)

(Original answer by Olaf)
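
One way such preprocessor definitions could look, as a sketch only: avr-gcc defines the built-in macro `__FLASH` when the `__flash` address space is available, so the hypothetical macro `ROM` below expands to `__flash` there and to nothing on a toolchain with a single flat address space. `ROM`, `greeting` and `rom_strlen` are names invented for this example, not part of any library.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __FLASH
#define ROM __flash   /* avr-gcc: the object lives in flash */
#else
#define ROM           /* flat address space: plain const data */
#endif

/* A string placed in ROM where the target has a separate ROM space. */
const ROM char greeting[] = "hello";

/* The address space is part of the parameter type, so the compiler
   knows which kind of load to emit for each dereference. */
size_t rom_strlen(const ROM char *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != '\0') {
        n++;
    }
    return n;
}
```

A call such as `rom_strlen(greeting)` then reads the same on both kinds of target; only the macro definition differs.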

Struct encapsulation, pointer inside

This method aims to strengthen the typing of pointers by wrapping them in structures. The intended usage is that you pass the structures themselves across interfaces, so that the compiler can perform type checks on them.

A "pointer" type to byte data could look like this:

typedef struct{
    uint8_t* ptr;
}bytebuffer_ptr;

The pointed data can be accessed as follows:

bytebuffer_ptr bbuf;
(...)
bbuf.ptr = allocate_bbuf();
(...)
bbuf.ptr[index] = value;

A function prototype accepting such a type and returning one could look as follows:

bytebuffer_ptr encode_buffer(bytebuffer_ptr inbuf, size_t len);

(Original answer by dvhh)
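
A minimal sketch of how the pointer-inside idea might be applied to the ROM/RAM case. The wrapper names `rom_bytes` and `ram_bytes` and the function `ram_fill` are hypothetical; the point is that the interface takes the struct, while the wrapped pointer can still be handed to library functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wrappers: same idea, two distinct struct types. */
typedef struct { const uint8_t *ptr; } rom_bytes;
typedef struct { uint8_t *ptr; } ram_bytes;

/* Only a ram_bytes value is accepted here; a rom_bytes argument
   does not compile because the struct types are incompatible. */
void ram_fill(ram_bytes dst, uint8_t value, size_t len)
{
    assert(dst.ptr != NULL);
    /* The wrapped pointer is still usable with library functions. */
    memset(dst.ptr, value, len);
}
```

Note that the constness of the pointed-to data is fixed inside each wrapper type here; a caller cannot add `const` to it at the interface.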

Struct encapsulation, pointer outside

Similar to the method above, it aims to strengthen the typing of pointers by wrapping them in structures, but in a different manner, providing a more robust constraint. Here it is the pointed-to data type that is encapsulated.

A "pointer" type to byte data could look like this:

typedef struct{
    uint8_t val;
}byte_data;

The pointed data can be accessed as follows:

byte_data* bbuf;
(...)
bbuf = allocate_bbuf();
(...)
bbuf[index].val = value;

A function prototype accepting such a type and returning one could look as follows:

byte_data* encode_buffer(byte_data* inbuf, size_t len);

(Original answer by Lundin)
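
A comparable sketch for the pointer-outside variant, again with invented names (`rom_byte`, `ram_byte`, `ram_sum`). Because the pointer itself is an ordinary pointer to a distinct struct type, `const` can be applied at the interface as usual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element types: one per memory kind. */
typedef struct { uint8_t val; } rom_byte;
typedef struct { uint8_t val; } ram_byte;

/* const applies to the pointed-to data as usual; passing a
   rom_byte * here is diagnosed by the compiler. */
uint16_t ram_sum(const ram_byte *buf, size_t len)
{
    uint16_t sum = 0;
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        sum = (uint16_t)(sum + buf[i].val);
    }
    return sum;
}
```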

Which should I use?

Named Address Spaces in this regard don't need much discussion: they are the most appropriate solution if you only want to deal with a peculiarity of how your target handles address spaces. The compiler will provide the compile-time checks you need, and you don't have to try to invent anything further.

If, however for other reasons you are interested in structure wrapping, these are matters which you may want to consider:

Both methods can be optimized just fine: at least GCC will generate code identical to using plain pointers for either. So you don't really have to consider performance: they should work.
Pointer inside is useful if you have third-party interfaces to serve which demand pointers, or if you are refactoring something too large to do in one pass.
Pointer outside provides more robust type safety as you reinforce the pointed type itself with it: you have a truly distinct type which you can't easily (accidentally) convert by implicit conversion.
Pointer outside allows you to use modifiers on the pointer, such as adding const, which is important for creating robust interfaces (you can make data intended to be read only by a function const).
Keep in mind that some people might not like either of these, so if you are working in a group, or are creating code which might be reused by known parties, discuss the matter with them first.
Should be obvious, but keep in mind that encapsulating doesn't solve the problem of requiring special access code (such as by the pgmspace.h macros on an AVR-8), assuming no Named Address Spaces are used alongside with the method. It only provides a method to produce a compile error if you try to use a pointer by functions operating on a different address space than what it intends to point into.
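
To illustrate the last point, here is a sketch in which the distinct type guards the interface while the actual load still goes through target-specific access code. On AVR it uses `PROGMEM` and `pgm_read_byte` from AVR-LibC's `avr/pgmspace.h`; elsewhere it falls back to a plain dereference. The names `ROM_DATA`, `ROM_LOAD`, `rom_byte`, `table` and `rom_read` are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>            /* PROGMEM, pgm_read_byte (AVR-LibC) */
#define ROM_DATA PROGMEM
#define ROM_LOAD(p) pgm_read_byte(p)
#else
#define ROM_DATA                     /* flat address space */
#define ROM_LOAD(p) (*(p))
#endif

typedef struct { uint8_t val; } rom_byte;   /* hypothetical wrapper */

const rom_byte table[3] ROM_DATA = { {10}, {20}, {30} };

/* The distinct type protects the interface; the load itself still
   needs the access macro on targets with a separate ROM space. */
uint8_t rom_read(const rom_byte *p, size_t index)
{
    assert(p != NULL);
    return ROM_LOAD(&p[index].val);
}
```

Nothing stops code from writing `table[1].val` directly, which on an AVR without Named Address Spaces would read from the wrong memory; the wrapper only catches mix-ups at function boundaries.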

Thank you for all the answers!

`struct` encapsulation does not change how the data is accessed. For true Harvard architectures it is hence unusable to access an address space other than the one dedicated by the toolchain. As a result you have to specify the access to use to the compiler anyway. And that would in turn be compiler-specific. There is simply no other way. That was the exact reason for the proposed named address space feature. The encapsulation solves a completely different problem and has other problems like padding. — too honest for this site, Apr 29 '18 at 14:41
@Olaf Fixed, I thought it being obvious, anyway, I added the note on this. The primary question was largely about how to get the compiler warn you in such situations even if no Named Address Spaces are available, which the struct encapsulation can solve (I didn't know about Named Address Spaces before, however now I see the problem is broader than Harvard micros, even OpenCL has such needs). I dunno why people apparently hate Named Address Spaces this much, I get a downvote for mentioning it here, you got at least two (I have an up on your answer). — Jubatian, May 02 '18 at 10:00

too honest for this site · Answer 4 · 2018-04-10T13:26:46.983

True Harvard architectures use different instructions to access different types of memory like code (Flash on AVR), data (RAM), hardware peripheral registers (IO) and possibly others. The values of addresses in the ranges typically overlap, i.e. the same value accesses different internal devices, depending on the instruction.

Coming to C: if you want to use a unified pointer, you have to encode not only the address (value) but also the access type ("address space" in the following) in the pointer value. This can be done using additional bits in the pointer's value, but the appropriate instruction must then also be selected at run-time for every access. This adds significant overhead to the generated code. Additionally, there are often no spare bits in the "natural" value for at least some address spaces (e.g. all 16 bits of the pointer are already used for the address), so additional bits are required, at least a byte's worth. This increases memory usage (mostly RAM), too.

Both are typically unacceptable on typical MCUs using this architecture, because they are already quite limited. Fortunately, for most applications, it is absolutely unnecessary (or easily avoidable at least) to determine the address space at run-time.
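
To make the cost concrete, here is a sketch of such a unified pointer on a hosted compiler. The two arrays only simulate separate ROM and RAM; `unified_ptr`, `unified_read` and the `SPACE_*` tags are invented for illustration and are not how any particular compiler implements it.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated memories; on a real Harvard MCU these would be reached
   by different instructions, with overlapping address values. */
static const uint8_t sim_rom[4] = { 0xA0, 0xA1, 0xA2, 0xA3 };
static uint8_t sim_ram[4] = { 0xB0, 0xB1, 0xB2, 0xB3 };

typedef enum { SPACE_RAM, SPACE_ROM } space_t;

/* A unified pointer: 16-bit address plus an extra tag byte. */
typedef struct {
    uint16_t addr;
    uint8_t space;
} unified_ptr;

/* Every read pays for a run-time branch on the tag. */
uint8_t unified_read(unified_ptr p)
{
    assert(p.addr < 4);
    if (p.space == SPACE_ROM) {
        return sim_rom[p.addr];   /* stands in for a flash load */
    }
    return sim_ram[p.addr];       /* stands in for a RAM load */
}
```

The same address value yields different data depending on the tag, and each pointer carries more than the 16 address bits, which is the overhead described above.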

To solve this problem, all compilers for such a platform support some way to tell the compiler in which address space an object resides. Standard draft N1275 for the then-upcoming C11 proposed a standard way using "named address spaces". Unfortunately it did not make it into the final version, so we are left with compiler extensions.

For gcc (see the documentation for other compilers), the developers implemented the original standard proposal. As the address spaces are target-specific, the code is not portable between different architectures, but that is normally true for bare-metal embedded code anyway, so nothing is really lost.

Reading the documentation for AVR, an address space is simply used similar to a standard qualifier. The compiler will automatically emit the correct instructions to access the correct space. Also there is a unified address space which determines the area at run-time as explained above.

Address spaces work similarly to qualifiers, but there are stronger constraints to determine compatibility, i.e. when assigning pointers of different address spaces to each other. For a detailed description, see the proposal, chapter 5.

Conclusion:

Named address spaces are what you want. They solve two problems:

Ensure pointers to incompatible address spaces can't be assigned to each other unnoticed.
Tell the compiler how to access the object, i.e. which instructions to use.

With regard to the other answers proposing structs: you have to specify the address space (and the type, for void *) anyway once you access the data. Using the address space in the declaration keeps the rest of the code clean and even allows changing it later at a single location in the source code.

If you are after portability between toolchains, read their documentation and use macros. Most likely you will only have to adapt the actual names of the address spaces.

Sidenote: The PIC18 example you cite actually uses the syntax for named address spaces. Just the names are deprecated, because an implementation should leave all non-standard names free for the application code. Hence the underscore-qualified names in gcc.

Disclaimer: I did not test the features, but relied on the documentation. Helpful feedback in comments appreciated.

Sidenote: I made this a bit longer providing the background to make it more useful for future readers. I'm fine with edits correcting errors or shortening it **if it does not change the information**. — too honest for this site, Apr 10 '18 at 13:07
Accepted. Just notes for portability: The particular PIC18 code rigorously separates hardware interface code and application code, having them around 5% to 95% ratio, the latter only carrying Harvard architecture specific details. Using the named address space feature porting that 95% was rather trivial (just a matter of #defining the "rom" keyword to "const __flash", and mostly only removing pragmas locating stuff in certain banks for the PIC used for optimization on that MCU). — Jubatian, Apr 10 '18 at 14:27
Without named address spaces, porting would have been a true nightmare, that's mostly how this question formed in my mind seeing that it would be outrageously difficult to make sure every ROM access was modified right, without the compiler providing any compile-time help on whether the port is sound. — Jubatian, Apr 10 '18 at 14:31
Downvoters: Care to explain what is your problem with answers mentioning Named Address Spaces? — Jubatian, May 10 '18 at 09:22
