
I have a printf-like function that can handle %s (char *) and %ls (wchar_t *) conversions. Everything works fine if I pass the right argument for the right conversion specifier.

But if I pass a char * to my function when it expects a wchar_t *, it may segfault (for instance when the NUL terminator falls on the second byte of a wchar_t). Since I access this argument through va_arg(), I can't be sure of its type.

If I assume that this char array is always NUL-terminated, I could check byte after byte to handle the terminating NUL correctly and stop reading memory after it. But then I wouldn't be able to handle legitimate wchar_t values like this:

0b XXXXXXXX XXXXXXXX 00000000 XXXXXXXX
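To make the problem with byte-wise scanning concrete, here is a minimal sketch. The helper name `bytes_before_zero` is hypothetical, and the example in the comment assumes 4-byte code units stored as `uint32_t` rather than the platform's actual `wchar_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts the bytes that precede the first zero
   byte in buf, or returns size if the buffer holds no zero byte. */
size_t bytes_before_zero(const void *buf, size_t size)
{
    const unsigned char *bytes = buf;
    size_t i = 0;

    assert(buf != NULL);
    while (i < size && bytes[i] != 0)
        ++i;
    return i;
}

/* Example: for
       const uint32_t units[] = { 0x4100, 0x42, 0 };
   the call bytes_before_zero(units, sizeof units) stops at the very
   first byte on both little-endian and big-endian machines, although
   the wide string holds two non-zero code units. */
```

A byte-wise scan therefore cannot tell the end of a narrow string from a zero byte inside a valid wide character.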

I'm already using the __attribute__ printf GNU C extension. But this function may be used by a Python program through ctypes, so format/type checking at compile time may not be enough.
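For reference, the compile-time check mentioned here looks roughly like the sketch below. The function name `fmt_checked` and its buffer-based signature are assumptions made for the illustration, not the actual function from the question.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* fmt_checked is a hypothetical name. The attribute tells GCC and Clang
   that parameter 3 is a printf-style format string and that the
   variadic arguments to check start at parameter 4. */
#if defined(__GNUC__)
__attribute__((format(printf, 3, 4)))
#endif
int fmt_checked(char *buf, size_t size, const char *fmt, ...);

int fmt_checked(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int written;

    assert(buf != NULL && fmt != NULL);
    va_start(ap, fmt);
    written = vsnprintf(buf, size, fmt, ap);  /* forwards the va_list */
    va_end(ap);
    return written;
}
```

With `-Wformat` enabled, a C call such as `fmt_checked(buf, n, "%ls", "narrow")` is diagnosed by the compiler. A call made through ctypes is assembled at run time, so no compiler ever sees it and the attribute has no effect there.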

Is there a way to perform such checking at runtime in my C function?

(NB : "There is no such way" may be the answer, but I'm still asking in order to be completely sure.)

vmonteco
  • If it is going to be used through python, have you considered writing a wrapper for it in python? – Ben Steffan Jun 18 '17 at 18:34
  • At runtime you can't, but at compile time it could be possible to write a compiler extension that verifies your parameters. But this is not an issue in C; you could just let the user crash. – Stargateur Jun 18 '17 at 18:34
  • Also I don't believe this is possible in C (you would actually need to pass runtime type information, which C does not support). – Ben Steffan Jun 18 '17 at 18:35
  • "Checking pointer type at runtime" - What use does this have? C is statically typed! – too honest for this site Jun 18 '17 at 18:37
  • @Olaf Maybe "checking data type/structure at a memory address" would be more accurate then. – vmonteco Jun 18 '17 at 18:40
  • @vmonteco: That does not change anything! It still is statically typed. Using a different type invokes undefined behaviour. If you try something "hackish", you are on your own. Don't try to outsmart your compiler; modern compilers are quite unforgiving. – too honest for this site Jun 18 '17 at 18:42
  • @BenSteffan I'm actually considering solving this part of my problem python-side and that's what I plan to do if I can't do it C-side, indeed! – vmonteco Jun 18 '17 at 18:42
  • Type-checking in Python is frowned upon. You might use exceptions, but otherwise use duck-typing. And in general: **write correct code**! – too honest for this site Jun 18 '17 at 18:43
  • That's an XY problem. Once you have the wrong type, you wrote wrong code. That's what testing and debugging are for. Also enabling all recommended compiler warnings and fixing them is **strongly** advised. – too honest for this site Jun 18 '17 at 18:45
  • if you cannot solve your problem any other way, you could try validating the encoding: at least in case of UTF32 (ie 4-byte `wchar_t`), you'll be able to catch quite a bit of incorrect usage as the highest unicode codepoint is `0x10FFFF` – Christoph Jun 18 '17 at 18:46
  • @Christoph I've considered this, but wchar_t values could contain nul octets too (if that's what you were thinking about). – vmonteco Jun 18 '17 at 18:47
  • @Olaf Then on C-side I should just fall back to considering my function will be used the right way and I shouldn't care about handling such case? – vmonteco Jun 18 '17 at 18:49
  • @vmonteco: the check works the other way around: if you have a `char` based string that's longer than 3 characters, you're likely to get an invalid codepoint as the highest 11 bits won't be zero – Christoph Jun 18 '17 at 18:50
  • @Olaf Anyway, if I consider that the pointer was pointing to a wrong-type data, I could as well consider that the array isn't NUL-terminated, that the address contained in the pointer is wrong and that trying to access it would trigger a segfault too in a way. – vmonteco Jun 18 '17 at 18:52
  • @vmonteco: That's the idea of the C language! It is weakly typed, and you must not invoke undefined behaviour. A good **modern** (i.e. C99 - better C11) C book will help, read it! – too honest for this site Jun 18 '17 at 19:00
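The code point check Christoph suggests in the comments could be sketched as follows. This is only an illustration under strong assumptions: code units are 4 bytes wide and hold UTF-32 (modelled here with `uint32_t`), and the caller knows how many bytes are safe to read, which the variadic function in the question does not. The name `plausible_utf32` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical heuristic: returns 1 if buf starts with a sequence of
   valid UTF-32 code units ended by a zero unit within size bytes,
   and 0 otherwise. It can reject some narrow strings, but it can
   never prove that the data really is a wide string. */
int plausible_utf32(const void *buf, size_t size)
{
    const unsigned char *bytes = buf;
    size_t offset;

    assert(buf != NULL);
    for (offset = 0; offset + sizeof(uint32_t) <= size;
         offset += sizeof(uint32_t)) {
        uint32_t unit;

        /* memcpy avoids alignment and aliasing problems. */
        memcpy(&unit, bytes + offset, sizeof unit);
        if (unit == 0)
            return 1;               /* terminator reached */
        if (unit > 0x10FFFF)
            return 0;               /* beyond the Unicode range */
        if (unit >= 0xD800 && unit <= 0xDFFF)
            return 0;               /* surrogates are not valid UTF-32 */
    }
    return 0;                       /* no terminator inside the buffer */
}
```

Four consecutive ASCII characters always form a unit above `0x10FFFF`, so narrow strings longer than three characters are rejected. Shorter ones can slip through, which is why this remains a heuristic rather than a type check.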

1 Answer


No, this is not possible.

In a typical C implementation, the type system only exists as an aid at compile time.* At runtime, all you have is bytes of data, and there's no way to even tell a pointer from a number (besides educated guessing).

Technically, you're not even allowed to va_arg(ap, const char*) and then examine memory if the original argument was not a char*, signed char*, unsigned char* or void*, or a const-related version of such a type. The type passed to va_arg is always required to be a compatible type. (One reason is there's no guarantee pointers to different types have the same size, layout, and meaning.)
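As a sketch of what that rule means in practice, the only information the callee has is the format string, so each `va_arg` call has to name the type the conversion specifier promises. The function `total_length` below is a hypothetical stand-in for a printf-like function and only understands `%s` and `%ls`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical example: adds up the lengths of all string arguments
   announced by %s and %ls in fmt. The type given to va_arg comes
   from the format string alone; nothing verifies the real argument. */
size_t total_length(const char *fmt, ...)
{
    va_list ap;
    size_t total = 0;

    assert(fmt != NULL);
    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0'; ++p) {
        if (*p != '%')
            continue;
        ++p;
        if (*p == '\0')
            break;                      /* lone '%' at the end */
        if (*p == 's') {
            const char *s = va_arg(ap, const char *);
            total += strlen(s);
        } else if (p[0] == 'l' && p[1] == 's') {
            const wchar_t *ws = va_arg(ap, const wchar_t *);
            total += wcslen(ws);
            ++p;                        /* skip the 'l' */
        }
    }
    va_end(ap);
    return total;
}
```

If a caller passes a `char *` where `%ls` is written, the `va_arg(ap, const wchar_t *)` call already has undefined behaviour, before any memory behind the pointer is examined.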

(*In C++ there's a bit more to the story, because data representing types is stored connected to polymorphic objects to make dynamic_cast and typeid work correctly, and associated with all exception objects to make catch blocks work correctly. But none of this is compatible with va_arg.)

aschepler