
To print a size_t integer in C with printf, the conversion specifier is %zu.

However, when I call a C function that uses printf with %zu from Haskell through the FFI, it prints zu instead of the integer. How can I solve that?

Minimal example

file zu.c

#include <stdio.h>

void printzu(){
    size_t x = 666;
    printf("x=%zu", x);
}

file Lib.hs

{-# LANGUAGE ForeignFunctionInterface #-}
module Lib
  where
import Foreign

foreign import ccall unsafe "printzu" printzu' :: IO ()

Test

Prelude> import Lib
Prelude Lib> printzu'
x=zu
Stéphane Laurent
  • This smells of an outdated C standard library, that's already linked in the haskell process. Does this by chance happen on windows? Could be the good old MSVCRT.DLL ;) –  Apr 20 '18 at 13:41
  • You could work around it by using `%lu` and explicitly cast `x` to `unsigned long` -- with the slight risk of wrong output on platforms where `size_t` is larger than `unsigned long` and your program actually handles a size that large ... –  Apr 20 '18 at 13:43
  • @FelixPalmen yes, Windows. Not tested on Linux. – Stéphane Laurent Apr 20 '18 at 13:49
  • Ok, I will try your workaround. Currently I simply use `%u` instead of `%zu`, without casting, this works but I get some tedious warnings in my C editor. – Stéphane Laurent Apr 20 '18 at 13:52
  • You could check with some process explorer tool what modules are dynamically linked in both cases. I bet when running from haskell, you have the MSVCRT.DLL from the windows system folder. This one doesn't support `%zu`, at least not through the publicly visible interface. –  Apr 20 '18 at 13:52
  • Yes, use `%lu` to further reduce the risk of precision loss; it was already supported in the oldest standard. The cast is **strictly** necessary, as `printf()` is a variadic function, so there's no prototype for the compiler to do the conversion automatically. –  Apr 20 '18 at 13:53

3 Answers


As printf() is part of the C standard library, it is typically implemented in a runtime library. When this library is linked dynamically, the same code can behave differently depending on which process calls it, because a different version of the library may be loaded. If %zu doesn't work, the loaded version is an old one that doesn't support C99 yet.

On Windows, it's quite probably the system's MSVCRT.DLL, which is no longer intended for public use but is kept compatible with the old MS Visual C 6 version. MinGW, for example, links to that library by default, so you don't need to ship your own C runtime. The drawback is that the library functions are limited to C89/C90.
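One way to confirm which behaviour the loaded runtime has is to probe it at run time. The following is a minimal sketch, not a definitive test: `probe_zu_support` is a hypothetical helper name, and it assumes that a runtime without C99 support writes something other than "42" (for example the literal "zu") for an unknown conversion, which the standard does not guarantee.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the sprintf linked into this process
   formats "%zu" as a C99 runtime would, 0 otherwise. */
int probe_zu_support(void)
{
    char buf[32];

    /* A C99 runtime writes "42"; an old runtime is assumed to write
       something else, such as the literal "zu". */
    sprintf(buf, "%zu", (size_t)42);
    return strcmp(buf, "42") == 0;
}
```

Calling this once from the C side, both from a plain C program and through the Haskell FFI, would show whether the two processes end up with different runtime libraries. Note that some toolchains replace the printf family with their own implementation, in which case the probe reflects that replacement rather than MSVCRT.DLL.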

An often reasonably safe thing to do for printing a size_t is to convert it to an unsigned long and print that:

size_t x = 666;
printf("x=%lu", (unsigned long)x);

This would only give wrong results if

  • the platform actually has a larger size_t than unsigned long (this is true e.g. for a 64bit system with LLP64 data model like, unfortunately, win64) and
  • you really have a size at runtime that doesn't fit in the unsigned long. This would have to be at least a value larger than 4G (2^32) as this is the guaranteed minimum range for unsigned long.

Please note that the cast is very important here. Because printf() is a variadic function, the prototype just looks like printf(const char *fmt, ...), so there's no type information for the compiler available -- therefore automatic conversions are not possible.
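The two conditions above can also be checked in code instead of being left to chance. This is a minimal sketch under stated assumptions: `format_size_lu` is a hypothetical helper, it formats into a caller-supplied buffer so the result can be inspected, and it uses only `sprintf` and `%lu`, which are available in C89.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: writes "x=<value>" into buf using only the C89
   "%lu" conversion. Returns 0 on success, or -1 if x does not fit in
   unsigned long or buf is too small. */
int format_size_lu(char *buf, size_t buflen, size_t x)
{
    /* Upper bound on what sprintf may write: "x=", the decimal digits of
       ULONG_MAX (at most bits/3 + 1), and the terminating NUL. */
    size_t needed = 2 + (sizeof(unsigned long) * CHAR_BIT) / 3 + 1 + 1;

    if (buflen < needed)
        return -1;
    if (x > ULONG_MAX)   /* only possible where size_t is wider, e.g. Win64 */
        return -1;

    /* The explicit cast is required: printf-style functions are variadic,
       so the compiler cannot convert the argument for us. */
    sprintf(buf, "x=%lu", (unsigned long)x);
    return 0;
}
```

Rejecting the value is only one possible policy; a caller could just as well print a placeholder when the size does not fit.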


If the problem is specifically MSVCRT.DLL and you want to stick to C99 or later in general, I suggested a method using inttypes.h in an earlier answer. This will never print a wrong value on Windows (and still requires a C99-conforming standard library on other platforms).
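A minimal sketch of that inttypes.h approach, assuming `size_t` is no wider than 64 bits and that the toolchain's `inttypes.h` defines `PRIu64` to a conversion the linked runtime understands. `format_size_u64` is a hypothetical helper name, and the buffer must hold at least 23 bytes ("x=", up to 20 digits, and the terminating NUL).

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes "x=<value>" into buf via uint64_t and the
   PRIu64 macro. Returns the number of characters written, as sprintf does. */
int format_size_u64(char *buf, size_t x)
{
    /* PRIu64 expands to a string literal, so it is concatenated with
       "x=%" at compile time. The cast matches the argument to the macro. */
    return sprintf(buf, "x=%" PRIu64, (uint64_t)x);
}
```

Because the format string is chosen by the header rather than written by hand, the same source works with whichever 64-bit conversion the platform's runtime supports.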

  • Note that Windows (64-bit) is precisely the one real-world case where `unsigned long` is not the same or higher rank than `size_t`, so this is probably not the best advice unless you just want to ignore (silently print the wrong value) the possibility of very large sizes. – R.. GitHub STOP HELPING ICE Apr 20 '18 at 14:22
  • @R.. I think I exactly stated when it's possible to print a wrong value. `unsigned long` is required to have *at least* 32 value bits, so can always hold any value up to 4G. I could state this explicitly in my second condition for wrong results. IMHO, this is good enough for **most** real-world usecases of printing a size. Of course not for all. –  Apr 20 '18 at 14:28
  • Indeed you did mention it and my comment didn't reflect that well. Main thing I wanted to point out is that Win64 is the unique case where bullet point 1 fails. – R.. GitHub STOP HELPING ICE Apr 20 '18 at 14:31
  • @R.. I added more details, so every reader can better judge by himself whether this approach is safe or risky for his usecase. –  Apr 20 '18 at 14:34
  • Why not use `printf("x=%llu", (unsigned long long)x);` – chqrlie Apr 20 '18 at 16:32
  • @chqrlie because this would require a C99-conforming standard library as well ... then you could have used `%zu` in the first place. –  Apr 20 '18 at 16:37
  • @chqrlie for windows, there's a better solution: Microsoft has own format specifiers and they are present in every version, up to 64 bits. The headers that come with mingw map them to the `PRI*` macros in `inttypes.h`. So you **could** convert to `uint64_t` and use these macros -> standard conforming and no truncation. –  Apr 20 '18 at 16:46
  • @chqrlie reminded me of an answer I wrote specifically for mingw / MSVCRT.DLL -- there is a better way in this special case, linking it here. –  Apr 20 '18 at 16:51

When "%zu" is not implemented, the alternative is to cast to some wide type and print that, with a modest risk of truncation.

size_t sz = foo();
printf("%lu\n", (unsigned long) sz);  // risk of truncation.

Code could attempt other integer wide types like uintmax_t and unsigned long long, yet if "%zu" is not implemented, then likely "%ju" and "%llu" will also not be implemented.

Truncation can be avoided by printing in parts.

printf("%lX%08lX\n", 
    (unsigned long) (sz/0x10000u/0x10000u), (unsigned long) (sz & 0xFFFFFFFFu));

// remote truncation risk remains.
printf("%lu%09lu\n", 
    (unsigned long) (sz/1000000000u), (unsigned long) (sz%1000000000u));

More complex code could be used to avoid leading zeros.
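As a rough sketch of what such code might look like for the decimal variant, the high part can be printed only when it is non-zero. `format_size_parts` is a hypothetical helper name; it formats into a caller-supplied buffer, for which 32 bytes are enough as long as `unsigned long` has at most 64 bits. The remote truncation risk noted above still applies when the high part itself exceeds `unsigned long`.

```c
#include <stdio.h>

/* Hypothetical helper: writes sz in decimal into buf using only "%lu",
   splitting it into a high part and nine low digits.
   Returns the number of characters written, as sprintf does. */
int format_size_parts(char *buf, size_t sz)
{
    unsigned long hi = (unsigned long)(sz / 1000000000u);
    unsigned long lo = (unsigned long)(sz % 1000000000u);

    if (hi != 0)
        /* The low part must be zero-padded to exactly nine digits. */
        return sprintf(buf, "%lu%09lu", hi, lo);

    /* No high part: print the low part without padding. */
    return sprintf(buf, "%lu", lo);
}
```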

chux - Reinstate Monica
  • If `%llu` is not implemented, the system should be avoided altogether. Furthermore, the `printf` formats using 2 parts are somewhat incorrect. The code should only be used if `sizeof(long) > 4` and the decimal format will have at least one leading `0` and multiple ones if the value is smaller than `100000000`. – chqrlie Apr 20 '18 at 16:30
  • @chqrlie "*If* `%llu` *is not implemented, the system should be avoided altogether.*" <- `%llu` first appeared in C99 as well, so if you **have** `%llu`, it is **very** likely you have `%zu` as well. –  Apr 20 '18 at 16:43
  • C99 appeared 19 years ago... systems that still do not implement this should be retired. – chqrlie Apr 20 '18 at 16:56
  • @chqrlie too bold. Microsoft declared the system-wide installation of `msvcrt.dll` "private" for a long time and delivers newer/better runtime libraries with their compiler products, so, "problem solved" for them. Some don't like the bloat to redist a full standard library with every application written in C -- and although there were already plans to (optionally) replace `msvcrt.dll` with mingw, it can be considered a good thing to be able to compile lean dynamically linked C programs for windows. There really aren't **many** restrictions when using this ancient library. –  Apr 20 '18 at 17:01
  • @chqrlie Unclear about "The code should only be used if `sizeof(long) > 4`". Are you referring to `printf("%lu\n", (unsigned long) sz);` or `printf("%lX%08lX\n", ...` or both? Please detail the "somewhat incorrect" if it is something other than truncation in the first case or the leading 0 issue in the 2nd. Both those issues are noted in the answer. Perhaps text should have had more emphasis on those? The `> 4` part is especially unclear as any size/range limitation I'd expect to be due to `unsigned long` vs. `size_t` and not a fixed value of 4 (as in 4-byte types). – chux - Reinstate Monica Apr 20 '18 at 17:24
  • After rethinking the issue, I think splitting the printf format in 2 parts is not a good option, nor needed in practical cases: if `sizeof(size_t)>4`, then the type `unsigned long long` can be assumed to exist and to be at least as large as `size_t`, so `printf("%llu\n", (unsigned long long)sz);` will do the job. Otherwise `printf("%lu\n", (unsigned long)sz);` will handle the remaining cases correctly. Pathological architectures with non 8-bit bytes can be ignored for this argument IMHO. – chqrlie Apr 21 '18 at 13:48

I'd like to offer another approach to working with systems that are not up to C99/C11 standards yet provide 64-bit or wider types.

Import and include a stdint.h/inttypes.h designed to bridge older systems to new C99 standards.

Example: C99 stdint.h header and MS Visual Studio

Then cast to a wide type available through them:

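#include <limits.h>
#include <stdio.h>
// Include from the standard location or wherever the imported included files are saved.
// These must come before the #if below, because SIZE_MAX is defined in stdint.h.
#include <stdint.h>
#include <inttypes.h>

#if SIZE_MAX > ULONG_MAX
// size_t is wider than unsigned long: print through uintmax_t.
void printzu(void){
    size_t x = 666;
    printf("x=%" PRIuMAX "\n", (uintmax_t) x);
}

#else
// unsigned long can hold every size_t value: "%lu" is enough.
void printzu(void){
    size_t x = 666;
    printf("x=%lu\n", (unsigned long) x);
}
#endif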
chux - Reinstate Monica