I think that we can guarantee that the call to foo,

int foo (FILE * f, int i)
{
    return fprintf(f, "%i", i);
}

will never produce an encoding error if we can guarantee that i is not a trap representation (N2176, Representation of Types: General), because the characters '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', and '9' are found in the basic execution character set (N2176, Environmental Considerations: Character Sets), and these are all that are needed to represent the decimal form of any integral value that can be represented by an int. Also, the number of characters needed for the decimal representation of an int is guaranteed to be less than INT_MAX (so the return value of fprintf can always store the number of characters produced by the conversion).
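The length claim can be checked with a small sketch. It assumes only that each decimal digit carries more than three bits of information, so an int needs at most one digit per three bits of storage, rounded up, plus a sign. The names INT_DECIMAL_MAX_CHARS and decimal_length are hypothetical and are not part of the question.

```c
#include <limits.h>
#include <stdio.h>

/* Upper bound on the characters "%i" can produce for an int:
   one sign character, one digit per 3 bits of storage, plus 1 to round up.
   Since log10(2) < 1/3, this never underestimates the digit count. */
enum { INT_DECIMAL_MAX_CHARS = 1 + (CHAR_BIT * sizeof(int)) / 3 + 1 };

/* Hypothetical helper: formats i into a local buffer and returns the
   number of characters the conversion produced (terminator excluded). */
int decimal_length(int i)
{
    char buf[INT_DECIMAL_MAX_CHARS + 1];
    return snprintf(buf, sizeof buf, "%i", i);
}
```

The bound is a handful of characters for any plausible int width, far below INT_MAX, which the Standard requires to be at least 32767.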

So, the question reduces to:

How do we guarantee that an int does not store a trap representation?

(Or is it more complex?)

It's more complex.

This section has been added in response to the answers and comments (regarding orientation) thus far (2022-02-18).

I think that we can guarantee that calls to fputi and / or fwputi,

#include <stdio.h>
#include <wchar.h>

int fputi ( int i , FILE * f )
{
    return fwide ( f , 0 ) <= 0
        ?  fprintf ( f ,  "%i" , i )
        : fwprintf ( f , L"%i" , i ) ;
}

// The only differences between fputi and fwputi should be:
// - their names and
// - the "<" or "<=" signs.

int fwputi ( int i , FILE * f )
{
    return fwide ( f , 0 ) < 0
        ?  fprintf ( f ,  "%i" , i )
        : fwprintf ( f , L"%i" , i ) ;
}

will never produce an encoding error if we can guarantee that i is not a trap representation (N2176, Representation of Types: General), because the characters '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', and '9' are found in the basic execution character set (N2176, Environmental Considerations: Character Sets), and these are all that are needed to represent the decimal form of any integral value that can be represented by an int. Also, the number of characters needed for the decimal representation of an int is guaranteed to be less than INT_MAX (so the return value of fprintf or fwprintf can always store the number of characters produced by the conversion).

So, the question reduces to:

How do we guarantee that i does not store a trap representation? Or, how do we modify fputi and fwputi so that they behave portably (i.e., consistently for all implementations that adhere to the Standard)? Maybe insert before the return statements:

// C-LIKE PSEUDO CODE
if ( isTrapRepresentation ( i ) ) exit ( EXIT_FAILURE ) ;
chqrlie
Ana Nimbus
  • What is an "encoding error"? – KamilCuk Feb 16 '22 at 23:22
  • @KamilCuk "An encoding error occurs if the character sequence presented to the underlying `mbrtowc` function does not form a valid (generalized) multibyte character, or if the code value passed to the underlying `wcrtomb` does not correspond to a valid (generalized) multibyte character" (N2176, Files; reformatted). – Ana Nimbus Feb 16 '22 at 23:24
  • You would need to know what a trap representation looks like (if one exists at all) and check for it. – dbush Feb 16 '22 at 23:26
  • But it's `fprintf`. You never know what `fprintf` does behind the scenes, does it call anything, or not. So `How can we guarantee ...` we can contact the author of `fprintf` and make him release a statement. `will never produce an an encoding error if we can guarantee that i is not a trap representation` Why? Why would trap representation make `fprintf` cause an encoding error? Will it call `mbrtowc` then? How do you know? Also, `fgetwc` would call `mbrtowc`, why would `fprintf` call it? – KamilCuk Feb 16 '22 at 23:26
  • Hmm, If the _orientation_ of `f` was established as wide before the `fprintf()` call, I'd expect an _encoding error_ is readily possible with this narrow call. Review § 7.21.2 4 – chux - Reinstate Monica Feb 16 '22 at 23:50
  • You cannot detect a trap representation in a portable way. You need to consult your compiler documentation. – n. m. could be an AI Feb 17 '22 at 07:06
  • @KamilCuk Re "Why would trap representation make `fprintf` cause an encoding error?" It seems to me that if `i` is a trap representation, anything could happen (undefined behavior), including the behavior that `fprintf` may report an encoding error if `i` is a trap value. Re "how do you know:" the point is that I don't know, which is why I want to eliminate that case. – Ana Nimbus Feb 18 '22 at 16:11
  • My reasoning is, that it is irrelevant if it is a trap representation or not, because `fprintf` can generate encoding error whenever it wants to. While it is true that _when_ `i` is a trap representation, then anything could happen, but this is not a restriction. `fprintf` can generate encoding error unrelated to `i` being trap representation or not. You should contact the author of `fprintf` first and make him release a statement "when `i` is not a trap representation, then I guarantee that it will work". – KamilCuk Feb 18 '22 at 16:24
  • @KamilCuk Re: "`fprintf` can generate [an] encoding error whenever it wants to." This is a sad surprise to me. Can you point me to some language in the Standard that supports this? – Ana Nimbus Feb 18 '22 at 16:27
  • I can't - it is not specified. Standard specifies behavior. There is no specification - "if inputs are valid, then there will be no errors" related to `fprintf`. That's my point that such specification does not exist. – KamilCuk Feb 18 '22 at 16:28
  • It really is a question, are you asking from the perspective of a "language-lawyer" or practical? Because practically, there are no trap representation of `int` (everything is twos-complement) and you will get no encoding error except for wide/normal streams mix. – KamilCuk Feb 18 '22 at 16:37
  • @KamilCuk Please clarify, is it a) "`fprintf` can generate encoding error whenever it wants to?" Or b) "you will get no encoding error" because 1) "everything is two-s complement" and 2) `fputi` and `fwputi` don't mix orientations? – Ana Nimbus Feb 18 '22 at 17:14
  • I'd expect an encoding error, or _some error_, is possible in extreme cases like `fprintf(stream, "%s", s);` when `strlen(s) > INT_MAX`. Yet since `fprintf()` already has environmental limits that were exceeded, that may fall under UB. Of course `"%d"` will not make such an excessively long output. – chux - Reinstate Monica Feb 19 '22 at 11:22

2 Answers

I think that we can guarantee that the call to foo, ... will never produce an encoding error if we can guarantee that i is not a trap representation

Counter example:

#include <stdio.h>
#include <wchar.h>
int main() {
  int i = fwprintf(stdout, L"Hello world!\n");
  i = fwprintf(stdout, L"Step A %d\n", i);
  i = fwprintf(stdout, L"Step B %d\n", i);
  i = fprintf(stdout, "Step C %d\n", i);  // Returns -1
  fwprintf(stdout, L"Step D %d\n", i);
}

Output

Hello world!
Step A 13
Step B 10
Step D -1

Once a stream has set its orientation (wide or narrow characters), carelessly trying to print the other way results in an encoding error, even without a trap value in i.


How can we guarantee that fprintf(f, "%i", i) will never result in an encoding error?

At a minimum, do not change orientation.
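One way to follow that advice is to query the orientation before printing and refuse to write if the stream is already wide. The sketch below is only an illustration under that assumption; put_int_narrow is a hypothetical name, and returning -1 for a wide stream is a design choice made here, not something the Standard prescribes.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: prints i with the narrow function only when the
   stream is unoriented or byte-oriented. For a wide-oriented stream it
   writes nothing and returns -1 instead of mixing orientations. */
int put_int_narrow(FILE *f, int i)
{
    /* fwide with mode 0 only queries; it does not change the orientation.
       A positive result means the stream is wide-oriented. */
    if (fwide(f, 0) > 0)
        return -1;
    return fprintf(f, "%i", i);
}
```

Unlike fputi in the question, this helper never switches to fwprintf, so the caller decides what to do with a wide stream.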

chux - Reinstate Monica
  • To compile without warnings, the example needs an `#include <wchar.h>`. Without this include there's an implicit declaration warning about `fwprintf()`. – Nathan Mills Feb 17 '22 at 19:34
  • @NathanMills Include added. – chux - Reinstate Monica Feb 17 '22 at 20:28
  • It seems to me that the original question may not have an answer, but +1 for helpfulness. I've edited the original post in attempt to move closer to something useful (to me). – Ana Nimbus Feb 18 '22 at 16:04
  • Amazing! Blatant proof that wide characters should be avoided completely. – chqrlie Feb 18 '22 at 19:12
  • @AnaNimbus [I've edited the original post in attempt to move closer to something useful (to me)](https://stackoverflow.com/questions/71150512/how-can-we-guarantee-that-fprintff-i-i-will-never-result-in-an-encoding/71153214?noredirect=1#comment125815716_71153214) --> a moving target question reduces one's interest as not sure when/where it stops. IMO, better to have posted another question. – chux - Reinstate Monica Feb 18 '22 at 21:42
  • @chqrlie IMO, time for ``. – chux - Reinstate Monica Feb 18 '22 at 21:44

How can we guarantee that fprintf(f, "%i", i) will never result in an encoding error?

We can contact the author of fprintf, distributor of our toolchain and/or compiler, or similar, and make that entity release a statement that it guarantees that fprintf(f, "%i", i) will not result in an encoding error.

will never produce an encoding error if we can guarantee that i is not a trap representation

This is based on a false premise. There is no such guarantee. fprintf may produce an encoding error, anytime, anywhere.

How do we guarantee that an int does not store a trap representation?

We can create an array with all possible trap representations of a type. Then just check if the representation of i is one of the possible trap representations. For example:

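#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

// One row per trap representation of int, stored as raw bytes.
static const unsigned char trap_representations_of_int[][sizeof(int)] = {
    // for example, on my imaginary architecture:
    // int has 3 bytes, architecture is big endian
    // and 0x00CAFE is a trap
    { 0x00, 0xCA, 0xFE, },
};

// Compares the bytes of *num with each row; *num is never read as an int.
bool is_trap_representation_int(const int *num) {
    for (size_t i = 0; i < ARRAY_LEN(trap_representations_of_int); ++i) {
        if (memcmp(num, trap_representations_of_int[i], sizeof(int)) == 0) {
            return true;
        }
    }
    return false;
}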

unsigned char is guaranteed not to have a trap representation, so you can inspect the bytes of any object through it, and also do range comparisons between bytes. The list of trap representations has to be taken from the architecture documentation.


On a practical note: there does not exist an architecture with int having any trap representations. I will leave that sentence here until someone provides a counterexample; that would be fun to know.
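For a given implementation, the practical claim can be tested from <limits.h> alone. The sketch below relies on a counting argument that is my own assumption, not something stated in this answer: if int has no padding bits and INT_MIN is one less than -INT_MAX, then the range of int contains as many values as there are bit patterns, so no pattern is left over to be a trap. The names value_bits and int_has_no_trap_representations are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Counts the value bits of a maximum of the form 2^n - 1. */
static int value_bits(unsigned long long max)
{
    int n = 0;
    while (max != 0) {
        ++n;
        max >>= 1;
    }
    return n;
}

/* Hypothetical check: true when every bit pattern of int must be a value. */
bool int_has_no_trap_representations(void)
{
    int width = (int)(sizeof(int) * CHAR_BIT);

    /* Value bits plus one sign bit fill the object: no padding bits. */
    bool no_padding = value_bits(INT_MAX) + 1 == width;

    /* Two's complement with the most negative pattern used as a value. */
    bool full_range = INT_MIN < -INT_MAX;

    return no_padding && full_range;
}
```

A false result does not prove that traps exist; it only means this argument cannot rule them out, and the compiler documentation has to be consulted.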

And fprintf may give you an encoding error when you mix wide and narrow output on one stream. When you use wide strings, you should be aware of the problems, and you will be working with the specific system you are targeting; it's specific to the environment anyway.

KamilCuk
  • https://stackoverflow.com/questions/30586217/are-there-any-implementations-that-support-a-negative-zero-or-reserve-it-as-a-t sort of has an example of an `int` with a trap representation. At least, an `int` value which behaves in a non-standard way, but could be conforming if that value is treated as a trap representation. It's not clear if that implementation is generally conforming otherwise, though. – Nate Eldredge Feb 18 '22 at 17:25
  • FYI: "there does not exist an architecture with int having any trap representations" --> An interesting read on some unusual, by today's view, [machines](https://begriffs.com/posts/2018-11-15-c-portability.html). Don't see any integer trap representations. – chux - Reinstate Monica Feb 19 '22 at 11:56