Why does itoa expect a signed character instead of an unsigned?

Question

I'm learning embedded C while working in MPLAB X with a PIC24FJ128GB204.

So far, I've mostly heard that you should use unsigned types as much as possible (especially?) on embedded devices, so I've started to use uint8_t arrays to hold strings. However, if I call itoa from stdlib.h, it expects a pointer to a signed char (int8_t) array:

    extern char * itoa(char * buf, int val, int base);

This becomes especially clear when I try to compile after using itoa on an unsigned array:

    main.c:317:9: warning: pointer targets in passing argument 1 of 'itoa' differ in signedness
    c:\program files (x86)\microchip\xc16\v1.36\bin\bin\../..\include/stdlib.h:131:15: note: expected 'char *' but argument is of type 'unsigned char *'

Looking at implementations of itoa on other platforms, I found that this seems to be the common case.

Why is that?

(I've also noticed that most implementations expect value/pointer/radix whereas, for some reason, the stdlib.h from Microchip expects the pointer first. It took me a while to realize this.)

score 6 · Answer 1 · edited Feb 18 '20 at 16:39

Whether char is signed or unsigned is a compromise made decades ago. It made sense then, as it brought a level of consistency to the compilers of the day.

itoa(), although not a standard C library function, follows that convention, in that the string is made up of char.

Many library functions use a string pointer. itoa() does too and handles the internal workings as unsigned char. Keep in mind that a string represents text, not numbers, so the signedness of char is not in itself a great concern. Of course the point of itoa() is to take a number (int) and form a string.

The C library treats char functionally "as if" it were unsigned char in many cases.

int fgetc() returns a value of EOF or in the unsigned char range.
printf() "%c": "the int argument is converted to an unsigned char, and the resulting character is written."
<string.h> "For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value)."
<ctype.h> "In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF."
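
A small sketch of what the `<ctype.h>` and `<string.h>` quotes above mean in practice. The helper names `upper_in_place` and `high_byte_sorts_after_ascii` are made up for illustration, and the example assumes the default "C" locale and an ASCII-compatible character set.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: upper-case a string in place.
 * toupper() takes an int that must be representable as unsigned char
 * (or equal EOF), so each plain char is converted to unsigned char
 * first. Without the cast, a negative char value would be out of range. */
static void upper_in_place(char *s)
{
    for (; *s != '\0'; ++s) {
        *s = (char)toupper((unsigned char)*s);
    }
}

/* Hypothetical check: strcmp() compares characters as unsigned char,
 * so the byte 0xE9 sorts after 'z' even where plain char is signed
 * (assuming 'z' has a code below 0xE9, as it does in ASCII). */
static int high_byte_sorts_after_ascii(void)
{
    return strcmp("\xE9", "z") > 0;
}

static void demo(void)
{
    char buf[] = "pic24";
    upper_in_place(buf);
    assert(strcmp(buf, "PIC24") == 0);
    assert(high_byte_sorts_after_ascii());
}
```

Note that the buffer itself is a plain `char` array; the unsigned interpretation happens inside the library calls, or through an explicit cast at the call site.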

edited Feb 18 '20 at 16:39

Toby Speight

answered Feb 18 '20 at 15:27

chux - Reinstate Monica

"The C library treats char functionally "as if" it were unsigned char." Still the compiler warns me: `expected 'char *' but argument is of type 'unsigned char *'`. Since an unsigned char is clearly not what is expected of me, I understand that as the function expects a signed variable. Is that a wrong conclusion? – Dieter Vansteenwegen ON4DD Feb 18 '20 at 15:32
@DieterVansteenwegenON4DD `char` isn't a signed type. It's a distinct type from `signed char` and `unsigned char` and can be either signed or unsigned, which is unimportant because you're using it for representing characters, not integer values – phuclv Feb 18 '20 at 15:36
@DieterVansteenwegenON4DD "I've started to use uint8_t arrays to hold strings". In C a _string_ is "... a contiguous sequence of characters terminated by and including the first null character.". Whether the character type of your string is `signed char`, `unsigned char`, or `char`, the standard string functions perform the same way (as if "unsigned char"), yet the function can only have 1 signature. The compromise is `char`. Use `char *` for strings or, less preferably, cast to `(char*)` – chux - Reinstate Monica Feb 18 '20 at 15:36

score 4 · Answer 2 · edited Jun 20 '20 at 09:12

> So far, I've mostly heard that you should use unsigned types as much as possible (especially?) on embedded devices,

Have the people you heard this from explained why? Is that explanation grounded in solid analysis and engineering, or is it pulled out of thin air?

The problem with rules of thumb is that they often get applied unthinkingly in the wrong situation. Use unsigned types when you need to use unsigned types, use signed types when you need to use signed types.

> I've started to use uint8_t arrays to hold strings.

Don't. That's not what it's there for.

Plain char may be signed or unsigned, depending on the environment. The character encodings for the basic character set (upper- and lower-case Latin alphabet, decimal digits, and the basic set of graphical characters) are always going to be non-negative, but extended characters may have positive or negative encodings.

> 6.2.5 Types
> ...
> 3 An object declared as type char is large enough to store any member of the basic execution character set. If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative. If any other character is stored in a char object, the resulting value is implementation-defined but shall be within the range of values that can be represented in that type.
>
> (C 2011 Online Draft)
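
As a rough illustration of the quoted paragraph, the following sketch checks at run time how the current compiler treats plain `char`. The function names are invented for this example, and the sample string covers only part of the basic execution character set.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if plain char is a signed type in this build, else 0.
 * CHAR_MIN is 0 when char is unsigned and negative when it is signed. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Returns 1 if every character in a sample of the basic execution
 * character set is nonnegative, which is what 6.2.5p3 guarantees. */
static int basic_chars_nonnegative(void)
{
    const char *p = "ABCXYZabcxyz0123456789 !#%&()*+,-./:;<=>[]^_{|}~";
    for (; *p != '\0'; ++p) {
        if (*p < 0) {
            return 0;
        }
    }
    return 1;
}

static void demo(void)
{
    /* Holds whichever way the compiler defines plain char. */
    assert(basic_chars_nonnegative());
    /* (char)-1 stays negative only if plain char is signed. */
    assert(plain_char_is_signed() == ((char)-1 < 0));
}
```

Code that produces the same result for either outcome of `plain_char_is_signed()` does not depend on the signedness of `char`, which is the property text-handling code should have.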

The C library functions that handle strings expect pointers to char, not unsigned char or uint8_t or anything else. While it's highly likely that for any platform that offers it uint8_t is simply a typedef name for unsigned char, that's not a guarantee. char must be at least 8 bits wide, but there are platforms where it could be wider (one of the old PDPs used 9-bit bytes and 36-bit words, and depending on the application I can see some special-purpose embedded systems using wonky sizes).
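
If code does rely on `uint8_t` and `unsigned char` being the same type, that assumption can be checked rather than taken on faith. The following is a minimal sketch assuming a C11 compiler (for `static_assert` and `_Generic`); the helper name `uint8_is_unsigned_char` is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* char is at least 8 bits wide on every conforming implementation;
 * this fails to compile only on a non-conforming one. */
static_assert(CHAR_BIT >= 8, "char must be at least 8 bits wide");

/* Hypothetical helper: 1 if uint8_t is the same type as unsigned char
 * on this platform, 0 if it is some other 8-bit type. */
static int uint8_is_unsigned_char(void)
{
    return _Generic((uint8_t)0, unsigned char: 1, default: 0);
}
```

On mainstream toolchains the helper is expected to return 1, but the standard does not promise it, which is the point made above.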

edited Jun 20 '20 at 09:12

Community

answered Feb 18 '20 at 15:35

John Bode

Don't forget that `char` may be EBCDIC, or ASCII, or UTF8, or UTF16, or... Anyone that has to care if a compiler treats `char` as signed or unsigned will also have to care if the character encoding they want actually matches the character encoding used by the compiler. E.g. if you get raw bytes from network and have to convert them to "implementation defined char" then... – Brendan Feb 18 '20 at 15:47
"The problem with rules of thumb is that they often get applied unthinkingly in the wrong situation. Use unsigned types when you need to use unsigned types, use signed types when you need to use signed types." Yeah well, I'd rather have newbies code correct first, then ask why later. Because it is going to take a while to explain all of the implicit type promotions, all the hiccups associated with bitwise operators on signed types, all the details about compiler-types picked for integer constants (dec or hex), the whole signedness madness of C allowing 1's compl + signed magnitude. And so on. – Lundin Feb 18 '20 at 16:00
I'd much rather have the newbies do as they are told: "here's a rule of thumb, use it until you know better". If they use critical thinking and question the rule of thumb, great... but they'll still end up using it in the end. – Lundin Feb 18 '20 at 16:02
@Lundin Thank you very much. Indeed, I try to be correct which results in following advice without really understanding all the details yet. It seems the conclusion is "use unsigned where possible, but for text use char without signedness" – Dieter Vansteenwegen ON4DD Feb 18 '20 at 16:14
@Lundin I'd argue that only using unsigned isn't inherently safer, and it shouldn't be taught this way. Unsigned might be the correct choice for bit manipulation, but embedded systems these days do way more than that. If an unskilled programmer is told to always use unsigned, it won't take long until they accidentally produce an overflow by subtracting two unsigned numbers ^^ – Felix G Feb 18 '20 at 16:14
@FelixG If you get any form of overflow, you picked the wrong _size_ for the variable and signedness won't save you. However, overflow (or rather wrap-around) of unsigned types is well-defined and safe, while overflow on signed types is undefined behavior. Yet another reason not to use signed types unnecessarily. – Lundin Feb 18 '20 at 16:17
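
Both sides of this exchange can be seen in a short sketch. The helper `elapsed_ticks` and the free-running 16-bit counter it models are assumptions made for illustration only.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: ticks elapsed on a free-running 16-bit counter.
 * The conversion to uint16_t reduces the difference modulo 2^16, so the
 * result is correct even after the counter has rolled over. */
static uint16_t elapsed_ticks(uint16_t now, uint16_t start)
{
    return (uint16_t)(now - start);
}

static void demo(void)
{
    /* Counter rolled over from 65530 to 5: 11 ticks have passed. */
    assert(elapsed_ticks(5u, 65530u) == 11u);
    /* The surprise: 3 - 5 on unsigned int is a huge value, not -2. */
    assert(3u - 5u == UINT_MAX - 1u);
}
```

The wrap-around is well-defined in both cases. Whether it is the intended result depends on what the value is supposed to mean.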
"Don't. That's not what it's there for." Except when you _need_ to use `uint8_t` to hold strings! :) It's not uncommon in embedded systems that one needs to generate a symbol table for a LCD, in which case you basically need to invent an ASCII table yourself, in which case `uint8_t` makes the most sense since that table is to be regarded as a raw chunk of binary data. – Lundin Feb 18 '20 at 16:19

score 2 · Accepted Answer · answered Feb 18 '20 at 15:52

> So far, I've mostly heard that you should use unsigned types as much as possible (especially?) on embedded devices

This is true mainly for the reason that (accidentally or intentionally) signed operands mixed with the bitwise operators create havoc. But also there aren't many cases in low level programming where you actually need to use signed types.

For example, MISRA-C requires you to always use unsigned variables, operands and integer constants unless the intention is to actually use a signed type. So this isn't just something opinion-based: MISRA-C is the de facto industry standard for most professional embedded systems.
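
One common way a signed operand sneaks into a bitwise expression is integer promotion. The sketch below is an illustration only; the function names are invented, and it assumes `int` is wider than 8 bits, which holds for any conforming implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Naive check: ~a is computed in int after integer promotion, so the
 * result is negative and can never equal a uint8_t value. */
static int is_inverse_naive(uint8_t a, uint8_t b)
{
    return ~a == b;
}

/* Converting the result back to uint8_t discards the promoted upper bits. */
static int is_inverse_fixed(uint8_t a, uint8_t b)
{
    return (uint8_t)~a == b;
}

static void demo(void)
{
    assert(is_inverse_naive(0x0F, 0xF0) == 0); /* accidentally signed */
    assert(is_inverse_fixed(0x0F, 0xF0) == 1); /* what was intended */
}
```

Both operands are declared unsigned here, yet the naive comparison still ends up working on a signed value.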

> so I've started to use uint8_t arrays to hold strings

That's ok but it isn't wrong to use char for that purpose either. The only time when it is ok to use char is when you intend to store text. Note that char is especially nasty, because unlike all other types in the language, it has unknown signedness. Each compiler can make char either signed or unsigned and still conform with the C standard. So code relying on char being either signed or unsigned is broken. However, for text strings this doesn't matter since they are always positive.

> However, if I call itoa from stdlib.h, it expects a pointer to a signed char (int8_t) array:

Your compiler apparently treats char as signed then. First of all please note that itoa isn't standard C and isn't allowed to exist inside stdlib.h when strict C standard conformance is desired. But more importantly, different compilers might implement the function differently since it isn't standardized.

As it turns out, you can safely cast wildly between the various character types: char, unsigned char, signed char, int8_t and uint8_t (the stdint.h 8 bit types are pretty much dead certain to be character types even though the standard doesn't say so explicitly). The character types specifically have various special rules associated with them, meaning that you can always cast something to a character type.

You can safely cast your uint8_t array to a char*, as long as there are no qualifiers (const etc) present.
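
A minimal sketch of both options, using a plain `char` buffer or casting a `uint8_t` buffer at the call site. Since `itoa` is not standard C, the example uses a stand-in named `itoa_sketch`, which is hypothetical and only mirrors the argument order of the Microchip prototype from the question. It is not the XC16 implementation, and its handling of negative values (a minus sign only for base 10) is an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a non-standard itoa(buf, val, base).
 * The caller must provide a buffer large enough for the result. */
static char *itoa_sketch(char *buf, int val, int base)
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    char tmp[sizeof(int) * CHAR_BIT + 1]; /* enough digits for base 2 */
    size_t n = 0;
    unsigned int mag;
    char *out = buf;

    if (base < 2 || base > 36) {
        buf[0] = '\0';
        return buf;
    }
    if (val < 0 && base == 10) {
        *out++ = '-';
        mag = 0u - (unsigned int)val; /* well-defined even for INT_MIN */
    } else {
        mag = (unsigned int)val;
    }
    do { /* collect digits, least significant first */
        tmp[n++] = digits[mag % (unsigned int)base];
        mag /= (unsigned int)base;
    } while (mag != 0u);
    while (n > 0) { /* write them out in reading order */
        *out++ = tmp[--n];
    }
    *out = '\0';
    return buf;
}

static void demo(void)
{
    char text[12];   /* preferred: plain char for text, no cast needed */
    uint8_t raw[12]; /* also works: cast the unqualified buffer */

    assert(strcmp(itoa_sketch(text, -317, 10), "-317") == 0);
    itoa_sketch((char *)raw, 255, 16);
    assert(memcmp(raw, "ff", 3) == 0);
}
```

The cast silences the signedness warning without changing what is stored in the buffer. Declaring the buffer as `char` avoids the need for it altogether.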

answered Feb 18 '20 at 15:52

Lundin

Ok, I understand most of this (though not on a very deep level). I did not see why a variable holding a character could need a negative number, so the logic was to use unsigned. Another question I asked on SO had a reply that casting was something that really should be avoided unless you really knew better than the compiler what was happening, so I have tried to avoid doing it and strictly define variables as either signed or unsigned. Additionally, I was advised to use stdint.h on embedded systems, hence the uint8_t choice. From now on I will use char (without signed/unsigned) for text... – Dieter Vansteenwegen ON4DD Feb 18 '20 at 16:03
@DieterVansteenwegenON4DD "casting was something that really should be avoided unless you really knew better than the compiler" That's a sound rule, but in this specific case we happen to know better than the compiler ;) `char*` and `uint8_t*` aren't necessarily pointers to compatible types, so the compiler is right to be concerned. However, the character types specifically have special rules allowing the data stored in a character type to safely get converted to a different character type. – Lundin Feb 18 '20 at 16:09
(The "special rules" being uninteresting language lawyer stuff like "no padding bits", "no trap representations", compatible "effective type", plus a special pointer conversion rule when going from pointer-to-object to pointer-to-character.) – Lundin Feb 18 '20 at 16:11
@DieterVansteenwegenON4DD Using `uint8_t` and stdint.h _is_ the correct choice for embedded systems. I use it for text too now and then, but `char` tends to be more painless for text specifically, because it doesn't produce warnings from tools like the kind you got here. – Lundin Feb 18 '20 at 16:13
"You can safely cast your uint8_t array to a char*, as long as there are no qualifiers (const etc) present." I was also advised to use const as a qualifier for all variables that should not be changed. Is there a reason why I shouldn't cast from const to const? – Dieter Vansteenwegen ON4DD Feb 18 '20 at 16:18
@DieterVansteenwegenON4DD You shouldn't "cast away" `const` (or `volatile`) but it's perfectly fine to go from non-const to const - but then if the code is correct, a cast is not necessary. – Lundin Feb 18 '20 at 16:20
Citation needed for "*text strings [...] are always positive*". If `char` is signed, then the characters in strings may be positive or negative. – Toby Speight Feb 18 '20 at 16:42
@TobySpeight Yeah citation needed regarding the existence of a symbol table with negative indices. Not even EBCDIC was that bad, but if you have discovered an even worse one, do let us know... – Lundin Feb 18 '20 at 21:26
@TobySpeight True that a _string_ may contain negative elements. Yet without escapes or implementation defined behavior, string literals like `"Hello"` always have non-negative elements as the coding characters are positive. – chux - Reinstate Monica Feb 19 '20 at 00:38
@chux, does that mean that no EBCDIC platform can have 8-bit signed `char`? Or just that they happen not to? – Toby Speight Feb 19 '20 at 08:08
@TobySpeight The _only_ way a 8 bit signed char only used for the purpose of storing text can end up negative, is if the symbol table itself has negative indices. If someone does crazy arithmetic on it or assign integers to it, that's another story. – Lundin Feb 19 '20 at 08:28
@Lundin, what "symbol table"? I don't see that defined in C. I'm talking about _characters_ - e.g. letter 'A' is 0xC1 in EBCDIC; that's **193** as an unsigned character, or **-63** as a signed 8-bit character. Exactly the same way that 'Á' is -63 on a Latin-1 system with signed 8-bit `char`. – Toby Speight Feb 19 '20 at 09:09
@TobySpeight The C standard calls it _the basic character set_ and _the extended character set_, where the latter is the well-defined basic characters + implementation-defined ones. It doesn't specify what values these will hold. As for your example, why would a compiler implementing EBCDIC use signed characters? That would be a very dumb compiler decision, given that the `'A'` will simply result in a table look-up internally in the compiler. The compiler is very unlikely to be written in a magic fictional language with negative array indices, and very likely to be written in C. – Lundin Feb 19 '20 at 09:16
@Lundin, I don't know any compilers for EBCDIC systems well enough to know whether they all have unsigned char (hence my question to chux). But it's certainly legal for them to use signed char, and may have happened when an existing compiler was re-targeted. I'd genuinely like to know (but that may be a question for [retrocomputing.se]). I'm certainly not postulating a "magic fictional language". I've no idea where you think string literals are likely to require some sort of table lookup of their constituent characters. – Toby Speight Feb 19 '20 at 09:25
@TobySpeight The C standard generally doesn't prevent you from creating an "ISO compliant binary-barfer". Real world use is what actually drives quality of implementation. Few people are interested in a compiler that nobody will use. As for table look-ups, that's the only possible way. You can't squeeze an actual letter 'A' into the RAM memory, the op code must hold the binary representation for that character. – Lundin Feb 19 '20 at 10:42
@TobySpeight With 8-bit `char`, EBCDIC, whose basic character set has 0xF0 for the digit `'0'`, prevents a _signed_ `char`. With ASCII, signedness can go either way. – chux - Reinstate Monica Feb 19 '20 at 14:55
@chux I'm still struggling to find the part in the standard that says that basic character set are all non-negative. There's no such language in C11 section 5.2.1 that I can see. Can you point me to the specific location that says that these values can't be negative? Thanks. – Toby Speight Feb 19 '20 at 15:40
@TobySpeight "If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative." C17dr§ 6.2.5 3 might do. – chux - Reinstate Monica Feb 19 '20 at 15:44
Thanks @chux, that's what I was missing! – Toby Speight Feb 19 '20 at 15:48

0___________ · Answer 4 · 2020-02-18T15:15:33.033

> So far, I've mostly heard that you should use unsigned types as much as possible

First of all, that is not true at all: you should use the correct type. What is the correct type? It is the type that best suits your needs. How can you know which type is best for you? That depends on what you use it for. The type should be able to store all possible values your program might want to store in it.

So you should not listen to this person anymore.

edited Feb 18 '20 at 15:15

answered Feb 18 '20 at 15:11

0___________

So, why would one use a signed 8 bit variable over an unsigned to store characters in an array? Trying to learn here, "common sense" is not possible without knowledge... Maybe I misunderstood or that person only meant specific use cases... – Dieter Vansteenwegen ON4DD Feb 18 '20 at 15:13
If you're playing around with data that doesn't necessarily represent a string. For example, a byte buffer that receives data from another device. Sometimes it might have string data, but other times it might have raw bytes. In this instance, I'd keep it as a uint8_t* , and cast it to char* when I know it'll have the data I'm expecting. – yhyrcanus Feb 18 '20 at 15:29
@DieterVansteenwegenON4DD One uses a `char` array, which consists of a series of bytes, and doesn't care about its signedness. Just leave it up to the compiler to choose whatever is suitable. For example [unsigned char is more efficient in ARM](https://stackoverflow.com/q/3093669/995714), therefore `char` is usually unsigned on that architecture – phuclv Feb 18 '20 at 15:40
The problem with signed types pops up when you get accidentally signed operands, caused by implicit promotions etc. Avoiding signed types unless they are explicitly needed is pretty much embedded industry standard. – Lundin Feb 18 '20 at 15:40
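
The pattern described in the comment about receive buffers (keep raw bytes as `uint8_t` and view them as text only once the contents are known) might look like the sketch below. The helper `as_text` is hypothetical, and "known to be text" is reduced here to "contains a terminating null byte", which a real protocol would likely tighten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: view a received byte buffer as a C string.
 * Returns NULL unless a null terminator is found within len bytes,
 * so string functions never read past the end of the buffer. */
static const char *as_text(const uint8_t *buf, size_t len)
{
    if (memchr(buf, '\0', len) == NULL) {
        return NULL;
    }
    return (const char *)buf; /* const is kept, not cast away */
}

static void demo(void)
{
    const uint8_t reply[] = { 'O', 'K', '\0' };
    const uint8_t binary[] = { 0x01, 0x02, 0x03 };

    assert(strcmp(as_text(reply, sizeof reply), "OK") == 0);
    assert(as_text(binary, sizeof binary) == NULL);
}
```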
