
I was learning from GeeksforGeeks, and when I saw this, a thought came to mind: since getchar() and other similar functions return an int (which may be EOF on failure or end of file), why is the format specifier %c used, and not %d (or %i)?

```c
// Example for getchar() in C
#include <stdio.h>
int main()
{
   printf("%c", getchar());
   return 0;
}
```

I know that what we type is a character, that it is read by the function getchar(), and that we display it using %c.

But my question is what actually happens inside getchar() and printf() during the whole process: where getchar() returns that integer, where it gets stored, and how the character we typed gets displayed by printf(), i.e. what happens inside printf()?

I did some research on the printf() implementation and learned that printf is part of the C standard library (a.k.a. libc) and is a variadic function, but I didn't find out what really happens inside this function, or how it knows from the format specifier whether it has to print a character or an int.

Please help me learn the whole process in detail.

Alfran
  • `%d` or `%i` are *different sizes* on the machine. They specify *integers*, typically *32bits* on x86. The character specifier `%c` expects a single byte. – Rafael Jun 06 '17 at 13:36
  • Possible duplicate of https://stackoverflow.com/questions/18279908/how-does-printf-work-internally – unalignedmemoryaccess Jun 06 '17 at 13:36
  • In musl, for example, `printf` forwards its arguments to `vfprintf`. Here's the source code https://github.com/BlankOn/musl/blob/master/src/stdio/vfprintf.c, it may help you understand – Federico klez Culloca Jun 06 '17 at 13:37
  • @tilz0R I guess you haven't read my question; the link you mentioned only describes the implementation of printf(). – Alfran Jun 06 '17 at 13:38
  • @Alfran it is clear that you did not even try a Google search. – unalignedmemoryaccess Jun 06 '17 at 13:38
  • @tilz0R What? I am researching on this topic since 3 days. How is it clear to you? – Alfran Jun 06 '17 at 13:41
  • @Rafael: Note that when you pass a `char` (or indeed a `short` on most machines) to `printf()`, it is promoted to an `int`. So, the `%c` format expects an `int` — but it will normally be the `int` value corresponding to an (unsigned) `char` promoted to `int`. Which is what `getchar()` returns, after all — that, or EOF. The `%d` and `%i` conversion specifications also work because the value passed via the 'variable arguments' mechanism is indeed an `int`. – Jonathan Leffler Jun 06 '17 at 13:56
  • @JonathanLeffler Suppose that in my code above I enter the char 'a'; what int will getchar() return? 97? – Alfran Jun 06 '17 at 14:11
  • On a typical system, `getchar()` will return 97 when you type `a`. It will return a value in the range 0..255, or EOF, where EOF will be negative (it's usually -1, but that's not formally guaranteed). If the machine uses [EBCDIC](https://en.wikipedia.org/wiki/EBCDIC) as its native code set (IBM mainframe), the value might be 129 (hex 0x81) for `a`. That's why it's a good idea to write `'a'` in the source and not 97 — the value for `a` isn't always 97. – Jonathan Leffler Jun 06 '17 at 14:21
  • @Rafael that is incorrect; `%d`, `%c` and `%i` all expect an `int` argument – M.M Jun 06 '17 at 14:22
  • @M.M %c expects an int? I thought it expects the value of the character itself, which is an int? – Alfran Jun 06 '17 at 14:33
  • @JonathanLeffler Thanks man! That helps clarify some of my concepts. – Alfran Jun 06 '17 at 14:35
  • 'Explain whole process'? Oh, I don't think so. Apart from anything else, your OS is heavily involved. If you're running MS Windows, it's not even possible to give details unless you have an MS and/or partner NDA. Ridiculously broad question :( 'I am researching on this topic since 3 days' - did you mean 'years' or 'decades'? – ThingyWotsit Jun 06 '17 at 14:46
  • Gentlemen, I understand that chars are promoted to ints; that is beside the point, in my opinion. The number specifiers have to be converted to their ASCII representations. The chars don't; they are expected to already be their ASCII equivalents. I was trying to explain it more simply to the beginner, showing that the differences are necessary, and not to show off my intellect by bogging him down in the details. Thanks for trying to enlighten me, though. – Rafael Jun 06 '17 at 15:23
  • You should try to put things in layman's terms for newbies, not show off. Every time I try to dumb things down for them, I always get the white knights... totally missing the SO principle. – Rafael Jun 06 '17 at 15:29
  • @ThingyWotsit I asked about the concept in general, and for that I don't have to explain my OS; people understood it well and answered it. Moreover, the answer I accepted explained it well. If you can't understand such `ridiculously broad questions`, then you don't have to stress too much, and you can google the meaning of `days` if you are confusing it with `years or decades`. Keep calm and don't stress! ;) – Alfran Jun 06 '17 at 15:32

3 Answers


(I am supposing your computer has an x86-64 processor and runs Linux)

> why format specifier used is %c, why not %d (or %i)?

Imagine that the corresponding argument (to printf) was 99 (an int). If you use %c, then the letter c (ASCII code 99) is displayed. If you use %d or %i, then 99 is displayed by printf, etc.
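A minimal sketch of that difference, assuming an ASCII-based execution character set (where 99 is the code for `c`). The helper name `show_both` is hypothetical; it formats into a buffer with `snprintf` so the result can be inspected rather than printed:

```c
#include <stdio.h>

/* Hypothetical helper: formats the same int argument through %c and %d.
   Assumes an ASCII execution character set, where 99 is the code for 'c'. */
int show_both(int value, char *out, size_t size)
{
    /* Both conversions consume an int from the variadic arguments:
       %c writes the character with that code, %d writes its decimal digits. */
    return snprintf(out, size, "%c %d", value, value);
}
```

For example, `show_both(99, buf, sizeof buf)` should leave `"c 99"` in `buf` and return 4.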

printf is, as you noticed, a variadic function. It is implemented using variadic primitives like va_start, va_arg and va_end, which are macros expanded to some builtin known to the compiler. How exactly arguments are passed and results are returned (the calling convention) is defined, in some processor- and OS-specific way, in a document called the ABI (application binary interface).
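To illustrate how the format string steers the variadic machinery, here is a toy formatter. It is a hypothetical sketch: the name `mini_format` is made up, and real implementations such as glibc or musl are far more involved. It handles only `%c`, `%d` and `%%`. Note that both `%c` and `%d` fetch their argument with `va_arg(ap, int)`; only what is done with the value differs. `out` must point to at least `size` bytes, with `size` greater than 0:

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical toy formatter understanding only %c, %d and %%.
   Not how any real libc implements printf; it only shows that the
   conversion letter decides how the next int argument is rendered.
   Requires size > 0. Returns the number of characters stored. */
size_t mini_format(char *out, size_t size, const char *fmt, ...)
{
    va_list ap;
    size_t n = 0;

    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0' && n + 1 < size; ++p) {
        if (*p != '%') {
            out[n++] = *p;              /* ordinary character: copy it */
            continue;
        }
        ++p;                            /* look at the conversion letter */
        if (*p == 'c') {
            /* A char argument was promoted to int, so read an int,
               then convert it to unsigned char as %c does. */
            int v = va_arg(ap, int);
            out[n++] = (char)(unsigned char)v;
        } else if (*p == 'd') {
            int v = va_arg(ap, int);
            /* Delegate the digit conversion to snprintf for brevity. */
            int w = snprintf(out + n, size - n, "%d", v);
            if (w < 0)
                break;
            n += ((size_t)w < size - n) ? (size_t)w : size - n - 1;
        } else if (*p == '%') {
            out[n++] = '%';
        } else {
            break;                      /* unsupported conversion or trailing '%' */
        }
    }
    va_end(ap);
    out[n] = '\0';
    return n;
}
```

With an ASCII character set, `mini_format(buf, sizeof buf, "%c=%d%%", 'A', 'A')` should produce `A=65%`.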

On some C standard library implementations, printf (and related functions, like vfprintf) would ultimately use putc or something related.

Notice that the standard I/O functions (those in <stdio.h>) are likely to be provided with the help of some operating system. Read Operating Systems: Three Easy Pieces for more about OSes.

Quite often, the C standard library will use some system calls to interact with the operating system kernel. For Linux these are listed in syscalls(2), but read Advanced Linux Programming. To output some data the write(2) syscall would be used (but the C standard library is generally buffering, see setvbuf(3)).
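A Linux/POSIX-only sketch of that lowest layer: sending one byte with `write(2)` directly, with no stdio buffering involved. `put_byte_fd` is a hypothetical helper name; on a terminal you could call it as `put_byte_fd(STDOUT_FILENO, 'A')`:

```c
#include <unistd.h>

/* Hypothetical helper: sends one byte straight to a file descriptor with
   the write(2) system call, bypassing stdio buffering. POSIX only.
   Returns 1 on success, -1 on error (errno is set by write). */
ssize_t put_byte_fd(int fd, int c)
{
    unsigned char byte = (unsigned char)c;  /* same conversion %c performs */
    return write(fd, &byte, 1);
}
```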

BTW, for Linux/x86-64 both GNU glibc & musl-libc are free software implementations of the C standard library, and you can study their source code (most of it is coded in C, with a tiny bit of assembly for the system call glue).

> But my problem is what's actually happening inside getchar() and printf() during the whole process, where getchar() is returning that integer, where it is getting stored ...?

The ABI defines that the result of an int-returning function goes through register %rax, and getchar (like every other int-returning function) works that way. See the X86-64 Linux ABI referenced here.

> ... and how the character we inputted is getting displayed by printf() i.e. what's happening inside printf()?

After many software layers, when the stdout stream gets flushed (e.g. by some call to fflush(3), by a \n newline character, or at exit(3) time, including when returning from main into the crt0 code), the C standard library will use the write(2) syscall. The kernel will process it to show something (but the details are horribly complex; read The TTY demystified first). Actually, millions of lines of source code are involved (inside the kernel: read about DRM; inside the display server, such as X.Org or Wayland; some code inside the GPU; and inside the terminal emulator). Linux is free software, so in principle you can study all of it (but that needs more than a lifetime: a typical Linux distribution has about twenty billion lines of source code). See also the OSDev wiki, which gives some practical information, including about native Intel graphics (which are the most primitive graphics today).
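A Linux/POSIX-only sketch of the buffering step, assuming a pipe as the output instead of a terminal so that it can be observed. The helper names `bytes_ready` and `demo_buffering` are hypothetical. A character written with `putc` to a fully buffered stream stays in the stdio buffer, and only `fflush` makes the library hand it to the kernel with `write(2)`:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: returns how many bytes could be read from fd right
   now without blocking (0 if the pipe is empty). fd must be O_NONBLOCK. */
static ssize_t bytes_ready(int fd, char *buf, size_t n)
{
    ssize_t r = read(fd, buf, n);
    if (r < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return r;
}

/* Hypothetical demo: writes c through a fully buffered stdio stream on a
   pipe and reports how many bytes reached the pipe before and after
   fflush(). POSIX only; returns 0 on success, -1 on setup failure. */
int demo_buffering(int c, ssize_t *before, ssize_t *after)
{
    int fds[2];
    char buf[16];
    char iobuf[BUFSIZ];

    if (pipe(fds) != 0)
        return -1;
    fcntl(fds[0], F_SETFL, O_NONBLOCK);

    FILE *w = fdopen(fds[1], "w");
    if (w == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    setvbuf(w, iobuf, _IOFBF, sizeof iobuf);  /* must precede any I/O */

    putc(c, w);                               /* stays in iobuf for now */
    *before = bytes_ready(fds[0], buf, sizeof buf);

    fflush(w);                                /* stdio calls write(2) here */
    *after = bytes_ready(fds[0], buf, sizeof buf);

    fclose(w);                                /* also closes fds[1] */
    close(fds[0]);
    return 0;
}
```

Calling `demo_buffering('A', &before, &after)` should report 0 bytes before the flush and 1 byte after it. When `stdout` refers to a terminal it is normally line buffered rather than fully buffered, which is why a `\n` also triggers the write in that case.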

PS. You need to spend more than ten years understanding all the details (and I don't).

Basile Starynkevitch
  • I know that part, but it doesn't answer my question. Where does the getchar() return value go, and how does a character get displayed? PS: Reposting my question here: what actually happens inside getchar() and printf() during the whole process, where getchar() returns that integer, where it gets stored, and how the character we typed gets displayed by printf()? – Alfran Jun 06 '17 at 13:43
  • "Where is the `getchar()` return value going?" This is defined by the calling convention and the ABI. For Linux/x86-64 it generally goes in the register `%rax`. There is nothing specific to `getchar()` here: all `int`-returning functions return their result the same way. – Basile Starynkevitch Jun 06 '17 at 13:46
  • @alfan, that's implementation-dependent, but a simple way is that the result of `getchar` is pushed onto a stack from which `printf` takes its arguments – Federico klez Culloca Jun 06 '17 at 13:46
  • @Alfran: what do you not understand? But beware, you may need to read several entire books to understand the details. – Basile Starynkevitch Jun 06 '17 at 13:53
  • @FedericoklezCulloca: On Linux/x86-64 calling conventions use registers (for the first 6 arguments and for the result), not the stack. It could differ for variadic functions. – Basile Starynkevitch Jun 06 '17 at 13:55
  • @BasileStarynkevitch thanks for those details above, I am grateful. BTW, please tell me which books clarify these basic concepts; I would love to dive into them :) – Alfran Jun 06 '17 at 14:25
  • @FedericoklezCulloca What about the arguments if there are more than 6? – Alfran Jun 06 '17 at 14:27
  • The links in italics are books. – Basile Starynkevitch Jun 06 '17 at 14:28
  • When there are more than 6 arguments, the ABI specifies how they are passed on the call stack. – Basile Starynkevitch Jun 06 '17 at 14:29
  • @Alfran: how many years are you willing to spend understanding all the details? I gave you enough links to eventually reach them... But you'll need a lot of time. – Basile Starynkevitch Jun 06 '17 at 14:42
  • @BasileStarynkevitch it would take a month or so, as I have already read the operating systems book by Galvin, which was enough for the basics, and besides, I am a quick learner :) – Alfran Jun 06 '17 at 14:49

To understand this, you must read about the default argument promotions.

> where getchar() is returning that integer, where it is getting stored

C handles this for you.

> how the character we inputted is getting displayed by printf()

%c tells printf() to print a character.
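A minimal sketch of what the default argument promotions mean on the receiving side of `...`. The function `read_promoted` is a hypothetical example: a `char` passed to a variadic function arrives as an `int` and a `float` arrives as a `double`, so those are the types it must ask `va_arg` for. This is the same reason `printf` reads an `int` for `%c`:

```c
#include <stdarg.h>

/* Hypothetical example: receives one char and one float through "...".
   Because of the default argument promotions, the char arrives as an int
   and the float as a double, so those are the types va_arg must use
   (va_arg(ap, char) or va_arg(ap, float) would be undefined behavior). */
void read_promoted(int *as_int, double *as_double, ...)
{
    va_list ap;
    va_start(ap, as_double);
    *as_int = va_arg(ap, int);        /* was a char at the call site */
    *as_double = va_arg(ap, double);  /* was a float at the call site */
    va_end(ap);
}
```

Calling `read_promoted(&i, &d, ch, f)` with `char ch = 'A'` and `float f = 1.5f` should set `i` to `'A'` and `d` to `1.5`.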

Stargateur

The man page for getchar says the following:

> fgetc() reads the next character from stream and returns it as an unsigned char cast to an int, or EOF on end of file or error.
>
> getc() is equivalent to fgetc() except that it may be implemented as a macro which evaluates stream more than once.
>
> getchar() is equivalent to getc(stdin).
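A small sketch of why the result of `getchar()` belongs in an `int`, using `fgetc` on an arbitrary stream. `read_codes` is a hypothetical helper; for `getchar()` you would pass `stdin`. Each stored value is a byte in the range 0..`UCHAR_MAX`, and the loop stops at `EOF`, which is negative and therefore cannot collide with any byte value:

```c
#include <stdio.h>

/* Hypothetical helper: reads bytes from `in` and stores the int values
   returned by fgetc into `codes` (at most `max` of them). Returns how many
   were stored. `c` is an int, not a char: fgetc returns either a value in
   0..UCHAR_MAX or EOF (a negative int), and only an int holds all of
   those distinctly. */
size_t read_codes(FILE *in, int *codes, size_t max)
{
    size_t count = 0;
    int c;

    while (count < max && (c = fgetc(in)) != EOF)
        codes[count++] = c;
    return count;
}
```

If the input is the byte `a` followed by the byte 0xFF, this should store `'a'` and 255 (on systems with 8-bit bytes). Had `c` been declared `char` on a platform where `char` is signed, the 0xFF byte would typically become -1 and could be mistaken for `EOF`.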

So let's say you enter the character A. Assuming your system uses ASCII representation of characters, this character has an ASCII value of 65. So in this case getchar returns 65 as an int.

This int value of 65 returned by getchar is then passed to printf as the second argument. The printf function first looks at the format string and sees the %c format specifier. The man page for printf says the following regarding %c:

> If no l modifier is present, the int argument is converted to an unsigned char, and the resulting character is written.

So printf reads the next argument as an int. Since we passed in an int with value 65, that's what it reads. That value is then converted to an unsigned char. Since it is still within the range of that type, the value is still 65. printf then prints the character for that value, and since 65 is the ASCII value for the character A, the character A is what appears.
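A minimal sketch of the `%c` conversion described in the man-page excerpt, using a hypothetical helper `char_written` that formats into a buffer and returns the byte that `%c` produced:

```c
#include <stdio.h>

/* Hypothetical helper: prints an int through %c into `out` and returns
   the byte that was written. %c converts its int argument to unsigned
   char, so only the value modulo UCHAR_MAX + 1 survives. */
unsigned char char_written(int value, char *out, size_t size)
{
    snprintf(out, size, "%c", value);
    return (unsigned char)out[0];
}
```

`char_written(65, buf, sizeof buf)` should return 65, and so should `char_written(65 + UCHAR_MAX + 1, buf, sizeof buf)`, because conversion to `unsigned char` keeps the value modulo `UCHAR_MAX + 1`.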

dbush
  • What does `no l modifier is present` mean? Can you explain, please? – Alfran Jun 06 '17 at 14:16
  • @Alfran Format specifiers to `printf` can have a modifier in front of them. For example, `%d` expects an `int` to print, while `%ld` expects a `long int` and `%lld` expects a `long long int`. In your case, there is no modifier. See the [man page](https://linux.die.net/man/3/printf) for more details. – dbush Jun 06 '17 at 14:19
  • Okay! So, if I am getting it right, if an int is passed and the specifier is %ld, does it convert the int to a long int? Similarly, as you explained, the int is passed to %c, and getchar()'s return value, which is an int, is typecast to its corresponding ASCII char? – Alfran Jun 06 '17 at 14:45
  • @Alfran The format specifier and modifier do not convert the parameter, they specify (among other things) what type is accepted. Both `%c` and `%d` expect an `int` parameter, but (if 65 is passed in) one will output `A` while the other will output `65`. If the type of the parameter does not match the expected type of the format specifier, you get [undefined behavior](https://en.wikipedia.org/wiki/Undefined_behavior). – dbush Jun 06 '17 at 14:49
  • @Alfran The reason that `%c` internally casts the given `int` to `unsigned char` is 1) the default promotion rules will automatically convert a `char` to an `int` when used in an expression or passed to a variadic function like `printf` and 2) non-wide characters are a single byte, so casting to `unsigned char` gets rid of values stored in all but the lowest order byte of the passed in `int`. – dbush Jun 06 '17 at 14:58
  • Ohh! I got it. So the only way to typecast is to write it explicitly in code, like `int x = 10; long int y = (long int)x;`, because of that undefined behavior. But if `%c` expects an int, and an int is 4 bytes on my machine while a char is only 1 byte, can `%c` take a value of up to `4 bytes`? Or does it truncate it, or what? – Alfran Jun 06 '17 at 15:01
  • @Alfran Yes, if an `int` is 4 bytes, the `%c` format specifier tells `printf` to read the next 4 bytes as an `int`. Internally, it then does something like `unsigned char c = (unsigned char)intval;` to truncate what it read and use that value to print the ASCII code corresponding to the given value. – dbush Jun 06 '17 at 15:05
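A sketch of matching length modifiers to argument types, following the comments above. `format_widths` is a hypothetical helper; the casts at the call site are what make each argument the type its conversion expects, since `printf` itself does not convert between `int`, `long` and `long long`:

```c
#include <stdio.h>

/* Hypothetical helper: formats the same number as int, long and long long.
   Each conversion is paired with an argument of exactly the type it
   expects (%d: int, %ld: long, %lld: long long); the explicit casts do
   the conversion, printf does not. */
int format_widths(int x, char *out, size_t size)
{
    return snprintf(out, size, "%d %ld %lld", x, (long)x, (long long)x);
}
```

`format_widths(65, buf, sizeof buf)` should produce `65 65 65` and return 8.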