72

I know I can use va_arg to write my own variadic functions, but how do variadic functions work under the hood, i.e. on the assembly instruction level?

E.g., how is it possible that printf takes a variable number of arguments?


* No rule without exception. There is no language "C/C++"; however, this question can be answered for both of them.

* Note: Answer originally given to How can printf function can take variable parameters in number while output them?, but it seems it did not apply to the questioner

Community
Sebastian Mach
  • @BЈовић: Those are "guesses"; I will refine the text. – Sebastian Mach Apr 16 '14 at 09:07
  • The question is about how variadic functions work on a technical basis; how does it work w.r.t. to the hardware. And no, this is not a dupe. Have you downvoted the answer? // edit: I deleted my answer in the other thread. – Sebastian Mach Apr 16 '14 at 09:12
  • Typically one does not just _decide_ that their question shall be part of the [tag:c++-faq] tag. Is this really "frequently" asked? It's a good self-answered Q&A though so thanks for posting it. – Lightness Races in Orbit Apr 16 '14 at 09:18
  • 2
    @BЈовић: `You just copy&pasted the answer. So, this question is duplicate of other.` That is a non sequitur. Duplicate answers do not duplicate questions make. – Lightness Races in Orbit Apr 16 '14 at 09:19
  • @LightnessRacesinOrbit: I see. I expected something wrong and should have done my homework better. – Sebastian Mach Apr 16 '14 at 09:20
  • 2
    possible duplicate of [What is the format of the x86\_64 va\_list structure?](http://stackoverflow.com/questions/4958384/what-is-the-format-of-the-x86-64-va-list-structure) – Matthieu M. Apr 16 '14 at 09:27
  • 1
    @MatthieuM.: I am not sure if that is "technical" enough. I will refine my question. – Sebastian Mach Apr 16 '14 at 09:31
  • 2
    @phresnel: it seems more technical (or at least precise) than your own answer, although it is specialized for one architecture. – Matthieu M. Apr 16 '14 at 09:34
  • @MatthieuM.: Yeah, your comment made me realise that "technical" is ambiguous, which is why I now added "on the instruction level", which I realise needs refinement, too. // I am not sure if this is really a C or C++ question anymore. It "seems" C only serves as an example. Yet it might be interesting for C programmers searching for enlightenment. Hmm. – Sebastian Mach Apr 16 '14 at 09:36
  • @phresnel: it could potentially be of use outside of C or C++, however I know no other language that uses varargs directly. – Matthieu M. Apr 16 '14 at 09:40
  • I feel schizophrenic for discussing this on several layers of myself. But I think you are right; two of three of my own personalities think the C and C++ tags are okay. – Sebastian Mach Apr 16 '14 at 09:42
  • @MatthieuM.: Lua does, via mechanisms that are 100% unrelated to those of C. – Mooing Duck Apr 16 '14 at 16:17

2 Answers

82

The C and C++ standards do not impose any requirement on how it has to work. A conforming compiler may well decide to emit linked lists, std::stack<boost::any> or even magical pony dust (as per @Xeo's comment) under the hood.

However, it is usually implemented as follows, although transformations such as inlining or passing arguments in CPU registers may leave nothing of the code discussed here.

Please also note that this answer specifically describes a downwards growing stack in the visuals below; also, this answer is a simplification just to demonstrate the scheme (please see https://en.wikipedia.org/wiki/Stack_frame).

How can a function be called with a non-fixed number of arguments

This is possible because the underlying machine architecture has a so-called "stack" for every thread. The stack is used to pass arguments to functions. For example, when you have:

foobar("%d%d%d", 3,2,1);

Then this compiles to assembly code like the following (schematic and exemplary; actual code might look different). Note that the arguments are pushed from right to left:

push 1
push 2
push 3
push "%d%d%d"
call foobar

Those push-operations fill up the stack:

              []   // empty stack
-------------------------------
push 1:       [1]  
-------------------------------
push 2:       [1]
              [2]
-------------------------------
push 3:       [1]
              [2]
              [3]  // there is now 1, 2, 3 in the stack
-------------------------------
push "%d%d%d":[1]
              [2]
              [3]
              ["%d%d%d"]
-------------------------------
call foobar   ...  // foobar uses the same stack!

The bottommost element in the drawing, i.e. the one pushed last, is called the "Top of Stack", often abbreviated "TOS".

The foobar function would now access the stack, beginning at the TOS, i.e. the format string, which as you remember was pushed last. Imagine stack is your stack pointer, stack[0] is the value at the TOS, stack[1] is one above the TOS, and so forth:

format_string <- stack[0]

... and then parses the format string. While parsing, it recognizes the %d tokens, and for each, loads one more value from the stack:

format_string <- stack[0]
offset <- 1
while (parsing):
    token = tokenize_one_more(format_string)
    if (needs_integer (token)):
        value <- stack[offset]
        offset = offset + 1
    ...

This is of course very incomplete pseudo-code, but it demonstrates how the function has to rely on the arguments passed to find out how many values it has to load from the stack.
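The loop above can be sketched in portable C with the standard va_list interface instead of raw stack offsets. This is only an illustration under assumptions: sum_by_format is a hypothetical helper, it understands nothing but %d and %%, and it adds the integers up instead of printing them.

```c
#include <stdarg.h>

/* Hypothetical helper (not a library function): walks the format string
   and fetches one int from the variadic arguments for every "%d" token.
   Returns the sum of the fetched values. */
int sum_by_format(const char *format, ...)
{
    va_list args;
    int sum = 0;

    va_start(args, format);            /* begin after the last named parameter */
    for (const char *p = format; *p != '\0'; ++p) {
        if (*p != '%')
            continue;                  /* ordinary character, nothing to fetch */
        ++p;                           /* look at the character after '%' */
        if (*p == '\0')
            break;                     /* lone '%' at the end of the string */
        if (*p == 'd')
            sum += va_arg(args, int);  /* load one more value and advance */
        /* any other character (including "%%") fetches nothing here */
    }
    va_end(args);
    return sum;
}
```

A call such as sum_by_format("%d%d%d", 3, 2, 1) fetches exactly three integers because the format string says so. If the caller passes fewer arguments than the string announces, the behaviour is undefined, which is exactly the reliance described above.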

Security

This reliance on user-provided arguments is also one of the biggest security issues present (see https://cwe.mitre.org/top25/). Users may easily use a variadic function wrongly, either because they did not read the documentation, or forgot to adjust the format string or argument list, or because they are plain evil, or whatever. See also Format String Attack.
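A common way to stay on the safe side is to never let untrusted text act as the format string. The sketch below assumes a hypothetical helper named format_untrusted; it only uses the standard snprintf.

```c
#include <stdio.h>

/* Hypothetical helper: copies untrusted text into buf.
   The text is passed as an argument to the fixed format "%s" and is
   never interpreted as a format string itself. */
int format_untrusted(char *buf, size_t size, const char *untrusted)
{
    /* Risky alternative, shown only as a comment:
     *     snprintf(buf, size, untrusted);
     * Here any %d, %s or %n inside the input would make snprintf read
     * (or write through) arguments that were never passed. */
    return snprintf(buf, size, "%s", untrusted);
}
```

With this pattern, an input like "%d%d%n" ends up in the buffer verbatim instead of steering how many arguments are read.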

C Implementation

In C and C++, variadic functions are used together with the va_list interface. While the pushing onto the stack is intrinsic to those languages (in K&R C you could even forward-declare a function without stating its arguments, but still call it with any number and kind of arguments), reading from such an unknown argument list is done through the va_... macros and the va_list type, which basically abstract the low-level stack-frame access.
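One detail of that interface is easy to get wrong: arguments in the variadic part undergo the default argument promotions, so a char or short arrives as int and a float arrives as double. The type given to va_arg has to be the promoted one. A minimal sketch, in which scaled is a hypothetical function that expects exactly one small integer and one floating-point value after the named parameter:

```c
#include <stdarg.h>

/* Hypothetical helper: expects one char-sized integer followed by one
   float in the variadic part and returns factor * (integer + float). */
double scaled(int factor, ...)
{
    va_list args;

    va_start(args, factor);
    int small = va_arg(args, int);        /* a char argument arrives as int */
    double real = va_arg(args, double);   /* a float argument arrives as double */
    va_end(args);

    return factor * (small + real);
}
```

Calling scaled(2, c, f) with char c = 2 and float f = 0.5f works because both values are promoted at the call site; asking va_arg for char or float instead would be undefined behaviour.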

Reinstate Monica
Sebastian Mach
  • 19
    Note that the standard places no actual requirements on how this works. For what it's worth, it could also use magical pony dust to make it work. (Also, I did not downvote.) – Xeo Apr 16 '14 at 09:17
  • @Xeo: No need to tell you didn't in your case :) I will add a disclaimer and include what you rightly said. – Sebastian Mach Apr 16 '14 at 09:22
  • Just for anyone's interest who might happen to read this: this is what makes format string exploits possible. Never ever use a user input string as a format string in a `printf` call! – Cu3PO42 Apr 16 '14 at 09:58
  • 4
    `stdcall` cannot be used as the calling convention of a variadic function. Even if the writer of the variadic function knows the number of arguments, the compiler may not know it. And the standard allows using multiple `va_list`s by calling `va_start` several times or using `va_copy`, so `va_arg` is not implemented by `pop`, but by reading the stack directly (e.g. `mov eax, [valist]`). So the compiler cannot figure out how much stack should be popped while compiling the variadic function; only the "caller" knows that. So, `cdecl` should be used. – ikh Apr 16 '14 at 10:06
  • It is probably worth noting many compilers 'cheat' where the format string is known in advance and do not actually use variadic semantics. – Vality Apr 16 '14 at 11:14
  • 4
    Of course, if the stack grows upwards, instead of down, everything is reversed. And even as you describe it, it's not quite true. Arguments aren't really popped when accessing them. Typically, `va_list` will define a pointer type, and `va_arg` will update it according to the type of the argument being extracted. (This is why the _type_ argument of `va_arg` must correspond to the promoted type, and not the type you might actually want.) – James Kanze Apr 16 '14 at 11:49
  • 3
    @ikh Both _stdcall_ and _cdecl_ are purely Microsoft conventions. Most other systems only have one basic convention, and pass all arguments to all functions in the same way. The few that don't (other than Microsoft) use the standard defined mechanism for specifying the calling conventions: `extern "C"` (or something else instead of `C`). – James Kanze Apr 16 '14 at 11:52
  • 12
    -1: This merely(and verbosely) describes how a stack works to pass a fixed number of parameters. It manages to miss almost all of the salient points about how a Variadic function call with a variable number of arguments is actually implemented in most architectures: i.e., with a *frame pointer* or an *argument counter* in addition to a stack pointer. with out those, the called function has no idea where the bottom of the call frame is. – RBarryYoung Apr 16 '14 at 15:21
  • @Vality: The implementation of `printf` has to be able to handle arbitrary variadic arguments, so any call to `printf` *has* to use variadic semantics. The cheating you describe can happen, but only by transforming a `printf` call to a call to some other (non-variadic) function. For example, a call like `printf("hello\n")` might be optimized to the equivalent of `puts("hello")`. – Keith Thompson Apr 16 '14 at 15:46
  • 1
    @KeithThompson Yes, that is exactly what I was describing, however I think some compilers go further than that by transforming a printf with a constant format string into a series of conversion functions and puts's But I do understand and agree with all you said, I just thought it an interesting implementation note. – Vality Apr 16 '14 at 15:51
  • Guys/Gals, I've refined my answer quite a bit. Thanks for your help, hope it's better now! – Sebastian Mach Apr 17 '14 at 08:06
  • @ikh: But in `stdcall`, arguments _are_ passed right-to-left? Anyways, I removed that section for sake of clarity. – Sebastian Mach Apr 17 '14 at 08:14
  • @JamesKanze um? stdcall and cdecl are commonly used in most 32-bit system. (although there are a little differences between one system and another..) – ikh Apr 18 '14 at 22:49
  • @phresnel Yes, both cdecl and stdcall. – ikh Apr 18 '14 at 22:51
  • @ikh: I asked because you wrote `stdcall cannot be used as the calling convention of variadic function. Even if the writer of variadic function knows the number of arguments, maybe compiler cannot know it.`, but right-to-left passing is what is required for variadic functions (unless the format string would be the last argument) – Sebastian Mach Apr 19 '14 at 04:57
  • @phresnel Not only right-to-left passing but also `cleaning stack by caller` is required. As I said, it's too difficult or impossible for compiler to figure out the number of variadic argument – ikh Apr 20 '14 at 10:01
  • @phresnel For example, `wsprintf` function of win32 api is `cdecl`, even if other functions of api are `stdcall`. – ikh Apr 20 '14 at 10:02
  • @ikh: Yeah I see. While not impossible (called function could clean up by the information passed in the format string), `stdcall` would mean even bigger security issues. – Sebastian Mach Apr 21 '14 at 06:41
  • @ikh stdcall and cdecl are Microsoftisms. They're not used except when compilers try to be compatible with Microsoft. (And why Microsoft did it this way, when the standard provides a standard way of doing it, is beyond me.) – James Kanze Apr 21 '14 at 20:51
  • @JamesKanze Um..? Although they're not standard, they are commonly used on 32-bit x86 systems. For example, when writing an assembly function to link with a C program compiled by gcc, we must follow cdecl if the calling convention isn't specified. Of course, it may not be completely identical to Microsoft's. – ikh Apr 24 '14 at 05:51
  • @ikh I've never used them or heard of them under Linux; they didn't exist back when I was developing on that platform. There is normally only one calling convention used on a specific architecture/OS, for a specific language. Windows is a bit special in this regard, in that they impose the conventions of Pascal when calling into the C system library. – James Kanze Apr 24 '14 at 08:25
  • @ikh So there are a lot of different names, with each compiler using a different subset, and defining the ones it uses differently. In sum, any use of such names requires specification of the compiler, and many of the names are only significant to one compiler. – James Kanze Apr 25 '14 at 12:35
  • And I just noticed one misstatement in the actual answer: in C, you can declare a function without any information about its arguments; in such cases, however, all calls to the function _must_ pass arguments compatible with those in the function definition, or it is undefined behavior (and such functions cannot be varargs). And this is specified as an obsolete feature (in other words, deprecated). – James Kanze Apr 25 '14 at 12:42
  • @JamesKanze: Ouh, misconception on my side. I will edit my answer. – Sebastian Mach Apr 28 '14 at 06:33
  • What I still don't get is how the function knows how many arguments to read from. Since you have to hardcode stack offsets, how does it know when it's "done"? Is there a "secret" argument count argument like with `int argc, char** argv[]`? – puppydrum64 Nov 23 '22 at 15:43
  • @puppydrum64: The magic is that there is no magic. For `printf`, the decision about how many values to pop from the stack is solely based on the format string. So if you pass `%d%d%d` as the format string, `printf` ___will___ read 3 integers - _whether there **are** 3 integers or **not**_. And as you guess, that is a problem. – Sebastian Mach Nov 23 '22 at 16:44
  • Oh, that's clever! Never thought of that. – puppydrum64 Nov 23 '22 at 16:47
  • @puppydrum64: ... but remember that clever is not necessarily good :) There's a huge opportunity for the function to be called incorrectly. Here's an outline: https://owasp.org/www-community/attacks/Format_string_attack – Sebastian Mach Nov 24 '22 at 14:44
10

Variadic functions are defined by the standard, with very few explicit restrictions. Here is an example, lifted from cplusplus.com.

/* va_start example */
#include <stdio.h>      /* printf */
#include <stdarg.h>     /* va_list, va_start, va_arg, va_end */

void PrintFloats (int n, ...)
{
  int i;
  double val;
  printf ("Printing floats:");
  va_list vl;
  va_start(vl,n);
  for (i=0;i<n;i++)
  {
    val=va_arg(vl,double);
    printf (" [%.2f]",val);
  }
  va_end(vl);
  printf ("\n");
}

int main ()
{
  PrintFloats (3,3.14159,2.71828,1.41421);
  return 0;
}

The assumptions are roughly as follows.

  1. There must be (at least one) first, fixed, named argument. The ... actually does nothing, except tell the compiler to do the right thing.
  2. The fixed argument(s) provide information about how many variadic arguments there are, by an unspecified mechanism.
  3. From the fixed argument it is possible for the va_start macro to return an object that allows arguments to be retrieved. The type is va_list.
  4. From the va_list object it is possible for va_arg to iterate over each variadic argument, and coerce its value into a compatible type.
  5. Something weird might have happened in va_start so va_end makes things right again.

In the most usual stack-based situation, the va_list is merely a pointer to the arguments sitting on the stack, and va_arg increments the pointer, casts it and dereferences it to a value. Then va_start initialises that pointer by some simple arithmetic (and inside knowledge) and va_end does nothing. There is no strange assembly language, just some inside knowledge of where things lie on the stack. Read the macros in the standard headers to find out what that is.
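The pointer idea can be imitated in plain C without touching the real stack. The sketch below is a toy model, not how any particular compiler defines va_list: every toy_ name is hypothetical, the "argument area" is an ordinary byte buffer, and alignment and padding are ignored by copying with memcpy.

```c
#include <string.h>

/* Toy stand-in for va_list: a cursor into a byte buffer that plays the
   role of the argument area on the stack. */
typedef struct {
    const unsigned char *next;   /* address of the next unread argument */
} toy_va_list;

/* Counterpart of va_start: point the cursor at the first argument. */
static void toy_va_start(toy_va_list *ap, const void *first_arg)
{
    ap->next = first_arg;
}

/* Counterpart of va_arg(ap, int): read an int, then step past it. */
static int toy_va_arg_int(toy_va_list *ap)
{
    int value;
    memcpy(&value, ap->next, sizeof value);
    ap->next += sizeof value;
    return value;
}

/* Counterpart of va_arg(ap, double): read a double, then step past it. */
static double toy_va_arg_double(toy_va_list *ap)
{
    double value;
    memcpy(&value, ap->next, sizeof value);
    ap->next += sizeof value;
    return value;
}

/* Lays out an int followed by a double, reads both back through the
   cursor and returns their sum. */
double toy_demo(int a, double b)
{
    unsigned char area[sizeof(int) + sizeof(double)];
    toy_va_list ap;

    memcpy(area, &a, sizeof a);
    memcpy(area + sizeof a, &b, sizeof b);

    toy_va_start(&ap, area);
    int x = toy_va_arg_int(&ap);
    double y = toy_va_arg_double(&ap);
    return x + y;
}
```

Note how the reader has to name the right type at every step: asking for a double where an int was stored would misread the bytes and misplace the cursor for everything after it, just as with the real va_arg.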

Some compilers (MSVC) will require a specific calling sequence, whereby the caller will release the stack rather than the callee.

Functions like printf work exactly like this. The fixed argument is a format string, which allows the number of arguments to be calculated.

Functions like vsprintf pass the va_list object as a normal argument type.
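A typical use is a thin variadic wrapper that collects its arguments into a va_list and hands that object on. The sketch below uses the standard vsnprintf (the bounded sibling of vsprintf); format_message is a hypothetical name chosen for illustration.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: accepts printf-style arguments and forwards them
   to vsnprintf as an ordinary va_list argument.
   Returns whatever vsnprintf returns. */
int format_message(char *buf, size_t size, const char *format, ...)
{
    va_list args;
    int written;

    va_start(args, format);
    written = vsnprintf(buf, size, format, args);  /* va_list passed like any value */
    va_end(args);

    return written;
}
```

For example, format_message(buf, sizeof buf, "%d-%s", 42, "x") leaves "42-x" in buf. The wrapper never needs to know how many arguments it received; only vsnprintf interprets them, guided by the format string.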

If you need more or lower level detail, please add to the question.

david.pfx
  • 10,520
  • 3
  • 30
  • 63
  • The `...` may be critical in implementations which generally expect called functions to clean up pushed arguments on exit. The C Standard mandates that passing extra arguments to something like "printf" has no effect, but the only way that could work with callee-clean convention would be if the caller knew either that it was responsible for variadic arguments, or that it needed to let the callee know the quantity of arguments the callee needed to clean up. – supercat Feb 23 '17 at 01:05