Traditional calling conventions allocate parameter space on the stack, and there is always some overhead in copying arguments into that space.
Assuming a strictly volatile environment (one where every access really goes out to memory), the only additional overhead can come from memory alignment. In your example, the parameters will sit in contiguous memory, so no padding is needed to align them.
When the parameter types vary in size, the parameters in the following declaration:
int func (int a, char c, int b)
will have padding between them, whereas those in this declaration:
int func (int a, int b, char c)
will not.
The stack frame for the former might look like:
| local vars... | low memory
+---------------+ - frame pointer
| a | a | a | a |
| c | X | X | X |
| b | b | b | b |
+---------------+ high memory
And for the latter:
| local vars... | low memory
+---------------+ - frame pointer
| a | a | a | a |
| b | b | b | b |
| c | X | X | X |
+---------------+ high memory
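You can't portably inspect the parameter area itself, but structs follow the same alignment rules, so they make a convenient stand-in. Here is a minimal sketch, assuming a typical ABI with 4-byte ints (the struct names are mine, purely for illustration):

#include <stdio.h>
#include <stddef.h>

/* Mirrors int func(int a, char c, int b) */
struct former {
    int  a;
    char c;   /* typically followed by 3 padding bytes */
    int  b;
};

/* Mirrors int func(int a, int b, char c) */
struct latter {
    int  a;
    int  b;
    char c;   /* tail padding may still round the size up */
};

int main(void) {
    printf("former: size=%zu, b at offset %zu\n",
           sizeof(struct former), offsetof(struct former, b));
    printf("latter: size=%zu, c at offset %zu\n",
           sizeof(struct latter), offsetof(struct latter, c));
    return 0;
}

On a typical 32- or 64-bit ABI, this reports b at offset 8 in former (reflecting the 3-byte hole after c), while latter has no interior hole at all.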
When the function gets called, the arguments are written into stack memory in the order they appear. So for the former, you write the 4 bytes of int a and the 1 byte of char c, then skip 3 bytes of padding before writing the 4 bytes of int b.
In the latter, you write into contiguous memory locations and never have to skip over padding.
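To make the copying concrete, here is a by-hand marshaling sketch that mirrors the first diagram's offsets. It is purely illustrative; the compiler emits stores directly rather than calling memcpy, and the 12-byte layout assumes a 4-byte int with 4-byte alignment:

#include <stdio.h>
#include <string.h>

int main(void) {
    /* Parameter area laid out like the first frame diagram:
       a at offset 0, c at offset 4, b at offset 8. */
    unsigned char param_area[12] = {0};
    int  a = 1, b = 2;
    char c = 'x';

    memcpy(param_area + 0, &a, sizeof a);  /* bytes 0-3  */
    memcpy(param_area + 4, &c, sizeof c);  /* byte  4    */
    /* bytes 5-7 are skipped: alignment padding before b */
    memcpy(param_area + 8, &b, sizeof b);  /* bytes 8-11 */

    for (size_t i = 0; i < sizeof param_area; i++)
        printf("%02x ", param_area[i]);
    printf("\n");
    return 0;
}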
In a volatile environment, the skips cost something on the order of a few nanoseconds. The hit might be measurable, but it is close to negligible.
(By the way, how the skipping is done is entirely architecture-dependent, but I'd bet that in general it just means using a larger offset for the next store; I'm not sure whether any architecture handles it differently.)
Of course, in a non-volatile environment, where the CPU cache comes into play, the hit shrinks to fractions of a nanosecond. That is well into undetectable territory, so the difference is effectively nonexistent.
Data padding is really only a space cost. If you're working on embedded systems, order your parameters (and struct members) from largest to smallest to reduce, and sometimes eliminate, padding.
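The space savings are easy to verify with sizeof. Assuming a common ABI with a 4-byte int and an 8-byte double (and, again, hypothetical struct names), interleaving small and large members wastes space that largest-to-smallest ordering recovers:

#include <stdio.h>

/* Interleaved sizes: each small member forces padding
   before the next, more strictly aligned one. */
struct wasteful {
    char   a;  /* 1 byte + 7 padding bytes */
    double b;  /* 8 bytes                  */
    char   c;  /* 1 byte + 3 padding bytes */
    int    d;  /* 4 bytes                  */
};             /* typically 24 bytes       */

/* Largest to smallest: padding only at the tail. */
struct compact {
    double b;
    int    d;
    char   a;
    char   c;  /* + 2 tail padding bytes */
};             /* typically 16 bytes     */

int main(void) {
    printf("wasteful: %zu bytes\n", sizeof(struct wasteful));
    printf("compact:  %zu bytes\n", sizeof(struct compact));
    return 0;
}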
So, as far as I can tell (without more information, such as the exact memory characteristics of a particular machine or architecture), there shouldn't be a meaningful performance hit from a different parameter order.