
It stands to reason that, for executable code to be called a function, it should conform to the function calling convention of the platform it's running on.

However, _start() does not. For example, in this reference implementation there is no return address on the stack:

.section .text

.global _start
_start:
    # Set up end of the stack frame linked list.
    movq $0, %rbp
    pushq %rbp # rip=0
    pushq %rbp # rbp=0
    movq %rsp, %rbp

    # We need those in a moment when we call main.
    pushq %rsi
    pushq %rdi

    # Prepare signals, memory allocation, stdio and such.
    call initialize_standard_library

    # Run the global constructors.
    call _init

    # Restore argc and argv.
    popq %rdi
    popq %rsi

    # Run main
    call main

    # Terminate the process with the exit code.
    movl %eax, %edi
    call exit
.size _start, . - _start

Yet it's called a function in a myriad of sources. A number of questions and answers on StackOverflow also refer to it as a function.

Is a function simply a group of instructions identified by the address of its entry point, or must it conform to the calling convention? The C standard does not seem to define the concept of a function, and neither do the GCC and Clang docs. What is the authoritative source that defines this concept?

mnistic
  • This is not a part of the `C` language, because it happens before `main`, so the tags are quite irrelevant. For the same reason, asking whether it is a "function" in C terminology is meaningless. `start` conventions are highly dependent on the hosting environment. – Eugene Sh. Feb 23 '22 at 18:56
  • @EugeneSh. what are the appropriate tags? – mnistic Feb 23 '22 at 18:58
  • Well, perhaps they are relevant in the sense that they help to point out that they are not :) – Eugene Sh. Feb 23 '22 at 18:59
  • C defines the *behavior* of functions expressed in the C language. It says nothing at all about the details of the machine code to which a C implementation translates C sources. It says nothing about the meaning of the word "function" outside the context of C source code. – John Bollinger Feb 23 '22 at 19:00
  • When you say "*`_start()` does not*" you seem to be making a questionable generalization from one example. Moreover, it's not clear that the claim is valid even for the example, as it is the caller that is responsible for putting the function's return address on the stack, not the function. If the caller here is taking liberties with its call to `start()`, that does not speak to whether `start()` should be considered a function. – John Bollinger Feb 23 '22 at 19:07
  • Pretty much by definition it must follow "a" calling convention, but not necessarily "the" calling convention -- many different calling conventions are possible, and the only requirement is that the caller and callee agree on what calling convention to use for that function. In this case the only caller is the kernel, and it does not expect the function to return. – Chris Dodd Feb 23 '22 at 19:07
  • @JohnBollinger Understood, I'm just looking for an authoritative reference that defines the thing, if there is one – mnistic Feb 23 '22 at 19:11
  • Maybe [entry point](https://en.wikipedia.org/wiki/Entry_point) is a better name? – pmg Feb 23 '22 at 19:37
  • Tags changed your definition of a question: It can't be a C function because you can't write down its declaration in C. But that clearly isn't what you intended to ask. – Joshua Feb 24 '22 at 19:51
  • What are the appropriate tags? (feel free to edit to add the tags) – mnistic Feb 24 '22 at 20:33

4 Answers


The lack of a return does not make a piece of code a non-function. Even a function written in C does not need to contain a return instruction:

int call_fn(int(*fn)()) {
    return fn();
}

With optimizations enabled, this function compiles down to a single jmp instruction: https://godbolt.org/z/nxT9qTvaf

call_fn(int (*)()):                        # @call_fn(int (*)())
        jmp     rdi                             # TAILCALL

In general, I don't think the C or C++ standards define anything about code written in assembly. A common calling convention helps when calling directly into functions written in other languages, but you can still call functions that use other calling conventions through a trampoline.
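The trampoline idea can be illustrated with a small simulation in portable C. This is only a sketch under stated assumptions: real trampolines are normally written in assembly or generated by the compiler, and the `slots` array, `add_custom`, and `add_trampoline` are hypothetical names invented here to stand in for a non-standard convention.

```c
/* Simulated "custom convention": arguments are passed in a shared slot
   array and the result comes back in slot 0. All names are hypothetical. */
static long slots[2];

/* Callee that only understands the custom convention. */
void add_custom(void)
{
    slots[0] = slots[0] + slots[1];
}

/* Trampoline: exposes an ordinary C signature and marshals the
   arguments to and from the custom convention. */
long add_trampoline(long a, long b)
{
    slots[0] = a;
    slots[1] = b;
    add_custom();
    return slots[0];
}
```

A caller that knows nothing about the slot array can simply write `add_trampoline(2, 3)`; the only requirement is that the trampoline and `add_custom` agree on where the values live.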

Fatih BAKIR
  • The function doesn't have a ret instruction because it tail calls another function, but it does have a slot on the stack used by the return address. – Joshua Feb 24 '22 at 19:44

> It stands to reason that, for executable code to be called a function, it should conform to the function calling convention of the platform it's running on.

"Function" is the primary idea here; "calling convention" is subsidiary to that. As such, I think a more supportable claim would be that for every function, there is a convention for calling it.

Interoperability considerations lead to standardization of calling conventions, but there is no One True calling convention, not even on a per-platform basis. Even subject to the influence of interoperability, there are platforms that support multiple standard calling conventions. In any case the existence of standard calling conventions does not necessarily relegate code with other conventions for entry and exit to non-function-hood.

> Is a function simply a group of instructions identified by the address to the entry point, or must it conform to the calling convention?

This is a question of the definition of "function". There is room for variation on this, and in practice, different definitions apply in different contexts. For example, the question refers to the C language specification, but this speaks to the meaning of "function" in the context of C source code, not assembly or machine code.

In practice, in various languages and contexts, there are

  • functions with identifiers and functions without;
  • functions that return a value and functions that don't;
  • functions with a single entry point and functions with multiple entry points;
  • functions with a single exit point and functions with multiple exit points;
  • functions that always return to the caller, functions that usually return, functions that occasionally return, and functions that never return;
  • a wide variety of patterns for how functions receive data to operate on, how they return data to their caller (if they do so), and what invariants they do and do not ensure
  • other dimensions of variation, too

Thus, no, I do not accept in any universal sense that a piece of code needs to conform to a particular calling convention to be called a "function", and I also do not accept "a group of instructions identified by the address to the entry point" as a satisfactory universal definition.
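One of the variations listed above, a function that never returns to its caller, can be expressed directly in C11 with `_Noreturn`. The sketch below is only an illustration: `never_returns`, `run_and_catch`, `resume_point`, and `last_code` are hypothetical names, and `longjmp` is used as the way out purely so the example can be exercised without ending the process.

```c
#include <setjmp.h>

static jmp_buf resume_point;  /* where control resumes instead of returning */
static int last_code;         /* value handed over by never_returns */

/* Never returns to its caller: control leaves through longjmp. */
_Noreturn void never_returns(int code)
{
    last_code = code;
    longjmp(resume_point, 1);
}

/* Calls never_returns and reports the code it handed over. */
int run_and_catch(int code)
{
    if (setjmp(resume_point) == 0) {
        never_returns(code);
    }
    return last_code;
}
```

Calling `run_and_catch(7)` yields 7 even though `never_returns` has no path back to its call site, much as `_start` ends by calling exit rather than returning.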

> Is _start() a function?

A _start() function such as is provided by GCC / Glibc satisfies some relevant definitions of the term. I have no problem with calling it a "function".

John Bollinger

There seems to be this idea going around in the newer programming models that all running code is inside functions; but in the beginning this was not so, and if we look at the old languages we can observe this.

Drawing from Lisp:

(format t "Hello, World!")

This is hello world in Common Lisp, and it is not a function in any normal sense. For comparison, here it is as a function:

(defun hello ()
  (format t "Hello, World!"))
(hello)

And from near the other root of all programming languages, here is Fortran (source):

      PROGRAM FUNDEM
C     Declarations for main program
      REAL A,B,C
      REAL AV, AVSQ1, AVSQ2
      REAL AVRAGE
C     Enter the data
      DATA A,B,C/5.0,2.0,3.0/

C     Calculate the average of the numbers
      AV = AVRAGE(A,B,C)
      AVSQ1 = AVRAGE(A,B,C) **2
      AVSQ2 = AVRAGE(A**2,B**2,C**2)

      PRINT *,'Statistical Analysis'
      PRINT *,'The average of the numbers is:',AV
      PRINT *,'The average squared of the numbers: ',AVSQ1
      PRINT *,'The average of the squares is: ', AVSQ2
      END

      REAL FUNCTION AVRAGE(X,Y,Z)
      REAL X,Y,Z,SUM
      SUM = X + Y + Z
      AVRAGE = SUM /3.0
      RETURN
      END

Yup, those are top-level statements and a function definition. Fortran has three things: the PROGRAM, SUBROUTINEs, and FUNCTIONs.

And again, we can do the same kind of example in QuickBasic:


CALL Hello

Sub Hello()
Print "Hello, World"
End Sub

QuickBasic was kind of funny: you never even tried to name the entry point, and whatever .OBJ file was first in the build script was where the entry point was.

There's a general recurring theme here. In all of these, the top level isn't very function-like. The compiler would add stuff to the beginning of the entry point for you so that runtime initialization worked correctly.

Now what happened in C? C took a different path. The initialization routines were written in their own file, which calls main(). The compiler just compiles main() as it would any other function and has no capacity for emitting code that runs at top level. Thus, the entry point (traditionally called _start, but it doesn't have to be) is not and cannot be written in C.
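The part of that startup sequence that comes after the stack and registers have been set up can be sketched in C. This is an assumption-laden illustration, not how any particular C runtime is written: `run_program`, `init_hook`, `main_fn`, `sample_init`, and `sample_main` are hypothetical names, and the function returns the status instead of calling exit() so that it can be run inside an ordinary program.

```c
#include <stddef.h>

/* Hypothetical hook and entry types; real runtimes use their own names
   and signatures for initialization and for calling main. */
typedef void (*init_hook)(void);
typedef int (*main_fn)(int argc, char **argv);

/* Runs the initialization hooks in order, then the program's main.
   A real entry point would pass the result to exit() and never return. */
int run_program(const init_hook *hooks, size_t hook_count,
                main_fn program_main, int argc, char **argv)
{
    for (size_t i = 0; i < hook_count; i++) {
        hooks[i]();
    }
    return program_main(argc, argv);
}

/* Sample hook and sample main, used only to exercise the sketch. */
static int runtime_ready;

void sample_init(void)
{
    runtime_ready = 1;
}

int sample_main(int argc, char **argv)
{
    (void)argv;
    /* Report success only if initialization ran before main. */
    return (runtime_ready && argc == 1) ? 42 : 1;
}
```

What this sketch cannot express is the very first step: obtaining argc and argv from wherever the kernel left them, which is why the true entry point still needs assembly.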

Don't get me wrong here: if you were to compile any of these on a Unix platform today and look at the resulting .o files, you would see that modern compilers emit a main() function with the top-level code in it. This is because of the preeminence of the C runtime, not because of any need for it to be a function. Had the other languages kept carrying the definitions of the system calls in their runtimes, as they used to, this would not be necessary.

Thus the process entry point is not a function.

We can take this argument one step further. Suppose (and I have seen news articles reference something like this) we had a fully native Java compiler that emitted .o files and linked against .so files providing the Java runtime. We could then ask: is _start a class method? The answer isn't no. The answer is that the question makes no sense, because you can't get a valid Java reference to the symbol. The same silly thing happens in C; we just need to pick a different platform. In the DOS FAR model, _start is exported as PROC NEAR but void _start() expects a PROC FAR. The emitted link-time fixup is the wrong size, and trying to take the address of _start results in undefined behavior.

Joshua

You are mixing fields. You can't apply "text specification" to oranges.

> The C standard does not seem to define the concept of a function

C is a language. In the C language, the text like the following:

void func();

is a function declaration of a function func.

> Is _start() a function?

The text you posted is not in the C language. There are no function declarations or definitions in it.

As you stated, the term function is not defined in the C standard. I would assume that the English language understanding of the term "function" applies here, as to any other word in the C standard.

I see in Merriam-Webster that a "function" is a computer subroutine, where a subroutine is a sequence of computer instructions for performing a specified task that can be used repeatedly.

Clearly, _start is a function - it is a sequence of instructions to be executed repeatedly, it is executed on a computer, and it also operates on variables in the form of registers.

The text you posted represents the function _start in the form of a text using assembly language. It is not possible to represent the function _start in the C programming language.

(It is also not possible to express oranges in C, yet they exist in the real world. My point is, you can take any other word in the C standard, like, I don't know, "international", and ask "Are oranges international?". Applying the C standard and the "language-lawyer" tag to abstract contexts is not going to give you answers. The bottom line is that the C standard is a specification: it tells you what happens when; it is not a dictionary.)

> Is a function simply a group of instructions identified by the address to the entry point, or must it conform to the calling convention?

See Merriam-Webster function.

> What is the authoritative source that defines this concept?

I googled and found: "There is no official agency that makes rules for English language".

The C standard is created by http://www.open-std.org/jtc1/sc22/wg14/ .

KamilCuk
  • You are making a lot of sense, but the C standard does define a bunch of terms that also have common-sense definitions in the English language, like "object", "value", "statement", "behavior" and so on. I found it surprising that "function" was omitted. – mnistic Feb 24 '22 at 21:55
  • As to the C tag complaint, I understand that is not the most appropriate, but I didn't know what else to put. I didn't want to use x86, because I really didn't want the question to be architecture-specific. – mnistic Feb 24 '22 at 21:56