How does Rust retrieve the input argc and argv values from a running program?

Question

I know that the entry point of a Rust application is generated dynamically by rustc. I inspected the code in compiler/rustc_codegen_ssa/src/base.rs, part of which is shown below.

fn create_entry_fn<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>(
    cx: &'a Bx::CodegenCx,
    rust_main: Bx::Value,
    rust_main_def_id: DefId,
    use_start_lang_item: bool,
) -> Bx::Function {
    // The entry function is either `int main(void)` or `int main(int argc, char **argv)`,
    // depending on whether the target needs `argc` and `argv` to be passed in.
    let llfty = if cx.sess().target.main_needs_argc_argv {
        cx.type_func(&[cx.type_int(), cx.type_ptr_to(cx.type_i8p())], cx.type_int())
    } else {
        cx.type_func(&[], cx.type_int())
    };

What I found next in the same file was really interesting. From the comment, we can see that Rust collects the incoming argc and argv here, and both parameters are later passed to the lang_start function, if I understand correctly.

/// Obtain the `argc` and `argv` values to pass to the rust start function.
fn get_argc_argv<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>(
    cx: &'a Bx::CodegenCx,
    bx: &mut Bx,
) -> (Bx::Value, Bx::Value) {
    if cx.sess().target.main_needs_argc_argv {
        // Params from native `main()` used as args for rust start function
        let param_argc = bx.get_param(0);
        let param_argv = bx.get_param(1);
        let arg_argc = bx.intcast(param_argc, cx.type_isize(), true);
        let arg_argv = param_argv;
        (arg_argc, arg_argv)
    } else {
        // The Rust start function doesn't need `argc` and `argv`, so just pass zeros.
        let arg_argc = bx.const_int(cx.type_int(), 0);
        let arg_argv = bx.const_null(cx.type_ptr_to(cx.type_i8p()));
        (arg_argc, arg_argv)
    }
}
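
Read as ordinary Rust rather than as codegen builder calls, the branch above amounts to roughly the following. This is only a sketch to make the data flow concrete: `start_args` is a hypothetical name, and the boolean parameter stands in for the target's `main_needs_argc_argv` setting.

```rust
use std::os::raw::{c_char, c_int};
use std::ptr;

/// Hypothetical model of the values the generated entry point forwards
/// to the Rust start function. Not the compiler's actual code.
fn start_args(
    main_needs_argc_argv: bool,
    argc: c_int,
    argv: *const *const c_char,
) -> (isize, *const *const c_char) {
    if main_needs_argc_argv {
        // Sign-extend the C `int` to `isize` and pass `argv` through unchanged.
        (argc as isize, argv)
    } else {
        // The target's `main` takes no parameters, so forward zero and null.
        (0, ptr::null())
    }
}

fn main() {
    let argv: [*const c_char; 1] = [b"prog\0".as_ptr() as *const c_char];

    let (argc, p) = start_args(true, 1, argv.as_ptr());
    println!("with argc/argv: argc = {}, argv is null: {}", argc, p.is_null());

    let (argc, p) = start_args(false, 1, argv.as_ptr());
    println!("without:        argc = {}, argv is null: {}", argc, p.is_null());
}
```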

But I also found another place that seems to do the same thing, in library/std/src/sys/unix/args.rs. For example, when you run a Rust app on macOS, it seems Rust uses two FFI functions (_NSGetArgc / _NSGetArgv) to retrieve argc and argv:

#[cfg(any(target_os = "macos", target_os = "ios"))]
mod imp {
    use super::Args;
    use crate::ffi::CStr;

    pub unsafe fn init(_argc: isize, _argv: *const *const u8) {}

    pub fn cleanup() {}

    #[cfg(target_os = "macos")]
    pub fn args() -> Args {
        use crate::os::unix::prelude::*;
        extern "C" {
            // These functions are in crt_externs.h.
            fn _NSGetArgc() -> *mut libc::c_int;
            fn _NSGetArgv() -> *mut *mut *mut libc::c_char;
        }

        let vec = unsafe {
            let (argc, argv) =
                (*_NSGetArgc() as isize, *_NSGetArgv() as *const *const libc::c_char);
            (0..argc as isize)
                .map(|i| {
                    let bytes = CStr::from_ptr(*argv.offset(i)).to_bytes().to_vec();
                    OsStringExt::from_vec(bytes)
                })
                .collect::<Vec<_>>()
        };
        Args { iter: vec.into_iter() }
    }

So, what's the difference between these two places? Which place actually does the real retrieval stuff?

The `args` functions in the standard library are the means by which [`std::env::args()`](https://doc.rust-lang.org/std/env/fn.args.html) and [`std::env::args_os()`](https://doc.rust-lang.org/std/env/fn.args_os.html) are implemented. The `create_entry_fn` and `get_argc_argv` functions in the compiler are part of generating the instructions that route startup arguments to the language runtime/entry point. – eggyal, May 08 '21 at 13:31
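
From the application's side, all of this machinery is hidden behind `std::env::args()` and `std::env::args_os()`. A minimal sketch of how a program consumes them is below; `split_program_name` is a hypothetical helper introduced only for illustration.

```rust
use std::env;
use std::ffi::OsString;

/// Hypothetical helper: split an argument list into the program name
/// (the first item, if any) and the remaining arguments.
fn split_program_name<I>(mut args: I) -> (Option<OsString>, Vec<OsString>)
where
    I: Iterator<Item = OsString>,
{
    let program = args.next();
    (program, args.collect())
}

fn main() {
    // `args_os` yields `OsString`s, so it does not panic on arguments
    // that are not valid Unicode (unlike `args`).
    let (program, rest) = split_program_name(env::args_os());
    println!("program:   {:?}", program);
    println!("arguments: {:?}", rest);
}
```

Whichever platform-specific path fills in the values, the caller only ever sees this iterator.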

KokaKiwi · Accepted Answer · 2021-05-08T22:58:14.157

To reply directly to the question "Which place actually does the real retrieval stuff?", well, it depends on:

The target OS: Linux, macOS, Windows, WebAssembly
The target "environment" (e.g. libc): glibc, musl, wasi, even miri in Rust's case

The values are basically either passed as arguments to the program entry point or made available "globally" through functions/syscalls:

In the first case (passed as arguments), the Rust compiler generates code that initializes two static values, ARGC and ARGV (located at std/src/sys/unix/args.rs#L87), which std::env::args() then reads on the developer's behalf.

Note that, depending on the libc used, this phase is done at _start and/or by some ld+libc-specific routine (it gets messy when dynamic linking is taken into account). In the case of glibc, it is done through the non-standard GNU "init_array" extension (which is notably used for "cdylib" crates/.so executables): std/src/sys/unix/args.rs#L108-L128

Also, if you specify the entry point directly with the #[start] attribute, you get direct access to the argc/argv values (compiler/rustc_codegen_ssa/src/base.rs#L447).
In the second case, no initialization code is needed, and std::env::args() calls the argument-getter functions when needed, as you already noticed on macOS.

Some systems, such as macOS (and apparently Windows), use both methods: argc/argv are provided as arguments to _start and also through getter functions callable from anywhere, and Rust uses the latter.

Linux uses only the first case. It wouldn't be surprising if glibc provided some functions to get these values (by some wibbly-wobbly magic), but the standard way is the first one.
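
To make the first case more concrete, here is a simplified model of the "store at startup, read later" pattern. It is a sketch under stated assumptions, not libstd's actual code: the statics and the `init`/`args` functions only mirror the names mentioned above, `demo` is a hypothetical driver that fakes an argv array, and the lossy UTF-8 conversion is a portability shortcut.

```rust
use std::ffi::{CStr, CString, OsString};
use std::os::raw::c_char;
use std::ptr;
use std::sync::atomic::{AtomicIsize, AtomicPtr, Ordering};

// Stand-ins for the statics that the startup code fills in.
static ARGC: AtomicIsize = AtomicIsize::new(0);
static ARGV: AtomicPtr<*const c_char> = AtomicPtr::new(ptr::null_mut());

/// Record the values handed to the entry point.
/// Safety: `argv` must point to `argc` valid C strings that stay alive
/// for as long as `args()` may be called.
unsafe fn init(argc: isize, argv: *const *const c_char) {
    ARGC.store(argc, Ordering::Relaxed);
    ARGV.store(argv as *mut *const c_char, Ordering::Relaxed);
}

/// Read the stored values back, copying each argument into an owned string.
fn args() -> Vec<OsString> {
    let argc = ARGC.load(Ordering::Relaxed);
    let argv = ARGV.load(Ordering::Relaxed);
    if argv.is_null() {
        return Vec::new();
    }
    (0..argc)
        .map(|i| unsafe {
            let arg = CStr::from_ptr(*argv.offset(i));
            // Simplification: convert lossily instead of keeping raw bytes.
            OsString::from(arg.to_string_lossy().into_owned())
        })
        .collect()
}

/// Hypothetical driver: build a fake argv, store it, and read it back.
fn demo() -> Vec<OsString> {
    let owned: Vec<CString> = ["prog", "--flag", "value"]
        .iter()
        .map(|s| CString::new(*s).unwrap())
        .collect();
    let ptrs: Vec<*const c_char> = owned.iter().map(|s| s.as_ptr()).collect();

    unsafe { init(ptrs.len() as isize, ptrs.as_ptr()) };
    let collected = args();
    // Clear the statics before the fake argv is dropped.
    unsafe { init(0, ptr::null()) };
    collected
}

fn main() {
    println!("{:?}", demo());
}
```

In the second case there would be no `init` step at all: the equivalent of `args()` would simply ask the platform for the values each time it is called.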

For further reading, you can look at some links and articles about the "program loader" on Linux (sadly, there's not much on the subject in general, especially for other OSes):

LWN article "How programs get run: ELF binaries": https://lwn.net/Articles/631631/ (especially the "Populating the stack" part)
"The start attribute" section in one article of the "Rust OS dev" series: https://os.phil-opp.com/freestanding-rust-binary/#the-start-attribute
Reply to a (too broad, closed) Stack Overflow question about program loading and running: https://stackoverflow.com/a/32689330/1498917

I checked the file **compiler/rustc_target/src/spec/aarch64_apple_darwin.rs**, which describes the configuration of the macOS compilation target, and found no override of the field "main_needs_argc_argv", so the default value should be true. So, on macOS, Rust would use the first way to retrieve argc and argv rather than the second way, which goes through the global functions. Isn't it contradictory that we have two ways here for macOS? Or did I misunderstand something? – Jason Yu, May 08 '21 at 15:57
After looking at the source code and generating test output for the macOS target (I don't have a Mac to actually test on), it appears that it does indeed generate entry-point code to handle the argc/argv arguments, which makes my answer a bit more incomplete/complex than it already is. So I would suppose that macOS passes the arguments the same way as Linux but also provides functions to retrieve these values, and those functions are what Rust's libstd actually uses. That said, I also suppose the entry-point arguments are there for "no-std" or "#[start]"-using programs, where libstd's mechanism would have no effect, so that they can still get these values. – KokaKiwi, May 08 '21 at 19:35
