In Rust, is it expected that a program’s state variables are mostly instantiated inside the main function?

Question

How do people handle the variables of a program? Is the expectation that one just declares almost all the variables needed throughout an entire program in the main function and then passes those variables as parameters to each module’s functions as needed?

And, no, I’m not asking to create global variables, rather, local variables scoped to a module and only privately accessible from within. And, I’m also not asking for shared state across threads using mutexes, etc, rather just simple compartmentalization of variables (or state if you prefer that term) scoped to a specific module.

For even a mid-size program, that can be a large number of variables declared within the main function. And, frustratingly, most will be relevant only to specific modules. Why does Rust not allow a module to contain declared variables within its scope? Particular modules are going to need particular variables that live longer than their function calls (local state) but are only relevant to that module. It seems very odd that all variables, even ones specific to a module, are declared in the main function. It also breaks the compartmentalization of code.

Related: [Is it possible to use global variables in Rust?](https://stackoverflow.com/q/19605132/364696) and [How do I create a global, mutable singleton?](https://stackoverflow.com/q/27791532/364696). Some of your statements about things not being allowed are wrong. — ShadowRanger, Mar 02 '23 at 05:20
That sounds to me like your initial thinking about design is flawed. If you need _something_ that maintains state, there's `struct`. — cadolphs, Mar 02 '23 at 05:23
I have the impression that you are trying to apply a pattern that comes from another language to Rust, because I don't see how else you could think of having to declare the variables of every module in `main` (as a matter of fact, Rust encourages you to write your code as a library which does not rely on `main`, and put only the driver code in `main`). — jthulhu, Mar 02 '23 at 08:13
@jthulhu, so are you saying that Rust encourages splitting a program into multiple libraries and then driving those libraries from the main application? Yes, I am probably trying to ram a design paradigm into Rust that isn’t coherent to the language. Case in point, Typescript’s module system allows for local scoped variables which are private to that module. It’s quite nice for compartmentalization, makes it easier to reason about the code. I’m not advocating for Typescript, just looking to understand better how to do things the Rust way having come from more of a Typescript background — risingtiger, Mar 02 '23 at 13:35
@ShadowRanger, no, I’m not looking for global variables. Quite the opposite: local variables scoped within a module, private to that module and inaccessible from without. Typescript provides such a paradigm and I use it extensively. Now, I’m not advocating Typescript, it sucks in many ways, thus why I’m looking to Rust. I’m just looking for the Rust way of doing things and am willing to change my design approach. I just don’t see how it’s not relevant to have local, private variables scoped within a module. — risingtiger, Mar 02 '23 at 13:39
@risingtiger Rust does support visibility modifiers. Everything is private by default in Rust, unless the `pub` keyword is used explicitly. Read this: https://doc.rust-lang.org/reference/visibility-and-privacy.html — freakish, Mar 02 '23 at 14:11
@risingtiger: If those variables are immutable, that's perfectly legal (see linked questions). If they're mutable, then they're still "the bad kind of globals" ("global" here means a singleton shared variable which can be accessed from multiple places without any explicit passing of ownership or references; it can still be "global" to just a single module), and they're heavily restricted to prevent the sort of races shared mutable state is inherently prone to; if your code's signal handling or threading has multiple accessors to the state, it's race conditions all over. — ShadowRanger, Mar 02 '23 at 14:40
TypeScript doesn't have to deal with this because JS is inherently single-threaded (and for the variants with multithreading, it would in fact have race conditions mutating said globals, so again, Rust is playing it safe where most other languages dump race-handling on you). — ShadowRanger, Mar 02 '23 at 14:44

ShadowRanger · Answer 1 · 2023-03-02T18:49:52.493

First off, to be clear, "local variables scoped within a module" are still "globals". They're not visible to everyone, but there is a single global location storing that variable, and any/all threads of execution that do have visibility on that variable are sharing access to the same global copy.

This creates a huge problem with race conditions in most languages. If Rust allowed it, and you did:

static mut someint: i64 = 0;

pub fn public_api() {
    someint += 1;
}

and then two threads called `public_api` at the same time, you'd risk dropped increments and, on some architectures, torn data. For example, if 64-bit values are actually manipulated as a pair of 32-bit values, an increment that carries over into the high "word" changes each half independently, and another thread might read a halfway state that mixes pre- and post-increment components, completely corrupting the value. For a value that's initially 0x00000000FFFFFFFF, a pair of increments from different threads that should produce 0x0000000100000001 could not only produce 0x0000000100000000 (the expected result of a dropped increment) but possibly 0x0000000000000000 or 0x0000000000000001, so that two small additions result in a massive subtraction. Other weird results might be possible on unusual compilers or architectures.
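To make the torn-read arithmetic concrete, here is a minimal, single-threaded sketch that replays one such interleaving by hand. It is an illustration under stated assumptions, not a real data race: `SplitU64` and `torn_increment` are hypothetical names, and the low-word-first store order is an assumption about the imaginary 32-bit target.

```rust
/// A 64-bit counter stored as two 32-bit words, as on a hypothetical
/// target without native 64-bit stores (an assumption for illustration).
#[derive(Clone, Copy)]
struct SplitU64 {
    hi: u32,
    lo: u32,
}

impl SplitU64 {
    fn from_u64(v: u64) -> Self {
        SplitU64 { hi: (v >> 32) as u32, lo: v as u32 }
    }

    fn to_u64(self) -> u64 {
        (u64::from(self.hi) << 32) | u64::from(self.lo)
    }
}

/// Replays one interleaving: the writer stores the low word of the
/// incremented value, a reader looks at the counter, and only then does
/// the writer store the high word. Returns (torn read, final value).
fn torn_increment(start: u64) -> (u64, u64) {
    let mut cell = SplitU64::from_u64(start);
    let next = SplitU64::from_u64(start.wrapping_add(1));

    cell.lo = next.lo; // writer: first half of the store
    let observed = cell.to_u64(); // reader: new low word, old high word
    cell.hi = next.hi; // writer: second half of the store

    (observed, cell.to_u64())
}

fn main() {
    let (observed, finished) = torn_increment(0x0000_0000_FFFF_FFFF);
    println!("reader saw {observed:#018x}, writer finished with {finished:#018x}");
}
```

Starting from 0x00000000FFFFFFFF, the reader in this replay observes 0x0000000000000000 even though the writer eventually finishes with 0x0000000100000000; a second incrementer that based its own write on that torn read would store a value far below the original.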

Most languages solve this by not solving it; they give you locks and atomics and say "hey, if you don't use them, that's on you". TypeScript (run on a typical, single-threaded JS interpreter) solves it by not offering true threading (making races impossible; control is only handed off between tasks cooperatively, so it can't be interrupted halfway through an increment or the like by preemption).

Rust's solution is to make it illegal to do anything that can't be verified to be safe; any and all shared mutable data must either be properly protected, or manipulated solely in unsafe contexts (where you pinky-swear you won't do anything actually unsafe, and if you're wrong, all of Rust's nice guarantees go out the window).

All that said, you can use globals (with visibility scoped to specific locations). Globals are allowed when at least one of the following is true:

- They're immutable
- They're guaranteed atomic or protected by locks that prevent racy access
- They're thread local (so they're global per thread, rather than per process), so each thread operates on them independently
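As a rough sketch of what those three cases can look like inside a single module, consider the following. The module name `stats`, the statics, and the functions are all hypothetical, chosen only for illustration, and the `static` `Mutex` assumes Rust 1.63 or newer, where `Mutex::new` is usable in a constant initializer.

```rust
mod stats {
    use std::cell::Cell;
    use std::sync::atomic::{AtomicI64, Ordering};
    use std::sync::Mutex;

    // Immutable: always fine to share.
    static GREETING: &str = "hello";

    // Atomic: every thread mutates it through race-free operations.
    static CALLS: AtomicI64 = AtomicI64::new(0);

    // Lock-protected: only one thread at a time can touch the Vec.
    static LOG: Mutex<Vec<String>> = Mutex::new(Vec::new());

    thread_local! {
        // Thread local: each thread gets its own independent copy.
        static PER_THREAD: Cell<u32> = Cell::new(0);
    }

    // None of the statics above are `pub`, so only this module can name them.
    pub fn public_api(name: &str) -> i64 {
        PER_THREAD.with(|c| c.set(c.get() + 1));
        LOG.lock().unwrap().push(format!("{}, {}", GREETING, name));
        CALLS.fetch_add(1, Ordering::Relaxed) + 1
    }

    pub fn total_calls() -> i64 {
        CALLS.load(Ordering::Relaxed)
    }

    pub fn calls_on_this_thread() -> u32 {
        PER_THREAD.with(|c| c.get())
    }

    pub fn log_len() -> usize {
        LOG.lock().unwrap().len()
    }
}

fn main() {
    let workers: Vec<_> = (0..4)
        .map(|i| std::thread::spawn(move || stats::public_api(&format!("worker {i}"))))
        .collect();
    for w in workers {
        w.join().unwrap();
    }
    println!("total calls: {}, log entries: {}", stats::total_calls(), stats::log_len());
    println!("calls on main thread: {}", stats::calls_on_this_thread());
}
```

Code outside `stats` can only go through the `pub` functions, which is the "private to the module" compartmentalization the question asks for.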

They may be bad style, and the language deliberately makes complex globals harder to create: only compile-time (constant) expressions may be used as a global's initializer, so you can't heap-allocate there, and you typically use libraries and macros that ensure the value is initialized exactly once at run time. Rust doesn't care if that inconveniences you, because there should be a disincentive to use globals, but it's not going to prevent you from doing so.
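One way to get that "initialized exactly once at run time" behavior without an external crate is `std::sync::OnceLock`, assuming Rust 1.70 or newer; crates such as `once_cell` and `lazy_static` offer similar patterns on older toolchains. The sketch below is only an illustration: the `config` module, the `SETTINGS` static, and the `"mode"` key are made-up names.

```rust
mod config {
    use std::collections::HashMap;
    use std::sync::OnceLock;

    // The cell itself is built at compile time; the heap-allocated map
    // is created lazily, the first time some thread asks for it.
    static SETTINGS: OnceLock<HashMap<&'static str, String>> = OnceLock::new();

    fn settings() -> &'static HashMap<&'static str, String> {
        SETTINGS.get_or_init(|| {
            // Runs at most once, even if several threads race to get here.
            let mut map = HashMap::new();
            map.insert("mode", String::from("debug"));
            map
        })
    }

    // The only way for code outside this module to read the settings.
    pub fn get(key: &str) -> Option<&'static str> {
        settings().get(key).map(String::as_str)
    }
}

fn main() {
    println!("mode = {:?}", config::get("mode"));
    println!("missing = {:?}", config::get("missing"));
}
```

After initialization the map is read-only here; if the module needed to mutate it, wrapping the map in a `Mutex` inside the `OnceLock` would be one option.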

Rather than regurgitate how you do all this, I'll point you to this answer showing various ways to initialize heap-allocated globals, or atomic/mutex protected mutable types, this answer on using thread locals, or, if you absolutely must, this answer on just being unsafe (shudders). All of them are suitable for your use case; your "local variables scoped within a module" are just globals with appropriate visibility modifiers.

Now that's an answer! I've been smashing up against Google search for weeks now, getting shut down by being directed to all these 'how to use globals' answers, which I promptly kept hitting back on because of the term 'global'. Thank you for clarifying so much with this answer. In all honesty, I kept thinking of globals in the classic C way, where they are literally accessible from anywhere. So it's a little confusing to use that term for something that is still somewhat scoped. — risingtiger, Mar 02 '23 at 15:35
I'm still curious how people maintain state for a complex Rust application, where compartmentalization is a strong necessity. I'd prefer not to use classes (or structs and impl) to maintain local state. OOP was great for my college class assignments but tends to blow apart in the real world, just my opinion. — risingtiger, Mar 02 '23 at 15:47
@risingtiger: C offers a limited form of "scoped globals" as well. That's what any C `static` is. Not just the ones at the top-level of the source file (where they are visible to all functions within that file, and none outside), but even the `static`s declared within functions (because while they're only visible within the scope of that function, they're still unique singletons shared between any and all invokers of said function, including threaded invocation and signal handler invocation). Not as fine-grained as Rust, but still "limited visibility globals", with all the same racing issues. — ShadowRanger, Mar 02 '23 at 18:33
@risingtiger: I agree with you that OOP (with inheritance, polymorphism, careful adherence to Liskov substitution principle, etc.) has code complexity issues, but you're throwing the baby out with the bathwater here. 1) If you truly only need a single shared mutable state, then sure, use one of the techniques above to make a global for each bit of state you care about, but 2) It's *very* rare you only need a single shared mutable state. Oftentimes, you have state associated with a particular flow of control, and in a simple program, there is only one such flow of control, so you think... — ShadowRanger, Mar 02 '23 at 18:36
..."Let's store the state in globals!" But then when someone wants to use your library in a more complex way (*two* threads, one interacting with database A, one with database B), the fact that your library only has a single global (or even a per-thread thread-local, in the case of event loops doing cooperative multitasking on a single thread) for the DB handle means oops, your library doesn't scale. Simple classes, no inheritance, no polymorphism, just doing the same job as C structs (optionally with non-virtual methods replacing top-level functions) means you can have an arbitrarily... — ShadowRanger, Mar 02 '23 at 18:40
...number of separate "global" scopes. Each instance of the class represents a unique "global" scope for that flow of control, there's no risk of one flow of control inadvertently interfering with another, and you're not even adding much complexity (at the cost of writing a constructor and moving your top-level functions to `impl` methods, the caller just maintains a single variable, calls methods on it, and those methods implicitly receive the self so they know where to look for state, and are *unable* to inadvertently screw up state in some other control flow). — ShadowRanger, Mar 02 '23 at 18:43
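A minimal sketch of the "simple struct, no inheritance, no polymorphism" approach described in the comments above might look like this. The `tally` module, the `Tally` type, and its methods are hypothetical names used only to illustrate the idea; the fields are private to the module, and each instance is its own independent scope for state.

```rust
mod tally {
    // Fields are private: code outside this module cannot touch them.
    pub struct Tally {
        label: String,
        count: u64,
    }

    impl Tally {
        // The "constructor": each call creates an independent piece of state.
        pub fn new(label: &str) -> Self {
            Tally { label: label.to_string(), count: 0 }
        }

        // Methods receive `self`, so they can only change this instance.
        pub fn record(&mut self) -> u64 {
            self.count += 1;
            self.count
        }

        pub fn report(&self) -> String {
            format!("{}: {}", self.label, self.count)
        }
    }
}

fn main() {
    // Two flows of control, two separate states, no globals involved.
    let mut a = tally::Tally::new("database A");
    let mut b = tally::Tally::new("database B");

    a.record();
    a.record();
    b.record();

    println!("{}", a.report());
    println!("{}", b.report());
}
```

Here `main` holds one variable per flow of control rather than one per piece of state, and the borrow checker ensures that mutating `a` can never affect `b`.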
