I understand that the preferred way to implement something like a global/instance/module variable in Rust is to create said variable in main() or other common entry point and then pass it down to whoever needs it. It also seems possible to use a lazy_static for an immutable variable, or it can be combined with a mutex to implement a mutable one.

In my case, I am using Rust to create a .so with bindings to Python and I need to have a large amount of mutable state stored within the Rust library (in response to many different function calls invoked by the Python application).

What is the preferred way to store that state?

Is it only via the mutable lazy_static approach since I have no main() (or more generally, any function which does not terminate between function calls from Python), or is there another way to do it?

Ginty
  • What's the nature of the mutable state, and why does it need to persist across function calls? – user31601 Nov 25 '19 at 11:02
  • The state will be stored in many vectors of structs, each containing many thousands of elements, and probably a lot of hierarchy within the structs. It will be a model of a microprocessor. It needs to persist across function calls since Rust is being used for the main application engine but exposed to the user via a Python interface. A single Python process will call Rust functions to manipulate and query the model, and the Rust domain needs to keep track of that state. – Ginty Nov 25 '19 at 11:44
  • See also [Objects in *The Rust FFI Omnibus*](http://jakegoulding.com/rust-ffi-omnibus/objects/). – Shepmaster Nov 25 '19 at 19:20
  • Thanks @Shepmaster, user31601 and Matthieu M. for your time and patience. It took me a while but I see what you were all getting at now and it is a good solution you have provided. I may write up my own answer to this question in more layman's terms once I get a working solution based on the FFI example. Thanks again! – Ginty Nov 25 '19 at 23:19
  • @user31601, please see above – Ginty Nov 25 '19 at 23:19
  • @Matthieu M., please see above – Ginty Nov 25 '19 at 23:20

2 Answers

Bundle it

In general, and absent other requirements, the answer is to bundle your state in some object and hand it over to the client. A popular name is Context.

Then, the client should have to pass the object around in each function call that requires it:

  • Either by defining the functionality as methods on the object.
  • Or by requiring the object as parameter of the functions/methods.

This gives full control to the client.

The client may end up creating a global for it, or may actually appreciate the flexibility of being able to juggle multiple instances.

Note: There is no need to provide any access to the inner state of the object; all the client needs is a handle (ref-counted, in Python) to control the lifetime and decide when to use which handle. In C, this would be a void*.
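As a minimal sketch of the two options above, assuming a hypothetical `Context` that holds a register file (the field and function names are invented for illustration and do not come from the question):

```rust
/// Hypothetical state bundle. The name `Context` follows the answer;
/// the `registers` field is an invented placeholder for the real model.
pub struct Context {
    registers: Vec<u32>,
}

impl Context {
    /// Create an independent instance; the caller decides how long it lives.
    pub fn new(register_count: usize) -> Self {
        Context {
            registers: vec![0; register_count],
        }
    }

    /// Option 1: functionality defined as a method on the object.
    /// Returns false if `index` is out of range.
    pub fn write_register(&mut self, index: usize, value: u32) -> bool {
        match self.registers.get_mut(index) {
            Some(slot) => {
                *slot = value;
                true
            }
            None => false,
        }
    }

    /// Read a register, or `None` if `index` is out of range.
    pub fn read_register(&self, index: usize) -> Option<u32> {
        self.registers.get(index).copied()
    }
}

/// Option 2: a free function that requires the object as a parameter.
pub fn count_nonzero(ctx: &Context) -> usize {
    ctx.registers.iter().filter(|&&v| v != 0).count()
}
```

Because the fields are private, a client can hold and pass a `Context` without ever seeing its internals, and two instances never share state.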


Exceptions

There are cases, such as a cache, where the functionality is not impacted, only the performance.

In this case, while the flexibility could be appreciated, it may be more of a burden than anything. A global, or thread-local, would then make sense.
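A rough illustration of the thread-local variant, using only the standard library. The cached computation (`expensive_square`) is a made-up stand-in; the point is that the hidden state affects speed only, never the result:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

thread_local! {
    // Per-thread memo table. Dropping or clearing it changes how fast
    // `cached_square` runs, not what it returns.
    static SQUARE_CACHE: RefCell<HashMap<u64, u64>> = RefCell::new(HashMap::new());
}

/// Hypothetical stand-in for an expensive, pure computation.
fn expensive_square(n: u64) -> u64 {
    n.wrapping_mul(n)
}

/// Return the cached result if present, otherwise compute and remember it.
pub fn cached_square(n: u64) -> u64 {
    SQUARE_CACHE.with(|cache| {
        let mut map = cache.borrow_mut();
        let value = *map.entry(n).or_insert_with(|| expensive_square(n));
        value
    })
}

/// Number of entries cached on the current thread (for inspection only).
pub fn cache_len() -> usize {
    SQUARE_CACHE.with(|cache| cache.borrow().len())
}
```

Callers see a plain function; no handle is needed, because forgetting the cache would not change any observable answer.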

Matthieu M.
  • Thanks for the answer, I can see that working for something simple like an app configuration, and in fact I already do something along these lines for that. However, in this case we are talking about a large amount of data, e.g. many MB, which I am handling in Rust specifically for performance, and I don't want to pass it all over to the Python space. I want it to exist in Rust only and have it exposed to Python via an API to read/write discrete portions of it. Maybe the comment I just added to the question provides better context. – Ginty Nov 25 '19 at 11:54
  • @Ginty: You don't have to expose the *internals* to Python. From the Python point of view, it can just be some opaque handle to a blob of state (`void*` in C parlance), which the user passes to the function calls. – Matthieu M. Nov 25 '19 at 13:17
  • Ok, thanks for the clarification, but I don't think it's really answering my question. This seems like the answer to something like "how to expose data stored in Rust to a client/Python", whereas my question is about how to store the data on the Rust side. – Ginty Nov 25 '19 at 16:31
  • *I don't want to pass [multiple megabytes of data] all over to the Python space* — @Ginty there's no reason to. The Python C code can own the struct (or put the struct in a `Box` and it owns that raw pointer), and then passes back the reference to that data to the Rust code. Passing 8 bytes of pointer is not a hardship. – Shepmaster Nov 25 '19 at 17:09
  • Thanks @Shepmaster, but I think this is going off on a tangent away from my original question. Actually, I don't want the Python side to be the owner of the data at all, or have any direct low-level access to it. The data will be interfaced to/from the Python side via a much higher-level API. Really, the crux of my question is simply: If I don't have a main() (as I would with a Rust binary situation), what is the recommended way to store a large amount of long-lived mutable data in Rust? – Ginty Nov 25 '19 at 17:17
  • @Ginty speaking bluntly, your desired outcome is not a good one. You are effectively asking how to create a global variable (see [How do I create a global, mutable singleton?](https://stackoverflow.com/a/27826181/155423)). The best answer is **don't**, which is what this answer generally recommends and what the linked answer also recommends. We give this advice with decades of experience behind it in the hopes that you don't have to spend the same decades learning it the way we did. – Shepmaster Nov 25 '19 at 17:22
  • Thanks for the link @Shepmaster, I think I'd seen that before. In fact, I think the reason I asked this question originally is that you managed to convince that asker to ditch their plan for a mutable global and instead pass the data around. So my question here was, if I don't have a main() to provide the lifetime for this data to be passed around, how else should I do it? I don't see much wrong with your answer of how to create a mutable static variable. – Ginty Nov 25 '19 at 17:36
  • @Shepmaster continued... In my case the global variable here is very analogous to a database, and in fact maybe in future I'll replace it with a callout to sqlite or something. Initially though I just want to build and keep it in memory at runtime and the static mutable seems like the tool for the job. Maybe the only one, because I don't think any of the (very helpful) people here have actually offered any workable alternative. – Ginty Nov 25 '19 at 17:38
  • *if I don't have a `main()` to provide the lifetime for this data to be passed around, how else should I do it* — You always have a main function or equivalent. As this answer states, require the consumer (Python code) to own the data and pass a reference to that data back to the Rust code to manipulate it. Then the variable is owned by the **Python** main function. *analogous to a database* — Yes, and hidden globals like database connections are a pain when you outgrow them. – Shepmaster Nov 25 '19 at 17:42
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/203047/discussion-between-ginty-and-shepmaster). – Ginty Nov 25 '19 at 17:46

I'd be tempted to dip into unsafe code here. You cannot use non-static lifetimes, as the lifetime of your state would be determined by the Python code, which Rust can't see. On the other hand, 'static state has other problems:

  • It necessarily persists until the end of the program, which means there's no way of recovering memory you're no longer using.
  • 'static variables are essentially singletons, making it very difficult to write an application that makes multiple independent usages of your library.

I would go with a solution similar to what @Matthieu M. suggests, but instead of passing the entire data structure back and forth over the interface, allocate it on the heap, unsafely, and then pass some sort of handle (i.e. pointer) back and forth.

You would probably want to write a cleanup function, and document your library to compel users to call the cleanup function when they're done using a particular handle. Effectively, you're explicitly delegating the management of the lifecycle of the data to the calling code.

With this model, if desired, an application could create, use, and clean up multiple datasets (each represented by its own handle) concurrently and independently. If an application "forgets" to clean up a handle when finished, you have a memory leak, but that's no worse than storing the data in a 'static variable.
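The handle-plus-cleanup model described above might look roughly like the following sketch. `Model` and the `model_*` function names are hypothetical; a real shared library would additionally export these symbols (for example with a `no_mangle` attribute) or generate the bindings through a binding crate, which is omitted here:

```rust
/// Hypothetical model state; `memory` is a placeholder for the real data.
pub struct Model {
    memory: Vec<u8>,
}

/// Allocate the state on the heap and hand the caller an opaque handle.
/// Ownership moves to the caller until `model_free` is called.
pub extern "C" fn model_new(size: usize) -> *mut Model {
    Box::into_raw(Box::new(Model {
        memory: vec![0; size],
    }))
}

/// Write one byte. Returns false for a null handle or out-of-range address.
///
/// # Safety
/// `handle` must be null or a live pointer obtained from `model_new`.
pub unsafe extern "C" fn model_write(handle: *mut Model, addr: usize, value: u8) -> bool {
    // SAFETY: the caller guarantees the pointer is null or valid and unaliased.
    let model = match unsafe { handle.as_mut() } {
        Some(m) => m,
        None => return false,
    };
    match model.memory.get_mut(addr) {
        Some(slot) => {
            *slot = value;
            true
        }
        None => false,
    }
}

/// Read one byte, or -1 for a null handle or out-of-range address.
///
/// # Safety
/// `handle` must be null or a live pointer obtained from `model_new`.
pub unsafe extern "C" fn model_read(handle: *const Model, addr: usize) -> i32 {
    // SAFETY: the caller guarantees the pointer is null or valid.
    match unsafe { handle.as_ref() } {
        Some(model) => model.memory.get(addr).map_or(-1, |&b| i32::from(b)),
        None => -1,
    }
}

/// The cleanup function: reclaims the allocation. Null is ignored.
///
/// # Safety
/// `handle` must be null or a pointer from `model_new` that has not
/// already been freed; it must not be used afterwards.
pub unsafe extern "C" fn model_free(handle: *mut Model) {
    if !handle.is_null() {
        // SAFETY: the pointer came from `Box::into_raw` in `model_new`.
        drop(unsafe { Box::from_raw(handle) });
    }
}
```

On the Python side the handle would be an opaque pointer value (for instance a `c_void_p` if `ctypes` is used), typically wrapped in a class whose finalizer calls `model_free`. Each call to `model_new` yields an independent dataset.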

There may be helper crates and libraries to assist with doing this sort of thing. I'm not familiar enough with Rust to know.

user31601
  • Thanks, that's helpful. Looking at variations on this question in other places the answers always include variants of "global vars/state are evil, you should pass the data between the functions that need access". However, I think you've confirmed that in my case with a lib a static/global var is exactly how this should be done, and maybe the only way. I think I can live with the limitations of static that you note. – Ginty Nov 25 '19 at 16:37
  • That wasn't what I was intending to suggest! A heap allocated, manually managed blob is not the same as a static variable. I'm advocating the former. – user31601 Nov 25 '19 at 16:46
  • `unsafe` code does not mean "bad" code. The whole reason `unsafe` exists as a keyword in Rust is because there are situations that the Rust borrow checker can't handle. This is one of them, so use `unsafe`. – user31601 Nov 25 '19 at 16:48
  • What's the difference? A static vector would go on the heap, right? – Ginty Nov 25 '19 at 16:48
  • Well, the contents of _any_ vector would go on the heap, technically. The distinction I'm trying to draw is between data managed behind a static (i.e. global) variable, and a dynamically allocated region of heap data that isn't tied to _any_ long-lived Rust variable. If you know C++, it's the difference between `static BigThingy x;` and `BigThingy* x = new BigThingy();`. – user31601 Nov 25 '19 at 16:57