Data to be determined later: interior mutability or separate HashMap?

Question

I have a struct, call it Book, which let's say stores data on a book sold by a bookstore. It needs to be referenced in many places in some data structure (e.g. with Rc) and so cannot be borrowed mutably in the normal way. However, it has some attribute, say its price, that needs to be filled in some time after initialization, when the object already has outstanding references.

So far I can think of two ways to do this, but they both have disadvantages:

Interior mutability: give Book a field such as price: RefCell<Option<i32>> which is initialized to RefCell::new(Option::None) when Book is initialized. Later on, when we determine the price of the book, we can use borrow_mut to set price to Some(10) instead, and from then on we can borrow it to retrieve its value.

My sense is that in general, one wants to avoid interior mutability unless necessary, and it doesn't seem here like it ought to be all that necessary. This technique is also a little awkward because of the Option, which we need because the price won't have a value until later (and setting it to 0 or -1 in the meantime seems un-Rustlike), but which requires lots of matches or unwraps in places where we may be logically certain that the price will have already been filled in.

Separate table: don't store the price inside Book at all, but make a separate data structure to store it, e.g. price_table: HashMap<Rc<Book>, i32>. Have a function which creates and populates this table when prices are determined, and then pass it around by reference (mutably or not) to every function that needs to know or change the prices of books.

Coming from a C background as I do, the HashMap feels like unnecessary overhead both in speed and memory, for data that already has a natural place to live (inside Book) and "should" be accessible via a simple pointer chase. This solution also means I have to clutter up lots of functions with an additional argument that's a reference to price_table.

Is one of these two methods generally more idiomatic in Rust, or are there other approaches that avoid the dilemma? I did see Once, but I don't think it's what I want, because I'd still have to know at initialization time how to fill in price, and I don't know that.

Of course, in other applications, we may need some other type than i32 to represent our desired attribute, so I'd like to be able to handle the general case.

Approaching a problem like yours has to start with requirements. What kinds of operations does your bookstore need to support? Every approach is going to have *some* disadvantages; it's up to you to decide which ones are important. — trent, Sep 19 '20 at 16:41
@trentcl: Of course it's a toy example, but let's say the bookstore needs to be able to collect up a bunch of books whose prices are not yet determined, then later assign prices to the books, then later still access those prices to decide how much to charge a customer. — Nate Eldredge, Sep 19 '20 at 16:46
@trentcl: "Every approach is going to have some disadvantages" Yes, of course. My first question is to what extent either approach has the disadvantage of being non-idiomatic. As a beginner in the language, I don't yet have a good sense for that, which is why I'm asking experts. My second question is whether there are other common options I don't know about, whose disadvantages may be less important to me. — Nate Eldredge, Sep 19 '20 at 16:49
A nitpick: since `Option<i32>` is `Copy`, you can use the more efficient `Cell<Option<i32>>`. — user4815162342, Sep 19 '20 at 19:51

Niklas Mohrin · Answer 1 · 2020-09-19T20:56:58.240

I think that your first approach is optimal for this situation. Since you have outstanding references to some data that you want to write to, you have to check the borrowing rules at runtime, so RefCell is the way to go. Inside the RefCell, prefer an Option or a custom enum with variants like Price::NotSet and Price::Set(i32). If you are really sure that all prices are initialized at some point, you could write a method price() that calls unwrap for you, or that does an assertion with better debug output in case your RefCell contains a None.
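A minimal sketch of that first approach, assuming a hypothetical Book with a title field and helper names (set_price, price, demo) that are mine rather than anything from the question:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical Book type; the field and method names are assumptions.
struct Book {
    title: String,
    // None until the price has been determined.
    price: RefCell<Option<i32>>,
}

impl Book {
    fn new(title: &str) -> Self {
        Book {
            title: title.to_string(),
            price: RefCell::new(None),
        }
    }

    // Takes &self, so it can be called through any Rc<Book>.
    fn set_price(&self, price: i32) {
        *self.price.borrow_mut() = Some(price);
    }

    // Hides the unwrap in one place and panics with a descriptive message
    // if the price is read before it was set.
    fn price(&self) -> i32 {
        let current: Option<i32> = *self.price.borrow();
        current.unwrap_or_else(|| {
            panic!("price of {:?} was read before it was set", self.title)
        })
    }
}

// Usage: two shared handles to one book, priced after both exist.
fn demo() -> i32 {
    let book = Rc::new(Book::new("Example Title"));
    let shelf = vec![Rc::clone(&book), Rc::clone(&book)];
    book.set_price(10);
    shelf[0].price() + shelf[1].price()
}
```

Callers that are logically certain the price exists use price() and never see the Option; callers that are not certain would need a second accessor returning Option<i32>.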

I guess that the HashMap approach would be fine for this case, but if you wanted to have something that is not Copy as your value in there, you could run into the same problem, since there might be outstanding references into the map somewhere.

I agree that the HashMap would not be the idiomatic way to go here, and I would still choose your first approach, even with i32 as the value type.

Edit:

As pointed out in the comments (thank you!), there are two performance considerations for this situation. Firstly, if you really know that the contained price is never zero, you can use std::num::NonZeroU16 and get the Option variant None for free (see documentation).
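As a small illustration of "for free": the standard library documents that Option<NonZeroU16> has the same size as u16, because the value 0 is used to represent None. The function names below are hypothetical.

```rust
use std::mem::size_of;
use std::num::NonZeroU16;

// True when wrapping the price in Option costs no extra memory.
fn option_is_free() -> bool {
    size_of::<Option<NonZeroU16>>() == size_of::<u16>()
}

// Hypothetical helper: a raw value of 0 means "no price yet".
fn parse_price(raw: u16) -> Option<NonZeroU16> {
    NonZeroU16::new(raw)
}
```

The trade-off is that a price of zero can no longer be represented, so this only fits if free books are truly impossible.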

If you are dealing with a type that is Copy (e.g. i32), you should consider using Cell instead of RefCell, because it is lighter. For a more detailed comparison, see https://stackoverflow.com/a/30276150/13679671
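For a Copy price, the Cell variant might look like the sketch below. Book and its methods are again hypothetical names. Cell has no borrow_mut, so there is no runtime borrow check that could panic; values are copied in and out with set and get.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical Book type; only the price field matters here.
struct Book {
    price: Cell<Option<i32>>,
}

impl Book {
    fn new() -> Self {
        Book { price: Cell::new(None) }
    }

    // Overwrites the stored value through a shared reference.
    fn set_price(&self, price: i32) {
        self.price.set(Some(price));
    }

    // Returns a copy of the stored value; None if not yet priced.
    fn price(&self) -> Option<i32> {
        self.price.get()
    }
}

// Usage: the price set through one handle is visible through the other.
fn demo() -> (Option<i32>, Option<i32>) {
    let book = Rc::new(Book::new());
    let alias = Rc::clone(&book);
    let before = alias.price();
    book.set_price(10);
    (before, alias.price())
}
```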

In addition: If your bookstore *never* gives books away for free, you can use an `Option<NonZeroU64>` instead of an `Option<u64>`. This will have the same memory layout as a `u64`, making the `Option` free. — user2722968, Sep 19 '20 at 10:54
Also note that if price is indeed just an integer (or other Copy type), `Cell` offers essentially zero runtime cost with the same benefits in this case. (LLVM or rustc might optimize it worse than no Cell, but certainly better than RefCell). — Mark Rousskov, Sep 19 '20 at 20:34

score 0 · Answer 2 · answered Sep 19 '20 at 21:05

Here are two more approaches.

Use Rc<RefCell<Book>> everywhere, with price: Option<i32> in the struct.
Declare a struct BookId(usize) and make a library: HashMap<BookId, Book>. Make all your references BookId values, and look books up in the library through them wherever you need to.
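
The second approach could be sketched roughly as follows. The Library wrapper, its next_id counter, and the method names are assumptions added for illustration; the answer itself only names BookId and HashMap<BookId, Book>.

```rust
use std::collections::HashMap;

// A cheap, copyable handle that stands in for a reference to a Book.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct BookId(usize);

struct Book {
    title: String,
    price: Option<i32>,
}

// Hypothetical owner of all books; everything else stores BookId values.
struct Library {
    books: HashMap<BookId, Book>,
    next_id: usize,
}

impl Library {
    fn new() -> Self {
        Library { books: HashMap::new(), next_id: 0 }
    }

    // Registers a book without a price and returns its handle.
    fn add(&mut self, title: &str) -> BookId {
        let id = BookId(self.next_id);
        self.next_id += 1;
        self.books.insert(id, Book { title: title.to_string(), price: None });
        id
    }

    // Plain mutable access, no interior mutability. Returns false for an unknown id.
    fn set_price(&mut self, id: BookId, price: i32) -> bool {
        match self.books.get_mut(&id) {
            Some(book) => {
                book.price = Some(price);
                true
            }
            None => false,
        }
    }

    fn price(&self, id: BookId) -> Option<i32> {
        self.books.get(&id).and_then(|book| book.price)
    }

    fn title(&self, id: BookId) -> Option<&str> {
        self.books.get(&id).map(|book| book.title.as_str())
    }
}
```

Because BookId is Copy, it can be stored in as many places as needed without Rc. The cost is the same one the question raises about the separate table: every function that reads or sets a price needs access to the Library.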
