
In this code, A does not need to be static mut, but the compiler forces B to be static mut:

use std::collections::HashMap;
use std::iter::FromIterator;

static A: [u32; 21] = [
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
];
static mut B: Option<HashMap<u32, String>> = None;

fn init_tables() {
    let hm = HashMap::<u32, String>::from_iter(A.iter().map(|&i| (i, (i + 10u32).to_string())));
    unsafe {
        B = Some(hm);
    }
}

fn main() {
    init_tables();
    println!("{:?} len: {}", A, A.len());
    unsafe {
        println!("{:?}", B);
    }
}

This is the only way I have found to get close to what I actually want: a global, immutable HashMap to be used by several functions, without littering all my code with unsafe blocks.

I know that a global variable is a bad idea for multi-threaded applications, but mine is single-threaded, so why should I pay the price for an eventuality that will never arise?

Since I use rustc directly and not cargo, I don't want the "help" of extern crates like lazy_static. I tried to decipher what the macro in that crate does, but to no avail.

I also tried to write this with thread_local! and a RefCell, but I had trouble using A to initialize B with that version.

In more general terms, the question could be "How to get stuff into the initvars section of a program in Rust?"

If you can show me how to initialize B directly (without a function like init_tables()), your answer is probably right.

If a function like init_tables() is inevitable, is there a trick like an accessor function to reduce the unsafe litter in my program?

Shepmaster
BitTickler
  • "I know that a global variable is a bad idea for multi threaded applications." Why would an **immutable** global be a bad idea? It's perfectly OK. – Stargateur Jan 01 '20 at 10:02
  • That said, I don't see the point of using a hashmap for an association that never changes. – Stargateur Jan 01 '20 at 10:05
  • @Stargateur The point is to have faster lookups. This is the minimal example. In the application I have several maps, one for each kind of lookup I need to eventually get the index into A. I know there is also a ``phf`` crate and in theory, having a perfect hash function for each kind of lookup would be even faster. But I don't want external crates. – BitTickler Jan 01 '20 at 10:09
  • It looks like your question might be answered by the answers of [How can you make a safe static singleton in Rust?](https://stackoverflow.com/q/27221504/155423). If not, please **[edit]** your question to explain the differences. Otherwise, we can mark this question as already answered. – Shepmaster Jan 01 '20 at 20:04
  • *I don't want the "help" of extern crates* — you are setting yourself up for failure. It is highly recommended that you not continue down that path. – Shepmaster Jan 01 '20 at 20:05

1 Answer


How to get stuff into the initvars section of a program in Rust?

It turns out that rustc puts static data in the .rodata section and static mut data in the .data section of the generated binary:

#[no_mangle]
static DATA: std::ops::Range<u32> = 0..20;

fn main() { DATA.len(); }

$ rustc static.rs
$ objdump -t -j .rodata static
static:     file format elf64-x86-64

SYMBOL TABLE:
0000000000025000 l    d  .rodata    0000000000000000              .rodata
0000000000025490 l     O .rodata    0000000000000039              str.0
0000000000026a70 l     O .rodata    0000000000000400              elf_crc32.crc32_table
0000000000026870 l     O .rodata    0000000000000200              elf_zlib_default_dist_table
0000000000026590 l     O .rodata    00000000000002e0              elf_zlib_default_table
0000000000025060 g     O .rodata    0000000000000008              DATA
0000000000027f2c g     O .rodata    0000000000000100              _ZN4core3str15UTF8_CHAR_WIDTH17h6f9f810be98aa5f2E

So changing from static mut to static at the source code level significantly changes the generated binary. The .rodata section is read-only, and trying to write to it will crash the program with a segmentation fault.

If init_tables() is of the judgement day category (inevitable)

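It is probably inevitable. Since a plain static lands in the read-only .rodata section by default, one has to control the linkage directly. Be aware that writing through a pointer cast from a shared reference is undefined behavior as far as the language is concerned, so treat this as a demonstration of section placement rather than a pattern to rely on: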

use std::collections::HashMap;
use std::iter::FromIterator;

static A: std::ops::Range<u32> = 0..20;
#[link_section = ".bss"]
static B: Option<HashMap<u32, String>> = None;

fn init_tables() {
    let data = HashMap::from_iter(A.clone().map(|i| (i, (i + 10).to_string())));
    unsafe {
        let b: *mut Option<HashMap<u32, String>> = &B as *const _ as *mut _;
        (&mut *b).replace(data);
    }
}

fn main() {
    init_tables();
    println!("{:?} len: {}", A, A.len());
    println!("{:#?} 5 => {:?}", B, B.as_ref().unwrap().get(&5));
}
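
As for an accessor function that keeps unsafe out of the calling code: a minimal sketch, assuming a toolchain with std::sync::OnceLock (stable since Rust 1.70, so no external crate is involved). The function name table is hypothetical and not from the question; the map is built on the first call and every later call returns the same reference.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

static A: std::ops::Range<u32> = 0..20;

// Hypothetical accessor: builds the map on the first call and hands out
// a shared reference to the same map on every later call.
fn table() -> &'static HashMap<u32, String> {
    // The cell starts empty; get_or_init runs the closure at most once.
    static B: OnceLock<HashMap<u32, String>> = OnceLock::new();
    B.get_or_init(|| A.clone().map(|i| (i, (i + 10).to_string())).collect())
}

fn main() {
    println!("{:?} len: {}", A, A.len());
    println!("5 => {:?}", table().get(&5));
}
```

Callers only ever see &HashMap, so nothing outside table() can mutate the map and no unsafe block is needed anywhere.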

I don't want the "help" of extern crates like lazy_static

Actually, lazy_static isn't that complicated. It makes clever use of the Deref trait. Here is a much simplified standalone version, which is more ergonomic than the first example:

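use std::cell::UnsafeCell;
use std::collections::HashMap;
use std::iter::FromIterator;
use std::ops::Deref;
use std::sync::Once;

static A: std::ops::Range<u32> = 0..20;
static B: BImpl = BImpl;
struct BImpl;

// Backing storage: the map (filled in on first use) and the guard that
// makes sure it is filled in exactly once.
struct Lazy(UnsafeCell<Option<HashMap<u32, String>>>, Once);

// SAFETY: the cell is written only inside call_once and read only after
// call_once has returned, so there is never a concurrent write.
unsafe impl Sync for Lazy {}

impl Deref for BImpl {
    type Target = HashMap<u32, String>;

    #[inline(always)]
    fn deref(&self) -> &Self::Target {
        static LAZY: Lazy = Lazy(UnsafeCell::new(None), Once::new());
        LAZY.1.call_once(|| unsafe {
            // UnsafeCell is what makes writing through a shared static legal.
            *LAZY.0.get() = Some(init_tables());
        });

        unsafe { (&*LAZY.0.get()).as_ref().unwrap() }
    }
}

fn init_tables() -> HashMap<u32, String> {
    HashMap::from_iter(A.clone().map(|i| (i, (i + 10).to_string())))
}

fn main() {
    println!("{:?} len: {}", A, A.len());
    println!("{:#?} 5 => {:?}", *B, B.get(&5));
}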
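
On a recent toolchain the same lazily initialized static can probably be written with the standard library alone. The following is a sketch, assuming std::sync::LazyLock is available (stable since Rust 1.80); it is not what lazy_static itself expands to, only a std-based equivalent of the Deref pattern above.

```rust
use std::collections::HashMap;
use std::sync::LazyLock;

static A: std::ops::Range<u32> = 0..20;

// The closure runs once, on first access; after that B dereferences
// straight to the finished map.
static B: LazyLock<HashMap<u32, String>> =
    LazyLock::new(|| A.clone().map(|i| (i, (i + 10).to_string())).collect());

fn main() {
    println!("{:?} len: {}", A, A.len());
    println!("{:#?} 5 => {:?}", *B, B.get(&5));
}
```

Because LazyLock implements Deref, B can be used like a plain &HashMap from any function, with no init call and no unsafe.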
edwardw
  • what is that ``Lazy.1`` and ``Lazy.0`` syntax? Guess that is where I gave up trying to read lazy static crate. – BitTickler Jan 02 '20 at 01:01
  • @BitTickler They access the second and first elements of a tuple, respectively (numbering starts at 0). This works for any tuple. – edwardw Jan 02 '20 at 04:01
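
The positional field syntax discussed in these comments can be illustrated with a small sketch. The names swap and Pair are hypothetical and exist only for this example.

```rust
// Hypothetical helper: swaps the two fields of a tuple using .0 and .1.
fn swap(pair: (u32, &str)) -> (&str, u32) {
    (pair.1, pair.0)
}

// Tuple structs use the same positional syntax for their fields.
struct Pair(Option<u32>, bool);

fn main() {
    let pair = (5, "five");
    // .0 is the first element, .1 the second.
    println!("{} {}", pair.0, pair.1);

    let p = Pair(None, false);
    println!("{:?} {}", p.0, p.1);

    println!("{:?}", swap(pair));
}
```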