
Is there a way I can read a structure directly from a file in Rust? My code is:

use std::fs::File;

struct Configuration {
    item1: u8,
    item2: u16,
    item3: i32,
    item4: [char; 8],
}

fn main() {
    let file = File::open("config_file").unwrap();

    let mut config: Configuration;
    // How to read struct from file?
}

How would I read my configuration directly into config from the file? Is this even possible?

Shepmaster
Jeroen
  • Which format does your file have? The correct answer depends quite strongly on the actual data representation in the file. – Vladimir Matveev Aug 20 '14 at 16:59
  • @VladimirMatveev Binary format, I don't want to read from the file and copy to my struct; I want to use my struct as a buffer to read the file with. – Jeroen Aug 20 '14 at 17:02
  • Ah, I understand now what you need. You can't do it without some unsafe code. I'll try to write proof of concept now. – Vladimir Matveev Aug 20 '14 at 17:03
  • This crate seems to do exactly what you want: https://github.com/TyOverby/bincode – onionjake May 02 '15 at 03:40

3 Answers


Here you go:

use std::io::Read;
use std::mem;
use std::slice;

#[repr(C, packed)]
#[derive(Debug, Copy, Clone)]
struct Configuration {
    item1: u8,
    item2: u16,
    item3: i32,
    item4: [char; 8],
}

const CONFIG_DATA: &[u8] = &[
    0xfd, // u8
    0xb4, 0x50, // u16
    0x45, 0xcd, 0x3c, 0x15, // i32
    0x71, 0x3c, 0x87, 0xff, // char
    0xe8, 0x5d, 0x20, 0xe7, // char
    0x5f, 0x38, 0x05, 0x4a, // char
    0xc4, 0x58, 0x8f, 0xdc, // char
    0x67, 0x1d, 0xb4, 0x64, // char
    0xf2, 0xc5, 0x2c, 0x15, // char
    0xd8, 0x9a, 0xae, 0x23, // char
    0x7d, 0xce, 0x4b, 0xeb, // char
];

fn main() {
    let mut buffer = CONFIG_DATA;

    let mut config: Configuration = unsafe { mem::zeroed() };

    let config_size = mem::size_of::<Configuration>();
    unsafe {
        let config_slice = slice::from_raw_parts_mut(&mut config as *mut _ as *mut u8, config_size);
        // `read_exact()` comes from `Read` impl for `&[u8]`
        buffer.read_exact(config_slice).unwrap();
    }

    println!("Read structure: {:#?}", config);
}

(Updated for Rust 1.38.)

You need to be careful, however, as unsafe code is, well, unsafe. After the slice::from_raw_parts_mut() call, the slice and config both give mutable access to the same memory, and touching config while the slice is still alive would violate Rust's aliasing rules. Therefore you want to keep the mutable slice created from the structure alive for the shortest possible time. I also assume that you know about endianness issues: the code above is by no means portable and will return different results when compiled and run on machines with different byte orders (a big-endian target vs. little-endian x86, for example).

If you can choose the format and you want a compact binary one, consider using bincode. Otherwise, if you need, for example, to parse some pre-defined binary structure, the byteorder crate is the way to go.
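If you would rather not pull in a crate, the standard library's from_le_bytes / from_be_bytes functions cover the same ground for a small fixed layout. The following is only a sketch under assumptions: the names ConfigHeader and read_header are hypothetical, the file is assumed to store the three integer fields little-endian with no padding between them (7 bytes), and the [char; 8] field is left out.

```rust
use std::io::{self, Read};

// Hypothetical struct mirroring the integer fields of `Configuration`.
#[derive(Debug, PartialEq)]
struct ConfigHeader {
    item1: u8,
    item2: u16,
    item3: i32,
}

// Hypothetical helper. Assumes a packed, little-endian on-disk layout:
// 1 byte (u8) + 2 bytes (u16) + 4 bytes (i32) = 7 bytes.
fn read_header(mut rdr: impl Read) -> io::Result<ConfigHeader> {
    let mut buf = [0u8; 7];
    // Fails with `UnexpectedEof` if fewer than 7 bytes are available.
    rdr.read_exact(&mut buf)?;
    Ok(ConfigHeader {
        item1: buf[0],
        item2: u16::from_le_bytes([buf[1], buf[2]]),
        item3: i32::from_le_bytes([buf[3], buf[4], buf[5], buf[6]]),
    })
}
```

Because the byte order is spelled out in the code, the result is the same on every machine. Anything implementing Read can be passed in, for example an opened File or a &[u8] slice.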

Vladimir Matveev
  • Yeah I'm aware about endian issues - but it's just a quick tool I'm writing which will run on about 3 computers. – Jeroen Aug 20 '14 at 17:44
  • @A.B., [this](https://github.com/rust-lang/rust/pull/16107), I believe. It is now located [here](http://doc.rust-lang.org/rbml/index.html). – Vladimir Matveev Aug 20 '14 at 22:35
  • I went with `mem::uninitialized` as opposed to `mem::zeroed` in the end. There doesn't seem to be much point in initializing the memory to 0 if it's going to be overwritten anyway. – Jeroen Aug 23 '14 at 14:19
  • this gives me a "warning, this warning will become an error" message, https://github.com/rust-lang/rust/issues/46043 – don bright Jul 02 '18 at 01:18
  • While the general outline of this code is good, this specific instance **violates Rust's safety**. The values for the character data are not valid and exceed the currently supported boundaries of characters. – Shepmaster Nov 04 '19 at 20:27
  • Interesting. I don't remember at all how I came up with these values, but it seems unlikely I wrote them by hand... – Vladimir Matveev Nov 06 '19 at 00:44
  • It did seem strange. Miri caught this specific instance though. – Shepmaster Nov 06 '19 at 00:59

As Vladimir Matveev mentions, using the byteorder crate is often the best solution. This way, you account for endianness issues and don't have to deal with any unsafe code or worry about alignment or padding:

use byteorder::{LittleEndian, ReadBytesExt}; // 1.2.7
use std::{
    fs::File,
    io::{self, Read},
};

struct Configuration {
    item1: u8,
    item2: u16,
    item3: i32,
}

impl Configuration {
    fn from_reader(mut rdr: impl Read) -> io::Result<Self> {
        let item1 = rdr.read_u8()?;
        let item2 = rdr.read_u16::<LittleEndian>()?;
        let item3 = rdr.read_i32::<LittleEndian>()?;

        Ok(Configuration {
            item1,
            item2,
            item3,
        })
    }
}

fn main() {
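    let file = File::open("/dev/random").unwrap();

    // Parse the three integer fields; any I/O error ends the program here.
    let config = Configuration::from_reader(file).unwrap();
    println!("{} {} {}", config.item1, config.item2, config.item3);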
}

I've ignored the [char; 8] for a few reasons:

  1. Rust's char is a 32-bit type and it's unclear if your file has actual Unicode code points or C-style 8-bit values.
  2. You can't easily parse an array with byteorder; you have to parse N values and then build the array yourself.
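For illustration only, here is how the two readings of [char; 8] from point 1 might look with just the standard library. Both function names are hypothetical, and both layouts are assumptions about the file: the first treats the field as eight C-style ASCII bytes, the second as eight little-endian 32-bit code points. Either way each value is validated rather than reinterpreted, so an invalid char can never be produced.

```rust
use std::io::{self, Read};

// Assumed layout 1: eight C-style bytes, each an ASCII character.
fn read_ascii_chars(mut rdr: impl Read) -> io::Result<[char; 8]> {
    let mut raw = [0u8; 8];
    rdr.read_exact(&mut raw)?;
    if !raw.is_ascii() {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "non-ASCII byte"));
    }
    // Every ASCII byte maps directly to the code point of the same value.
    Ok(raw.map(char::from))
}

// Assumed layout 2: eight little-endian u32 Unicode code points.
fn read_code_points(mut rdr: impl Read) -> io::Result<[char; 8]> {
    let mut out = ['\0'; 8];
    for slot in out.iter_mut() {
        let mut raw = [0u8; 4];
        rdr.read_exact(&mut raw)?;
        // `char::from_u32` rejects surrogates and values above U+10FFFF.
        *slot = char::from_u32(u32::from_le_bytes(raw))
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "invalid code point"))?;
    }
    Ok(out)
}
```

Which of the two (if either) applies depends entirely on how the file was written.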
Shepmaster
  • I suppose these `read_u8` and other `read_X` calls may invoke a system call. So it may not be very efficient. Can we read a whole structure in a certain endianness instead of small portions of integer types? – VP. Nov 04 '19 at 20:04
  • @VictorPolevoy that is the job of a buffered reader to fix. See [What's the de-facto way of reading and writing files in Rust 1.x?](https://stackoverflow.com/a/31193386/155423), starting at "Buffered I/O". But yes, you can *unsafely* take any random blob of bytes and convert it to any given type. That's the point of the other two answers here. – Shepmaster Nov 04 '19 at 20:09
  • what if I want to read 10 GB file? the performance penalty will be high. Using from_raw_parts is the only way IMO. – mishmashru Aug 02 '20 at 20:16
  • @mishmashru I don't immediately see why this would have lower performance than `from_raw_parts`. This isn't something you need to have an *opinion* about. Write both and benchmark them; then you will know for sure. – Shepmaster Aug 03 '20 at 12:44

The following code does not take into account any endianness or padding issues and is intended to be used with POD types. struct Configuration should be safe in this case.


Here is a function that can read a struct (of a POD type) from a file:

use std::io::{self, Read};
use std::slice;

fn read_struct<T, R: Read>(mut read: R) -> io::Result<T> {
    let num_bytes = ::std::mem::size_of::<T>();
    unsafe {
        let mut s = ::std::mem::uninitialized();
        let buffer = slice::from_raw_parts_mut(&mut s as *mut T as *mut u8, num_bytes);
        match read.read_exact(buffer) {
            Ok(()) => Ok(s),
            Err(e) => {
                ::std::mem::forget(s);
                Err(e)
            }
        }
    }
}

// use
// read_struct::<Configuration, _>(reader)
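Since mem::uninitialized is deprecated, a variant built on MaybeUninit may be a better starting point. This is a sketch, not a vetted implementation: the Pod marker trait and the read_pod name are hypothetical, and the unsafe trait exists so that the caller has to promise, per type, that every bit pattern is a valid value and that the type has no padding bytes. Types containing char, bool, references or enums must not implement it.

```rust
use std::io::{self, Read};
use std::mem::{self, MaybeUninit};
use std::slice;

/// Hypothetical marker trait. Implementors promise that every bit pattern
/// is a valid value of the type and that the type has no padding bytes.
unsafe trait Pod: Copy {}

unsafe impl Pod for u8 {}
unsafe impl Pod for u16 {}
unsafe impl Pod for i32 {}
unsafe impl Pod for [u8; 4] {}

fn read_pod<T: Pod, R: Read>(mut read: R) -> io::Result<T> {
    // Start from zeroed storage so the byte view below never exposes
    // uninitialized memory (this relies on `T` having no padding).
    let mut value = MaybeUninit::<T>::zeroed();
    let buffer = unsafe {
        slice::from_raw_parts_mut(value.as_mut_ptr() as *mut u8, mem::size_of::<T>())
    };
    // On error, `value` is simply dropped; `MaybeUninit` never runs
    // `T`'s destructor, so no `mem::forget` is needed.
    read.read_exact(buffer)?;
    // SAFETY: `T: Pod` promises that any bit pattern is a valid `T`.
    Ok(unsafe { value.assume_init() })
}
```

The bytes are still interpreted in the machine's native byte order, so the endianness caveat above applies unchanged. A call would look like read_pod::<i32, _>(reader).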

If you want to read a sequence of structs from a file, you can call read_struct multiple times or read the whole file at once:

use std::fs::{self, File};
use std::io::BufReader;
use std::path::Path;

fn read_structs<T, P: AsRef<Path>>(path: P) -> io::Result<Vec<T>> {
    let path = path.as_ref();
    let struct_size = ::std::mem::size_of::<T>();
    let num_bytes = fs::metadata(path)?.len() as usize;
    let num_structs = num_bytes / struct_size;
    let mut reader = BufReader::new(File::open(path)?);
    let mut r = Vec::<T>::with_capacity(num_structs);
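    unsafe {
        // Only cover the bytes of whole structs: the Vec's capacity is
        // num_structs * struct_size, which can be less than num_bytes.
        let buffer = slice::from_raw_parts_mut(r.as_mut_ptr() as *mut u8, num_structs * struct_size);
        reader.read_exact(buffer)?;
        r.set_len(num_structs);
    }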
    Ok(r)
}

// use
// read_structs::<StructName, _>("path/to/file")
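If the unsafe version is more than you need, the same "read everything at once" idea can be written in safe code by reading the bytes into a Vec<u8> and decoding fixed-size chunks. The sketch below is an illustration under assumptions: Record, RECORD_SIZE and read_records are hypothetical names, and each record is assumed to be stored as a little-endian u16 followed by a little-endian i32 with no padding (6 bytes).

```rust
use std::io::{self, Read};

// Hypothetical record type used only for this example.
#[derive(Debug, PartialEq)]
struct Record {
    id: u16,
    value: i32,
}

// Assumed on-disk size: 2 bytes (u16) + 4 bytes (i32), no padding.
const RECORD_SIZE: usize = 6;

fn read_records(mut rdr: impl Read) -> io::Result<Vec<Record>> {
    // Read the whole input in one go, then decode it in memory.
    let mut bytes = Vec::new();
    rdr.read_to_end(&mut bytes)?;

    let chunks = bytes.chunks_exact(RECORD_SIZE);
    // Reject input whose length is not a multiple of the record size.
    if !chunks.remainder().is_empty() {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "trailing partial record"));
    }
    Ok(chunks
        .map(|c| Record {
            id: u16::from_le_bytes([c[0], c[1]]),
            value: i32::from_le_bytes([c[2], c[3], c[4], c[5]]),
        })
        .collect())
}
```

With a file, this would be called as read_records(File::open(path)?). Whether it is measurably slower than the from_raw_parts_mut version is something to benchmark rather than assume.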
Shepmaster
malbarbo
  • why ::std::mem... instead of std::mem? is there any difference? – wingerse Jul 12 '16 at 20:59
  • A path starting with `::` is absolute. Using an absolute path ensures that the code will compile if the function is put in a module. Search for "absolute" in https://doc.rust-lang.org/book/crates-and-modules.html to learn more. – malbarbo Jul 12 '16 at 21:15
  • Thank you malbarbo – wingerse Jul 12 '16 at 21:33
  • Why is ::std::mem::forget needed here? Doesn't it indicate a memory leak? – knight42 Jul 13 '16 at 13:54
  • @Knight To prevent the destructor from running on `s` (`s` is uninitialized). This is one use case described in the [`forget`](https://doc.rust-lang.org/stable/std/mem/fn.forget.html) documentation. – malbarbo Jul 13 '16 at 15:22
  • Sorry for my ignorance. Thanks! – knight42 Jul 13 '16 at 15:24
  • While this answer alludes to the underlying problem, it improperly uses unsafe Rust. The proposed function can **introduce memory unsafety in safe Rust code**. [One example shows it causing a segfault](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8f470d62af7c6eaa40c06cc392c0aed0). This code should *not* be used. – Shepmaster Nov 04 '19 at 20:19