
I am using the memmap2 crate to read some large binary files, and I am using the midasio library which provides some "viewer" structures that just reference inner structures in the byte slice.

From a slice of bytes (the memory map) I can create a FileView, with which I can iterate over EventViews, with which I can iterate over BankViews. All of these just reference the underlying memory-mapped slice.

Iterating through the BankViews in a set of files is usually trivial with nested loops.

Minimal working example:

Cargo.toml

[dependencies]
midasio = "0.3"
memmap2 = "0.5"

and main.rs

use std::path::PathBuf;
use std::fs::File;
use memmap2::Mmap;
use midasio::read::file::FileView;

fn main() {
    let args: Vec<PathBuf> = Vec::new(); // Just the name of some files
    for path in args {
        let file = File::open(path).unwrap();
        let mmap = unsafe { Mmap::map(&file).unwrap() };

        let file_view = FileView::try_from(&mmap[..]).unwrap();
        for event_view in &file_view {
            for _bank_view in &event_view {
                // Here I am just iterating through all the BankViews
            }
        }
    }
}

I need to "flatten" all these into a single iterator such that whenever I call next() it has the exact same behavior as the nested loop above. How can I do this?

I need to do it because I want to use the Cursive library and loop through BankViews by pressing a "next" button. So I need to control each "next" with a single function that, hopefully, just calls next on the massive iterator.

I tried

use std::path::PathBuf;
use std::fs::File;
use memmap2::Mmap;
use midasio::read::file::FileView;

fn main() {
    let args: Vec<PathBuf> = Vec::new();
    let iterator = args
        .iter()
        .map(|path| {
            let file = File::open(path).unwrap();
            let mmap = unsafe { Mmap::map(&file).unwrap() };

            FileView::try_from(&mmap[..]).unwrap()
        })
        .flat_map(|file_view| file_view.into_iter())
        .flat_map(|event_view| event_view.into_iter());
}

And this gives me the errors:

error[E0515]: cannot return value referencing local variable `mmap`
  --> src/main.rs:14:13
   |
14 |             FileView::try_from(&mmap[..]).unwrap()
   |             ^^^^^^^^^^^^^^^^^^^^----^^^^^^^^^^^^^^
   |             |                   |
   |             |                   `mmap` is borrowed here
   |             returns a value referencing data owned by the current function

error[E0515]: cannot return reference to function parameter `file_view`
  --> src/main.rs:16:31
   |
16 |         .flat_map(|file_view| file_view.into_iter())
   |                               ^^^^^^^^^^^^^^^^^^^^^ returns a reference to data owned by the current function

error[E0515]: cannot return reference to function parameter `event_view`
  --> src/main.rs:17:32
   |
17 |         .flat_map(|event_view| event_view.into_iter());
   |                                ^^^^^^^^^^^^^^^^^^^^^^ returns a reference to data owned by the current function

For more information about this error, try `rustc --explain E0515`.
error: could not compile `ugly_iteration` due to 3 previous errors
user17004502

1 Answer


This is problematic. Because the IntoIterator impls borrow self, you need to hold both the iterable and the iterator together, and that creates a self-referential struct. See Why can't I store a value and a reference to that value in the same struct?.

It looks to me, even though I haven't dug deep, that this is not necessary and is actually the result of a design mistake in midasio. But you can't do much about that, other than patching the library or sending a PR and hoping for it to be accepted soon (if you want to change that, I think it is enough to change the &'a FileView<'a> and &'a EventView<'a> to &'_ FileView<'a> and &'_ EventView<'a> respectively, though I'm unsure). See https://github.com/DJDuque/midasio/pull/8
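To illustrate why decoupling the two lifetimes matters, here is a minimal sketch that does not use midasio at all. `View`, `Chunks` and `flatten_all` are hypothetical stand-ins invented for this example, and plain `Vec<u8>` buffers stand in for the memory maps. The point is only that once the iterator holds the original slice's lifetime rather than a borrow of the view, `flat_map` is accepted:

```rust
// Hypothetical stand-in for a midasio-style view: it only borrows the bytes.
struct View<'a> {
    data: &'a [u8],
}

// Iterator over chunks of at most two bytes. It holds only the original
// slice, never a reference to the `View` that created it.
struct Chunks<'a> {
    rest: &'a [u8],
}

impl<'a> Iterator for Chunks<'a> {
    type Item = &'a [u8];

    fn next(&mut self) -> Option<&'a [u8]> {
        // Copy the shared reference out so the pieces keep the full `'a`.
        let rest: &'a [u8] = self.rest;
        if rest.is_empty() {
            return None;
        }
        let (head, tail) = rest.split_at(rest.len().min(2));
        self.rest = tail;
        Some(head)
    }
}

// Decoupled lifetimes: the borrow of the view (`'b`) may be short, while
// the items keep the lifetime of the underlying bytes (`'a`).
// Writing `&'a View<'a>` here instead would tie the two together.
impl<'a, 'b> IntoIterator for &'b View<'a> {
    type Item = &'a [u8];
    type IntoIter = Chunks<'a>;

    fn into_iter(self) -> Chunks<'a> {
        Chunks { rest: self.data }
    }
}

fn flatten_all(buffers: &[Vec<u8>]) -> Vec<&[u8]> {
    buffers
        .iter()
        .map(|bytes| View { data: &bytes[..] })
        // `view` is dropped at the end of the closure, but the returned
        // iterator does not borrow it, so this is accepted.
        .flat_map(|view| view.into_iter())
        .collect()
}

fn main() {
    let buffers = vec![vec![1u8, 2, 3], vec![4u8, 5]];
    for chunk in flatten_all(&buffers) {
        println!("{:?}", chunk);
    }
}
```

With `impl<'a> IntoIterator for &'a View<'a>` instead, the closure passed to `flat_map` would have to return something borrowing its own parameter, which is the E0515 error from the question.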

I don't think there is a good solution. Using iterator adapters is unlikely to work, and creating your own iterator type will require unsafe code or at the very least using a crate like ouroboros.


Edit: With my PR #8, it still doesn't work verbatim, because the Mmap is dropped at the end of the map() closure while the views still need to access it. However, this is fixable pretty easily by collecting all the Mmaps into a Vec first:

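use std::path::PathBuf;
use std::fs::File;
use memmap2::Mmap;
use midasio::read::file::FileView;

fn main() {
    let args: Vec<PathBuf> = Vec::new();
    // Keep every Mmap alive in a Vec that outlives the iterator below.
    let mmaps = args
        .iter()
        .map(|path| {
            let file = File::open(path).unwrap();
            unsafe { Mmap::map(&file).unwrap() }
        })
        .collect::<Vec<_>>();
    // The views now borrow from `mmaps`, not from a closure-local value.
    let iterator = mmaps
        .iter()
        .map(|mmap| FileView::try_from(&mmap[..]).unwrap())
        .flat_map(|file_view| file_view.into_iter())
        .flat_map(|event_view| event_view.into_iter());
}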

Returning this iterator from a function is still not going to work, unfortunately.
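The limitation is about a function that owns the Mmaps itself. As a hedged sketch of the alternative, if the caller owns the buffers and the function only borrows them, an iterator tied to that borrow can be returned and then advanced through a `&mut`. This example uses only the standard library: `Vec<u8>` stands in for the Mmaps, `chunks(2)` stands in for the view iterators, and `all_chunks` and `on_next` are hypothetical names.

```rust
// Stand-in for the memory maps: plain byte buffers owned by the caller.
// `chunks(2)` stands in for the view iterators; like them, its items
// borrow from the original bytes rather than from a temporary.
fn all_chunks<'a>(buffers: &'a [Vec<u8>]) -> impl Iterator<Item = &'a [u8]> + 'a {
    buffers.iter().flat_map(|bytes| bytes.chunks(2))
}

// What a "next" button handler could do: advance the iterator once.
fn on_next<'a, I>(iter: &mut I) -> Option<&'a [u8]>
where
    I: Iterator<Item = &'a [u8]>,
{
    iter.next()
}

fn main() {
    // The buffers live in the caller, so they outlive the iterator.
    let buffers = vec![vec![1u8, 2, 3], vec![4u8, 5]];
    let mut iter = all_chunks(&buffers);
    while let Some(chunk) = on_next(&mut iter) {
        println!("{:?}", chunk);
    }
}
```

Whether the same shape works with the midasio views depends on the lifetime change discussed above being in place.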

Chayim Friedman
  • I am the author of `midasio` so I am definitely interested in fixing any wrong design early before I start using the library more and more. I still am extremely new to Rust; and the fact that I cannot flatten the `EventViews` smelled. All these "Viewers" reference a common outer slice that always out-lives them; I just currently lack the current knowledge to express this properly. If you have time, could you have a look at the repo: https://github.com/DJDuque/midasio Any suggestions on how to fix things are welcome. I will start learning more on lifetimes and self-referencing structures. – user17004502 Jun 19 '22 at 04:35
  • @user17004502 I don't have time now, maybe later; but did you try to apply the change I suggested? It basically says "use the original slice's lifetime instead of the one of the viewer instance". – Chayim Friedman Jun 19 '22 at 04:37
  • Yes, I tried your suggestion but I got a bunch of lifetime errors. I *think* this is because my inner slices still are declared with the wrong lifetime. Maybe changing these in the constructor to use "the original slice's lifetime" will fix things. I am going to keep investigating; you have given me very valuable information. Thanks. – user17004502 Jun 19 '22 at 04:43
  • @user17004502 Sent a PR and updated the answer accordingly. – Chayim Friedman Jun 19 '22 at 06:08
  • Thanks. This is great! I appreciate your help a lot. With respect to using the iterator from the "next" function in a button; I don't think I need to return the iterator, maybe just take a `&mut` and advance it with the button. Cursive also has this `user_data` where I can probably keep the `Mmaps` and the iterator. This is a huge step. Thanks again. – user17004502 Jun 19 '22 at 07:06