0

I want to lazily consume the nodes of a file tree one by one while sorting the siblings on each level.

In Python, I'd use a synchronous generator:

def traverse_dst(src_dir, dst_root, dst_step):
    """
    Recursively traverses the source directory and yields a sequence of (src, dst) pairs;

    """
    dirs, files = list_dir_groom(src_dir) # Getting immediate offspring.

    for d in dirs:
        step = list(dst_step)
        step.append(d.name)
        yield from traverse_dst(d, dst_root, step)

    for f in files:
        dst_path = dst_root.joinpath(step)
        yield f, dst_path

In Elixir, a (lazy) stream:

def traverse_flat_dst(src_dir, dst_root, dst_step \\ []) do
  {dirs, files} = list_dir_groom(src_dir) # Getting immediate offspring.

  traverse = fn d ->
    step = dst_step ++ [Path.basename(d)]
    traverse_flat_dst(d, dst_root, step)
  end

  handle = fn f ->
    dst_path =
      Path.join(
        dst_root,
        dst_step
      )

    {f, dst_path}
  end

  Stream.flat_map(dirs, traverse)
  |> Stream.concat(Stream.map(files, handle))
end

One can see some language features addressing recursion: yield from in Python, flat_map in Elixir; the latter looks like a classic functional approach.

It looks like whatever is lazy in Rust, it's always an iterator. How am I supposed to do more or less the same in Rust?

I'd like to preserve the structure of my recursive function with dirs and files as vectors of paths (they are optionally sorted and filtered).

Getting dirs and files is already implemented to my liking:

fn folders(dir: &Path, folder: bool) -> Result<Vec<PathBuf>, io::Error> {
    Ok(fs::read_dir(dir)?
        .into_iter()
        .filter(|r| r.is_ok())
        .map(|r| r.unwrap().path())
        .filter(|r| if folder { r.is_dir() } else { !r.is_dir() })
        .collect())
}

fn list_dir_groom(dir: &Path) -> (Vec<PathBuf>, Vec<PathBuf>) {
    let mut dirs = folders(dir, true).unwrap();
    let mut files = folders(dir, false).unwrap();
    if flag("x") {
        dirs.sort_unstable();
        files.sort_unstable();
    } else {
        sort_path_slice(&mut dirs);
        sort_path_slice(&mut files);
    }
    if flag("r") {
        dirs.reverse();
        files.reverse();
    }
    (dirs, files)
}

Vec<PathBuf can be iterated as is, and there is standard flat_map method. It should be possible to implement Elixir style, I just can't figure it out yet.

This is what I already have. Really working (traverse_flat_dst(&SRC, [].to_vec());), I mean:

fn traverse_flat_dst(src_dir: &PathBuf, dst_step: Vec<PathBuf>) {
    let (dirs, files) = list_dir_groom(src_dir);

    for d in dirs.iter() {
        let mut step = dst_step.clone();
        step.push(PathBuf::from(d.file_name().unwrap()));
        println!("d: {:?}; step: {:?}", d, step);
        traverse_flat_dst(d, step);
    }
    for f in files.iter() {
        println!("f: {:?}", f);
    }
}

What I want (not yet working!):

fn traverse_flat_dst_iter(src_dir: &PathBuf, dst_step: Vec<PathBuf>) {
    let (dirs, files) = list_dir_groom(src_dir);

    let traverse = |d| {
        let mut step = dst_step.clone();
        step.push(PathBuf::from(d.file_name().unwrap()));
        traverse_flat_dst_iter(d, step);

    };
    // This is something that I just wish to be true!
    flat_map(dirs, traverse) + map(files)    
}

I want this function to deliver one long flat iterator of files, in the spirit of the Elixir solution. I just can't yet cope with the necessary return types and other syntax. I really hope to be clear enough this time.

What I managed to compile and run (meaningless, but the signature is what I actually want):

fn traverse_flat_dst_iter(
    src_dir: &PathBuf,
    dst_step: Vec<PathBuf>,
) -> impl Iterator<Item = (PathBuf, PathBuf)> {
    let (dirs, files) = list_dir_groom(src_dir);

    let _traverse = |d: &PathBuf| {
        let mut step = dst_step.clone();
        step.push(PathBuf::from(d.file_name().unwrap()));
        traverse_flat_dst_iter(d, step)
    };
    files.into_iter().map(|f| (f, PathBuf::new()))
}

What I'm still lacking:

fn traverse_flat_dst_iter(
    src_dir: &PathBuf,
    dst_step: Vec<PathBuf>,
) -> impl Iterator<Item = (PathBuf, PathBuf)> {
    let (dirs, files) = list_dir_groom(src_dir);

    let traverse = |d: &PathBuf| {
        let mut step = dst_step.clone();
        step.push(PathBuf::from(d.file_name().unwrap()));
        traverse_flat_dst_iter(d, step)
    };
    // Here is a combination amounting to an iterator,
    // which delivers a (PathBuf, PathBuf) tuple on each step.
    // Flat mapping with traverse, of course (see Elixir solution).
    // Iterator must be as long as the number of files in the tree.
    // The lines below look very close, but every possible type is mismatched :(
    dirs.into_iter().flat_map(traverse)
        .chain(files.into_iter().map(|f| (f, PathBuf::new())))

}
Alexey Orlov
  • 2,412
  • 3
  • 27
  • 46
  • Please add an [MCVE]. Without any Rust code, it's hard to answer. – Boiethios Sep 24 '19 at 08:58
  • This file iterator is open-source: https://docs.rs/walkdir/2.2.9/walkdir/ – Denys Séguret Sep 24 '19 at 09:00
  • [This](https://github.com/BurntSushi/walkdir/blob/master/src/lib.rs) is the source code? The essence is drowned in options, as always with the real life code. – Alexey Orlov Sep 24 '19 at 09:03
  • @FrenchBoiethios: I have to think up some approach first. Right now I'm starting to suspect that in Rust one has to abandon the idea of recursive calls when it comes to iterators. Is it really so? – Alexey Orlov Sep 24 '19 at 09:31
  • 1
    Why make recursive calls when a stack is so easy to use and so efficient (and can also be easily consumed by a tread pool) ? – Denys Séguret Sep 24 '19 at 10:21
  • First off, "To iterate is human, to recurse divine"... Back to business: do you really mean, I absolutely *have to* use stack in my case? You see, I still don't know. – Alexey Orlov Sep 24 '19 at 10:47
  • 1
    Careful with such sayings, @AlexeyOrlov. Recursion is easier on the eyes, not on the performance, unless the language and/or platform explicitly provides for it. Rust does not. – Sébastien Renauld Sep 24 '19 at 12:58
  • @SébastienRenauld You are right, generally. Walking a tree is a perfectly legitimate application for non-tail-optimized recursion, though. – Alexey Orlov Sep 24 '19 at 12:59
  • It looks like your question might be answered by the answers of [Implement IntoIterator for binary tree](https://stackoverflow.com/q/43833588/155423) or [How do I flatten a recursive structure using recursive iterators?](https://stackoverflow.com/q/57573212/155423); [How to correctly iterate all records of a multi-level depth structure in Rust?](https://stackoverflow.com/q/47797743/155423). If not, please **[edit]** your question to explain the differences. Otherwise, we can mark this question as already answered. – Shepmaster Sep 24 '19 at 13:42
  • @Shepmaster My problem is a bit different. I will edit my question once more as soon as possible. – Alexey Orlov Sep 24 '19 at 13:54
  • @Shepmaster Edited as suggested. – Alexey Orlov Sep 24 '19 at 16:11
  • Please, take a look at my latest update. It's probably very, very close. Still no cigar. Now it should be crystal clear what I actually want. – Alexey Orlov Sep 25 '19 at 13:08

2 Answers2

4

There are two approaches:

The first one is to use an existing crate, like walkdir. The benefit is it's being well tested and offers many options.

The second one is to write your own implementation of Iterator. Here's an example, and maybe the basis for your own:

struct FileIterator {
    dirs: Vec<PathBuf>, // the dirs waiting to be read
    files: Option<ReadDir>, // non recursive iterator over the currently read dir
}

impl From<&str> for FileIterator {
    fn from(path: &str) -> Self {
        FileIterator {
            dirs: vec![PathBuf::from(path)],
            files: None,
        }
    }
}

impl Iterator for FileIterator {
    type Item = PathBuf;
    fn next(&mut self) -> Option<PathBuf> {
        loop {
            while let Some(read_dir) = &mut self.files {
                match read_dir.next() {
                    Some(Ok(entry)) => {
                        let path = entry.path();
                        if let Ok(md) = entry.metadata() {
                            if md.is_dir() {
                                self.dirs.push(path.clone());
                                continue;
                            }
                        }
                        return Some(path);
                    }
                    None => { // we consumed this directory
                        self.files = None;
                        break; 
                    }
                    _ => { }
                }
            }
            while let Some(dir) = self.dirs.pop() {
                let read_dir = fs::read_dir(&dir);
                if let Ok(files) = read_dir {
                    self.files = Some(files);
                    return Some(dir);
                }
            }
            break; // no more files, no more dirs
        }
        return None;
    }
}

playground

The advantage of writing your own iterator is that you'll tune it for your precise needs (sorting, filtering, error handling, etc.). But you'll have to deal with your own bugs.

Denys Séguret
  • 372,613
  • 87
  • 782
  • 758
  • Thanks! I'll spend some time studying it. "Why recursion" is self-explanatory: this does not look nearly as nice and readable as the Python/Elixir solution. – Alexey Orlov Sep 24 '19 at 11:38
  • @AlexeyOrlov Using walk_dir would be two lines. Be careful that in real life you have to deal with unexpected problems, like permissions, invalid links, inodes, etc. As a bonus here's how I compute (in parallel) the sum of sizes of a dir : https://github.com/Canop/broot/blob/master/src/file_sizes/file_sizes_unix.rs#L13 (I basically use a channel in place of a stack) – Denys Séguret Sep 24 '19 at 11:42
  • 1
    Last thing: I just wrote the FileIterator right now for your question. It's not well tested and could probably be more elegant – Denys Séguret Sep 24 '19 at 11:44
  • Thank you again! My task is rather specific. Permission problems are not very likely, but sorting, filtering and forward/backward moving should go smoothly. – Alexey Orlov Sep 24 '19 at 11:58
  • @AlexeyOrlov In order to sort, you could use replace `files: Option` with `files: Option>` and build it by first storing the entries into a vector then sorting them. There's no `read_dir` which directly allows sorting. – Denys Séguret Sep 24 '19 at 12:14
  • Meanwhile I did some further research. The Elixir style solution should be feasible. `Vec` (`files`) can be iterated as is; and there is the same [flat_map](https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.flat_map) function. I just have to figure out how to combine it all Rust fashion. – Alexey Orlov Sep 24 '19 at 12:32
1

This is the exact solution I sought. It's none of my achievement; see here. Comments are welcome.

fn traverse_flat_dst_iter(
    src_dir: &PathBuf,
    dst_step: Vec<PathBuf>,
) -> impl Iterator<Item = (PathBuf, PathBuf)> {
    let (dirs, files) = list_dir_groom(src_dir);

    let traverse = move |d: PathBuf| -> Box<dyn Iterator<Item = (PathBuf, PathBuf)>> {
        let mut step = dst_step.clone();
        step.push(PathBuf::from(d.file_name().unwrap()));
        Box::new(traverse_flat_dst_iter(&d, step))
    };
    dirs.into_iter()
        .flat_map(traverse)
        .chain(files.into_iter().map(|f| (f, PathBuf::new())))
}

Another, more sophisticated take. One has to box things, clone parameters to be shared between lambdas, etc., to satisfy the compiler. Yet it works. Hopefully, on can get the hang of the thing.

fn traverse_dir(
    src_dir: &PathBuf,
    dst_step: Vec<PathBuf>,
) -> Box<dyn Iterator<Item = (PathBuf, Vec<PathBuf>)>> {
    let (dirs, files) = groom(src_dir);
    let destination_step = dst_step.clone(); // A clone for handle.

    let traverse = move |d: PathBuf| {
        let mut step = dst_step.clone();
        step.push(PathBuf::from(d.file_name().unwrap()));
        traverse_dir(&d, step)
    };
    let handle = move |f: PathBuf| (f, destination_step.clone());
    if flag("r") {
        // Chaining backwards.
        Box::new(
            files
                .into_iter()
                .map(handle)
                .chain(dirs.into_iter().flat_map(traverse)),
        )
    } else {
        Box::new(
            dirs.into_iter()
                .flat_map(traverse)
                .chain(files.into_iter().map(handle)),
        )
    }
}
Alexey Orlov
  • 2,412
  • 3
  • 27
  • 46