I want to lazily consume the nodes of a file tree one by one while sorting the siblings on each level.
In Python, I'd use a synchronous generator:
```python
def traverse_dst(src_dir, dst_root, dst_step):
    """
    Recursively traverses the source directory and yields a sequence of (src, dst) pairs.
    """
    dirs, files = list_dir_groom(src_dir)  # Getting immediate offspring.
    for d in dirs:
        step = list(dst_step)
        step.append(d.name)
        yield from traverse_dst(d, dst_root, step)
    for f in files:
        # Files of this level go to dst_root joined with the current steps.
        dst_path = dst_root.joinpath(*dst_step)
        yield f, dst_path
```
In Elixir, a (lazy) stream:
```elixir
def traverse_flat_dst(src_dir, dst_root, dst_step \\ []) do
  {dirs, files} = list_dir_groom(src_dir) # Getting immediate offspring.

  traverse = fn d ->
    step = dst_step ++ [Path.basename(d)]
    traverse_flat_dst(d, dst_root, step)
  end

  handle = fn f ->
    # Path.join/1 joins every element of the list: root, then each step.
    dst_path = Path.join([dst_root | dst_step])
    {f, dst_path}
  end

  Stream.flat_map(dirs, traverse)
  |> Stream.concat(Stream.map(files, handle))
end
```
One can see some language features addressing recursion: `yield from` in Python, `flat_map` in Elixir; the latter looks like a classic functional approach.
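For comparison, Rust's `Iterator::flat_map` can play the role of `yield from` / `Stream.flat_map`. A minimal sketch, assuming a hypothetical in-memory `Node` tree instead of the file system (siblings are kept in the order given, no sorting). The recursive call returns `Box<dyn Iterator>` because a function returning `impl Iterator` cannot contain a call to itself inside its own hidden return type.

```rust
// Hypothetical in-memory tree, used only to illustrate the recursion.
enum Node {
    Dir(String, Vec<Node>),
    File(String),
}

// Lazily yields file paths like "a/b/2.txt": subdirectories first, then the
// files of the current level, mirroring the Python and Elixir versions.
fn walk<'a>(nodes: &'a [Node], prefix: String) -> Box<dyn Iterator<Item = String> + 'a> {
    let dir_prefix = prefix.clone();
    let from_dirs = nodes
        .iter()
        .filter_map(|n| match n {
            Node::Dir(name, kids) => Some((name, kids)),
            Node::File(_) => None,
        })
        // `flat_map` splices each sub-walk into the outer sequence,
        // much like `yield from`.
        .flat_map(move |(name, kids)| walk(kids, format!("{}{}/", dir_prefix, name)));
    let from_files = nodes.iter().filter_map(move |n| match n {
        Node::File(name) => Some(format!("{}{}", prefix, name)),
        Node::Dir(..) => None,
    });
    Box::new(from_dirs.chain(from_files))
}

fn main() {
    let tree = vec![
        Node::File("z.txt".into()),
        Node::Dir(
            "a".into(),
            vec![
                Node::File("1.txt".into()),
                Node::Dir("b".into(), vec![Node::File("2.txt".into())]),
            ],
        ),
        Node::File("y.txt".into()),
    ];
    for path in walk(&tree, String::new()) {
        println!("{}", path);
    }
}
```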
It seems that in Rust, anything lazy is an iterator. How am I supposed to do more or less the same in Rust?
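Stable Rust has no `yield`-style generators, but `std::iter::from_fn` comes close: it wraps a stateful closure, and each `next()` call runs that closure once. A small sketch (the `first_squares` helper is hypothetical) that also counts the closure calls, to show that only the items actually consumed are computed:

```rust
// Returns the first `k` squares and how many times the closure ran.
fn first_squares(k: usize) -> (Vec<u64>, usize) {
    let mut calls = 0usize;
    let mut n = 0u64;
    // An infinite, lazy sequence: nothing runs until `next()` is called.
    let squares = std::iter::from_fn(|| {
        calls += 1;
        n += 1;
        Some(n * n)
    });
    // `take(k)` stops after k items, so the closure runs exactly k times.
    let taken: Vec<u64> = squares.take(k).collect();
    (taken, calls)
}

fn main() {
    let (values, calls) = first_squares(3);
    println!("{:?} after {} closure calls", values, calls);
}
```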
I'd like to preserve the structure of my recursive function, with `dirs` and `files` as vectors of paths (they are optionally sorted and filtered). Getting `dirs` and `files` is already implemented to my liking:
```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Immediate children of `dir`: subdirectories if `folder` is true, everything else otherwise.
fn folders(dir: &Path, folder: bool) -> Result<Vec<PathBuf>, io::Error> {
    Ok(fs::read_dir(dir)?
        .filter_map(Result::ok) // skip entries that could not be read
        .map(|entry| entry.path())
        .filter(|path| path.is_dir() == folder)
        .collect())
}

// `flag` and `sort_path_slice` are defined elsewhere in my code.
fn list_dir_groom(dir: &Path) -> (Vec<PathBuf>, Vec<PathBuf>) {
    let mut dirs = folders(dir, true).unwrap();
    let mut files = folders(dir, false).unwrap();
    if flag("x") {
        dirs.sort_unstable();
        files.sort_unstable();
    } else {
        sort_path_slice(&mut dirs);
        sort_path_slice(&mut files);
    }
    if flag("r") {
        dirs.reverse();
        files.reverse();
    }
    (dirs, files)
}
```
`Vec<PathBuf>` can be iterated as is, and there is a standard `flat_map` method. It should be possible to implement this Elixir-style; I just can't figure it out yet.
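The `flat_map` + `chain` shape does work directly on `Vec<PathBuf>`, as long as every closure owns what it uses (`move`) instead of borrowing locals of the enclosing function. A non-recursive sketch; `fake_files_in` and `one_level` are hypothetical, and `fake_files_in` stands in for a directory listing so the example runs without touching the disk:

```rust
use std::path::{Path, PathBuf};

// Hypothetical stand-in for a directory listing: every directory
// "contains" two files. Used only to show the iterator shape.
fn fake_files_in(dir: &Path) -> Vec<PathBuf> {
    vec![dir.join("1.txt"), dir.join("2.txt")]
}

// One level only: each directory expands into (file, destination) pairs,
// followed by this level's own files.
fn one_level(
    dirs: Vec<PathBuf>,
    files: Vec<PathBuf>,
    dst: PathBuf,
) -> impl Iterator<Item = (PathBuf, PathBuf)> {
    let dst_for_dirs = dst.clone();
    dirs.into_iter()
        .flat_map(move |d| {
            let dst_dir = dst_for_dirs.join(d.file_name().unwrap());
            // The inner iterator owns `dst_dir`, so nothing borrows `d`.
            fake_files_in(&d).into_iter().map(move |f| (f, dst_dir.clone()))
        })
        .chain(files.into_iter().map(move |f| (f, dst.clone())))
}

fn main() {
    let dirs = vec![PathBuf::from("src/a"), PathBuf::from("src/b")];
    let files = vec![PathBuf::from("src/x.txt")];
    for (src, dst) in one_level(dirs, files, PathBuf::from("dst")) {
        println!("{} -> {}", src.display(), dst.display());
    }
}
```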
This is what I already have, and it really works (called as `traverse_flat_dst(&SRC, [].to_vec());`):
```rust
fn traverse_flat_dst(src_dir: &PathBuf, dst_step: Vec<PathBuf>) {
    let (dirs, files) = list_dir_groom(src_dir);
    for d in dirs.iter() {
        let mut step = dst_step.clone();
        step.push(PathBuf::from(d.file_name().unwrap()));
        println!("d: {:?}; step: {:?}", d, step);
        traverse_flat_dst(d, step);
    }
    for f in files.iter() {
        println!("f: {:?}", f);
    }
}
```
What I want (not yet working!):
```rust
fn traverse_flat_dst_iter(src_dir: &PathBuf, dst_step: Vec<PathBuf>) {
    let (dirs, files) = list_dir_groom(src_dir);
    let traverse = |d| {
        let mut step = dst_step.clone();
        step.push(PathBuf::from(d.file_name().unwrap()));
        traverse_flat_dst_iter(d, step);
    };
    // This is something that I just wish to be true!
    flat_map(dirs, traverse) + map(files)
}
```
I want this function to deliver one long, flat iterator of files, in the spirit of the Elixir solution. I just can't get the necessary return types and other syntax right yet. I hope this is clear enough this time.
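One way to make the recursive version compile is to return `Box<dyn Iterator<Item = (PathBuf, PathBuf)>>` instead of `impl Iterator`, make the closure `move` so it owns `dst_step`, and let it take `PathBuf` by value, since `dirs.into_iter()` yields owned paths. A sketch under these assumptions: `list_dir_groom` below is a simplified stand-in (sorted only, no `flag` handling, panics on an unreadable directory), the parameter is `&Path` (a `&PathBuf` still coerces to it), and the destination is the relative path built from `dst_step` rather than the `PathBuf::new()` placeholder.

```rust
use std::fs;
use std::path::{Path, PathBuf};

// Simplified stand-in for `list_dir_groom`:
// immediate subdirectories and files, each sorted; no `flag` handling.
fn list_dir_groom(dir: &Path) -> (Vec<PathBuf>, Vec<PathBuf>) {
    let (mut dirs, mut files): (Vec<PathBuf>, Vec<PathBuf>) = fs::read_dir(dir)
        .expect("cannot read directory")
        .filter_map(Result::ok)
        .map(|entry| entry.path())
        .partition(|path| path.is_dir());
    dirs.sort_unstable();
    files.sort_unstable();
    (dirs, files)
}

// Lazily yields (source file, relative destination dir) pairs:
// subdirectories first, then the files of the current level.
fn traverse_flat_dst_iter(
    src_dir: &Path,
    dst_step: Vec<PathBuf>,
) -> Box<dyn Iterator<Item = (PathBuf, PathBuf)>> {
    let (dirs, files) = list_dir_groom(src_dir);
    // Destination for the files of this level: the steps joined together.
    let dst: PathBuf = dst_step.iter().collect();
    // `move`: the closure owns `dst_step`, so the returned iterator
    // does not borrow anything local to this call.
    let traverse = move |d: PathBuf| {
        let mut step = dst_step.clone();
        step.push(PathBuf::from(d.file_name().unwrap()));
        traverse_flat_dst_iter(&d, step)
    };
    Box::new(
        dirs.into_iter()
            .flat_map(traverse)
            .chain(files.into_iter().map(move |f| (f, dst.clone()))),
    )
}

fn main() {
    for (src, dst) in traverse_flat_dst_iter(Path::new("."), Vec::new()) {
        println!("{} -> {}", src.display(), dst.display());
    }
}
```

Only the top-level directory is listed when the function is called; each subdirectory is listed when `flat_map` reaches it. The cost is one boxed iterator per visited directory.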
What I managed to compile and run (meaningless, but the signature is what I actually want):
```rust
fn traverse_flat_dst_iter(
    src_dir: &PathBuf,
    dst_step: Vec<PathBuf>,
) -> impl Iterator<Item = (PathBuf, PathBuf)> {
    let (dirs, files) = list_dir_groom(src_dir);
    let _traverse = |d: &PathBuf| {
        let mut step = dst_step.clone();
        step.push(PathBuf::from(d.file_name().unwrap()));
        traverse_flat_dst_iter(d, step)
    };
    files.into_iter().map(|f| (f, PathBuf::new()))
}
```
What I'm still lacking:
```rust
fn traverse_flat_dst_iter(
    src_dir: &PathBuf,
    dst_step: Vec<PathBuf>,
) -> impl Iterator<Item = (PathBuf, PathBuf)> {
    let (dirs, files) = list_dir_groom(src_dir);
    let traverse = |d: &PathBuf| {
        let mut step = dst_step.clone();
        step.push(PathBuf::from(d.file_name().unwrap()));
        traverse_flat_dst_iter(d, step)
    };
    // Here is a combination amounting to an iterator,
    // which delivers a (PathBuf, PathBuf) tuple on each step.
    // Flat mapping with traverse, of course (see Elixir solution).
    // Iterator must be as long as the number of files in the tree.
    // The lines below look very close, but every possible type is mismatched :(
    dirs.into_iter().flat_map(traverse)
        .chain(files.into_iter().map(|f| (f, PathBuf::new())))
}
```
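Returning a boxed recursive iterator is not the only option. An alternative sketch that avoids both recursion and `dyn` keeps an explicit stack of pending directory levels in a struct implementing `Iterator`. The names `Frame` and `TraverseDst` are hypothetical, and `list_dir_groom` is again a simplified stand-in (sorted only, no flags, panics on an unreadable directory). The output order matches the recursive versions: a level's subdirectories are descended into first, then its files are yielded.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::vec::IntoIter;

// Simplified stand-in for `list_dir_groom` (sorted only, no flags).
fn list_dir_groom(dir: &Path) -> (Vec<PathBuf>, Vec<PathBuf>) {
    let (mut dirs, mut files): (Vec<PathBuf>, Vec<PathBuf>) = fs::read_dir(dir)
        .expect("cannot read directory")
        .filter_map(Result::ok)
        .map(|entry| entry.path())
        .partition(|path| path.is_dir());
    dirs.sort_unstable();
    files.sort_unstable();
    (dirs, files)
}

// One pending level: unvisited subdirectories, unyielded files,
// and the steps leading to this level's destination.
struct Frame {
    dirs: IntoIter<PathBuf>,
    files: IntoIter<PathBuf>,
    step: Vec<PathBuf>,
}

impl Frame {
    fn new(dir: &Path, step: Vec<PathBuf>) -> Self {
        let (dirs, files) = list_dir_groom(dir);
        Frame { dirs: dirs.into_iter(), files: files.into_iter(), step }
    }
}

struct TraverseDst {
    stack: Vec<Frame>,
}

impl TraverseDst {
    fn new(src_dir: &Path) -> Self {
        TraverseDst { stack: vec![Frame::new(src_dir, Vec::new())] }
    }
}

impl Iterator for TraverseDst {
    type Item = (PathBuf, PathBuf);

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // An empty stack means the whole tree has been yielded.
            let top = self.stack.last_mut()?;
            if let Some(d) = top.dirs.next() {
                // Descend first, as the recursive versions do.
                let mut step = top.step.clone();
                step.push(PathBuf::from(d.file_name().unwrap()));
                self.stack.push(Frame::new(&d, step));
            } else if let Some(f) = top.files.next() {
                let dst: PathBuf = top.step.iter().collect();
                return Some((f, dst));
            } else {
                // This level is exhausted; go back up.
                self.stack.pop();
            }
        }
    }
}

fn main() {
    for (src, dst) in TraverseDst::new(Path::new(".")) {
        println!("{} -> {}", src.display(), dst.display());
    }
}
```

Since `TraverseDst` is an ordinary iterator, it composes with `.filter`, `.take`, and the other adaptors, and a subdirectory is only listed when the traversal reaches it.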