I can see a huge difference in terms of performance, between an iterator-based algorithm (slow) and a procedural algorithm (fast). I want to improve the speed of the iterator-based algorithm but I don't know how to do it.
Context
I need to convert strings like 12345678
to 12_345_678
. Questions like this have already been asked like in this question. (Note: I don't want to use a crate for this, because I have some custom needs about it).
Edit: it is assumed that the input string is ASCII.
I wrote two versions of the algorithm:
- an iterator-based version, but I'm not satisfied because there is two
.collect()
, which means many allocations (at least one.collect()
is extra, without being able to remove it, see the code below); - a procedural version of it.
Then I compared the execution time of the two versions: my iterator version is about ~8x slower compared to the procedural version.
Benchmark code
use itertools::Itertools; // Using itertools v0.10.0
/// Size of the string blocks to separate:
const BLOCK_SIZE: usize = 3;
/// Procedural version.
fn procedural(input: &str) -> String {
let len = input.len();
let nb_complete_blocks = (len as isize - 1).max(0) as usize / BLOCK_SIZE;
let first_block_len = len - nb_complete_blocks * BLOCK_SIZE;
let capacity = first_block_len + nb_complete_blocks * (BLOCK_SIZE + 1);
let mut output = String::with_capacity(capacity);
output.push_str(&input[..first_block_len]);
for i in 0..nb_complete_blocks {
output.push('_');
let start = first_block_len + i * BLOCK_SIZE;
output.push_str(&input[start..start + BLOCK_SIZE]);
}
output
}
/// Iterator version.
fn with_iterators(input: &str) -> String {
input.chars()
.rev()
.chunks(BLOCK_SIZE)
.into_iter()
.map(|c| c.collect::<String>())
.join("_")
.chars()
.rev()
.collect::<String>()
}
fn main() {
let input = "12345678";
macro_rules! bench {
( $version:ident ) => {{
let now = std::time::Instant::now();
for _ in 0..1_000_000 {
$version(input);
}
println!("{:.2?}", now.elapsed());
}};
}
print!("Procedural benchmark: ");
bench!(procedural);
print!("Iterator benchmark: ");
bench!(with_iterators);
}
Typical benchmark result
Procedural benchmark: 17.07ms
Iterator benchmark: 240.19ms
Question
How can I improve the iterator-based version, in order to reach the performance of the procedural version ?