
PHP is 5.5 times faster than Rust in the example below.

Am I doing something fundamentally wrong?

It seems to me that Rust's regex engine is simply slower than PHP's.

PHP code:

<?php

$html = file_get_contents('/path_to/test.html');

global $c_id;
$c_id = 0;

echo 'len with comments: ', strlen($html), "\n";

$time_start = microtime(true);

$html = preg_replace_callback('/<!--(.*?)-->/s', function($matches) {
    global $c_id;
    $c_id++;
    return str_replace($matches[1], $c_id, $matches[0]);
}, $html);

echo (microtime(true) - $time_start), " seconds for removing comments.\n";

echo 'len without comments: ', strlen($html), "\n";

Rust code:

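use regex::Regex;
use std::fs::File;
use std::io::prelude::*;
use time::PreciseTime; // time 0.1.x; PreciseTime was removed in time 0.2

fn main() {
    let mut file = File::open("/path_to/test.html").expect("Unable to open the file");
    let mut html = String::new();
    file.read_to_string(&mut html).expect("Unable to read the file");
    let mut c_id: usize = 0;

    println!("len with comments: {}", html.len());

    let start = PreciseTime::now();

    // (?s) lets `.` match newlines, like PHP's /s modifier.
    let re = Regex::new(r"(?s)<!--(.*?)-->").unwrap();
    html = re.replace_all(&html, |captures: &regex::Captures| {
        c_id += 1;
        // Rust's str::replace with an empty pattern inserts the replacement
        // between every character, while PHP's str_replace leaves the subject
        // unchanged, so empty comments are handled explicitly to match PHP.
        if captures[1].is_empty() {
            captures[0].to_string()
        } else {
            captures[0].replace(&captures[1], &c_id.to_string())
        }
    }).to_string();

    println!("{} seconds for removing comments.", start.to(PreciseTime::now()));

    println!("len without comments: {}", html.len());
}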

Rust dependencies (Cargo.toml):

[dependencies]
regex = "1"
time = "0.1"
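For a baseline that takes the regex engine out of the comparison, the same transformation can be sketched with plain substring search. This is a minimal sketch, not the approach used in the question or the answer; `number_comments` is a hypothetical name. It writes `<!--n-->` directly instead of calling `replace` on the match, so it can differ from the code above on edge cases such as empty comments.

```rust
/// Hypothetical helper: replaces the body of every `<!-- ... -->` comment
/// with a running counter (1, 2, 3, ...), using `str::find` on the literal
/// delimiters instead of a regex engine.
fn number_comments(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut rest = html;
    let mut c_id: usize = 0;

    while let Some(open) = rest.find("<!--") {
        let body_start = open + "<!--".len();
        // Like the non-greedy `(.*?)`, stop at the first "-->" after the opener.
        match rest[body_start..].find("-->") {
            Some(close_rel) => {
                c_id += 1;
                out.push_str(&rest[..body_start]); // text before the comment, plus "<!--"
                out.push_str(&c_id.to_string());
                out.push_str("-->");
                rest = &rest[body_start + close_rel + "-->".len()..];
            }
            // An unterminated "<!--" is not a match; copy the remainder as-is.
            None => break,
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    let html = "<p>a</p><!-- first --><p>b</p><!--\nsecond\n-->";
    // prints: <p>a</p><!--1--><p>b</p><!--2-->
    println!("{}", number_comments(html));
}
```

Because the delimiters are ASCII, the byte offsets returned by `find` always fall on character boundaries, so the slicing is safe for any UTF-8 input.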

Results

PHP:

len with comments: 76968
0.00019717216491699 seconds for removing comments.
len without comments: 76622

Rust:

len with comments: 76968
PT0.001093365S seconds for removing comments.
len without comments: 76622

Thanks!

edo888
  • Did you build/run the Rust program with `--release`? – Francis Gagné Oct 27 '19 at 23:45
  • `preg_replace_callback` is supposed to return the replacement string. Why aren't you just returning `""` instead of calling `str_replace`, which presumably ends up with the same result but in a more costly fashion? – Booboo Oct 28 '19 at 12:09
  • Yes, but that only works for this specific example. If your regex is different, it will not work. – edo888 Oct 28 '19 at 17:52
  • Yes, it is compiled in release mode. – edo888 Oct 28 '19 at 17:52
  • You haven't provided a reproducible benchmark, so it is effectively impossible to answer this question. Please provide all inputs, including your corpus. – BurntSushi5 Oct 28 '19 at 19:16
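A single ~1 ms measurement is noisy, and in the question `Regex::new` runs inside the timed region, so the number also includes pattern compilation. A more reproducible setup builds a deterministic input that can be shared and takes the fastest of many runs. The sketch below assumes a recent stable toolchain (`std::hint::black_box` needs Rust 1.66+); `fastest_of` and `synthetic_html` are hypothetical helpers, and the workload is a placeholder to be swapped for the regex-based `replace_all` (with the `Regex` built once, outside the closure) or a `pcre2`-based version. Build it with `cargo run --release`.

```rust
use std::fmt::Write;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `f` `iters` times and returns the fastest
/// observed wall-clock time, which is less noisy than a single sample.
fn fastest_of<T, F: FnMut() -> T>(iters: usize, mut f: F) -> Duration {
    let mut best = Duration::MAX;
    for _ in 0..iters {
        let start = Instant::now();
        black_box(f()); // keep the result so the work is not optimized away
        best = best.min(start.elapsed());
    }
    best
}

/// Hypothetical stand-in for test.html (which was not posted): `n`
/// paragraphs, each followed by a comment, generated deterministically.
fn synthetic_html(n: usize) -> String {
    let mut s = String::new();
    for i in 0..n {
        writeln!(s, "<p>paragraph {0}</p><!-- comment {0} -->", i).unwrap();
    }
    s
}

fn main() {
    let html = synthetic_html(2_000);
    // Placeholder workload: count comment openers.
    let best = fastest_of(100, || html.matches("<!--").count());
    println!("fastest of 100 runs: {:?}", best);
}
```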

1 Answer


The answer is to use the pcre2 crate instead of the regex crate in Rust.

More info can be found here: https://users.rust-lang.org/t/rust-regex-replace-all-slower-than-php-regex-preg-replace-callback-how-to-optimize/34036/20

edo888