
I am trying to speed up Python programs using Rust, a language in which I am a total beginner. I wrote a function that counts the occurrences of each possible string of length n within a larger string. For instance, if the main string is "AAAAT" and n=3, the outcome would be a hashmap {"AAA":2,"AAT":1}. I use pyo3 to call the Rust function from Python. The code of the Rust function is:

use pyo3::prelude::*;
use std::collections::HashMap;

fn count_nmers(seq: &str, n: usize) -> PyResult<HashMap<&str, u64>> {
    let mut current_pos: usize = 0;
    let mut counts: HashMap<&str, u64> = HashMap::new();
    while current_pos + n <= seq.len() {
        //print!("{}\n", &seq[current_pos..current_pos+n]);
        match counts.get(&seq[current_pos..current_pos + n]) {
            Some(repeats) => counts.insert(&seq[current_pos..current_pos + n], repeats + 1),
            None => counts.insert(&seq[current_pos..current_pos + n], 1),
        };
        current_pos += 1;
    }
    //print!("{:?}", counts);
    Ok(counts)
}

When I use small values of n (n < 10), Rust is about an order of magnitude faster than Python, but as n increases the gap shrinks, with both functions reaching the same speed by n = 200 (see graph).

[Graph: times to count for different n-mer lengths (Python in black, Rust in red)]

I must be doing something wrong with the strings, but I can't find the mistake.


The Python code is:

def nmer_freq_table(sequence,nmer_length=6):
    nmer_dict=dict()
    for nmer in seq_win(sequence,window_size=nmer_length):
        if str(nmer) in nmer_dict.keys():
            nmer_dict[str(nmer)]=nmer_dict[str(nmer)]+1
        else:
            nmer_dict[str(nmer)]=1
    return nmer_dict

def seq_win(seq,window_size=2):
    length=len(seq)
    i=0
    while i+window_size <= length:
        yield seq[i:i+window_size]
        i+=1

[Graph: benchmark results]

  • Why do you expect Rust to be faster at that? And what are the Python codes? – Kelly Bundy Aug 18 '22 at 15:10
  • Rust is a compiled language. Did you turn optimizations on and make sure you're in release mode? Benchmarking is meaningless in debug mode – Silvio Mayolo Aug 18 '22 at 15:17
  • @KellyBundy Well, as much as I love Python, it is supposed to be slower. And for the small substrings it is indeed an order of magnitude slower. – Alberto Marin Sanguino Aug 18 '22 at 15:18
  • Yes, I compiled using cargo build --release. The difference from the debug (unoptimized) version is not that big, though. – Alberto Marin Sanguino Aug 18 '22 at 15:18
  • Two things: One, using the [`Entry` api](https://doc.rust-lang.org/std/collections/hash_map/struct.HashMap.html#method.entry) should cut down on memory accesses, and two, your Rust code could panic on arbitrary UTF-8 strings (see the byte-based sketch after these comments). And just because something is compiled doesn't make it magically faster. This is almost exclusively random memory accesses; I don't see why Rust should be significantly faster at that than Python. – isaactfa Aug 18 '22 at 15:21
  • @isaactfa So the access to the HashMap is basically the bottleneck here and there is not much that we can do to speed it up... – Alberto Marin Sanguino Aug 18 '22 at 15:23
  • But with Python, the actual work is still done in C (assuming you use CPython), and when you make n larger, the fraction of the time spent on interpreting Python shrinks. With sufficiently large n, you're essentially *only* measuring its C speed, not its Python speed. – Kelly Bundy Aug 18 '22 at 15:26
  • @AlbertoMarinSanguino That seems to be more or less the case here. Have you profiled your code and determined this function to be a bottleneck? This task should be parallelizable quite nicely, maybe that could speed it up. – isaactfa Aug 18 '22 at 15:30
  • What happens with n larger than 200? Does Python become *faster* or do both remain equally fast? – Kelly Bundy Aug 18 '22 at 15:32
  • @KellyBundy Added longer n values to the benchmark. It seems the speedup is minimized exactly for the n I need :-) – Alberto Marin Sanguino Aug 18 '22 at 15:51
  • The Python code could btw also be quite a bit better: [code](https://tio.run/##XY7BDoIwDEDv@4re2AIeDIkxJB6Mv@DNGIKjgyXQjVIOfj0iUWJ819fXNj6lDZQfI8@z49CDDV2HVnygEXwfAwtcwkSCrFSNDqhHLh3jUEr16FCPOExIFrNVdEiNtKeDKRQsMMrEBLW3oj9b9CrefMubhwJ8@tPftxkXGDx4Aq6oQb3o7aCBHfw0kMLerJ0xSkX2JPr/1@S8cE0yyI2Z5xc) – Kelly Bundy Aug 18 '22 at 15:53
  • @KellyBundy You are absolutely right. I just have a lot of functions that work on a sliding window and never thought of optimizing them until now. With your answers and a few others, I start to see when to optimize my python code and when to go for a compiled language. – Alberto Marin Sanguino Aug 18 '22 at 15:56
  • @isaactfa I am still at the "Hello world!" stage with Rust, parallelizing is why I want to learn Rust, but I am not there yet. – Alberto Marin Sanguino Aug 18 '22 at 15:58
  • @AlbertoMarinSanguino Then optimization is the last thing I'd worry about. – isaactfa Aug 18 '22 at 16:00
  • @isaactfa Sure, I was just surprised by the similar performance of both functions. Especially because, as Kelly said, the Python code was not written to be particularly fast. – Alberto Marin Sanguino Aug 18 '22 at 16:05
  • Also, look here: https://stackoverflow.com/questions/70551997/faster-hashmap-for-sequential-keys. It behaves differently depending on whether the keys are sequential or random. – Evgeniy Aug 18 '22 at 16:05
  • Python's performance increase over the n-mer length range of [350, 400] is interesting. – SargeATM Aug 18 '22 at 16:05
  • @SargeATM Looks like it happened to be unusually slow at 350 once, maybe the machine was busy with other stuff for a moment. That bump doesn't appear in either Python version in the updated plot. – Kelly Bundy Aug 18 '22 at 16:15
  • @SargeATM Yeah, I am only running things once for each plot. All functions seem to have more or less the same speed except for Kelly's Python version, which is consistently faster. – Alberto Marin Sanguino Aug 18 '22 at 16:26
  • Btw I only converted the Counter to dict in order to match the original. If you're ok with a Counter, leaving that conversion out saves about 15% at n=200 in my test. – Kelly Bundy Aug 18 '22 at 16:32
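
For reference on the UTF-8 panic mentioned in the comments above: slicing a &str as &seq[a..b] panics if an index falls inside a multi-byte character. A minimal byte-based sketch (not code from the thread) sidesteps this by counting byte windows instead, at the cost of the keys being byte slices rather than &str:

use std::collections::HashMap;

// Counting raw byte windows avoids the char-boundary panic of &seq[a..b].
// windows(n) yields every length-n window of the byte slice, yields nothing
// if n > seq.len(), and panics if n == 0.
fn count_nmers_bytes(seq: &str, n: usize) -> HashMap<&[u8], u64> {
    let mut counts: HashMap<&[u8], u64> = HashMap::new();
    for window in seq.as_bytes().windows(n) {
        *counts.entry(window).or_insert(0) += 1;
    }
    counts
}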

2 Answers


You are computing the hash function multiple times; this may matter for large n values. Try using the entry function instead of manual gets and inserts:

while current_pos + n <= seq.len() {
    // entry() hashes and looks up the key once; or_default() inserts 0 if
    // the key is absent and returns a mutable reference either way.
    let en = counts.entry(&seq[current_pos..current_pos+n]).or_default();
    *en += 1;
    current_pos += 1;
}

Complete code here
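
Since that link isn't reproduced here, a self-contained sketch along the same lines (not necessarily the linked code), keeping the PyResult signature from the question:

use pyo3::prelude::*; // for PyResult, as in the question
use std::collections::HashMap;

fn count_nmers(seq: &str, n: usize) -> PyResult<HashMap<&str, u64>> {
    let mut counts: HashMap<&str, u64> = HashMap::new();
    let mut current_pos = 0;
    while current_pos + n <= seq.len() {
        // One hash and one lookup per window instead of get() + insert().
        *counts.entry(&seq[current_pos..current_pos + n]).or_insert(0) += 1;
        current_pos += 1;
    }
    Ok(counts)
}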

Next, make sure you are running --release-compiled code, e.g. `cargo run --release`.

And one more thing to bear in mind is discussed here: Rust may use a non-optimal hash function for your case, and you can change it.
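
For example, here is a sketch of the same loop with a faster (but not DoS-resistant) hasher. It assumes the third-party rustc-hash crate is added to Cargo.toml; any other BuildHasher would slot in the same way:

use rustc_hash::FxHashMap; // assumed dependency: rustc-hash = "1"

// FxHashMap is std's HashMap with the Fx hasher plugged in, so the
// entry-based counting loop is unchanged.
fn count_nmers_fx(seq: &str, n: usize) -> FxHashMap<&str, u64> {
    let mut counts: FxHashMap<&str, u64> = FxHashMap::default();
    let mut pos = 0;
    while pos + n <= seq.len() {
        *counts.entry(&seq[pos..pos + n]).or_insert(0) += 1;
        pos += 1;
    }
    counts
}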

And finally, on large data most of the time is spent in HashMap/dict internals, which are compiled code rather than Python. So don't expect the speedup to scale well.

Evgeniy
  • Also Python caches string hash values which would explain its shallower slope on the performance graph compared to Rust. "The performance of hash algorithm shouldn't affect general benchmarks since _hash value is cached inside string object_." https://bugs.python.org/issue29410 – SargeATM Aug 18 '22 at 16:01
  • Interestingly Python and Rust use the same hash algorithm, SipHash 1-3. – SargeATM Aug 18 '22 at 16:07

Could it be because as n gets larger the number of iterations through the loop gets smaller?

Fewer iterations through the loop would reduce the performance gain from using Rust. I'm sure there is a small per-call cost for the transition/marshaling from Python to Rust. This would explain how the performance of pure Python and Python/Rust eventually becomes the same.

SargeATM
  • This seems the most likely to me. If you have a str of length `200` and an `n=200`, then both are going to do one iteration and any difference will vanish in the overhead. What happens when `l` (the `len(str)`) is very large? To do this analysis right, you should really be looking at speed compared to a ratio of `l:n`. – Nathaniel Ford Aug 18 '22 at 15:39
  • Not in this case. The difference between the main string of about 4 million characters and the substrings makes the reduction in the number of loops negligible. – Alberto Marin Sanguino Aug 18 '22 at 15:42
  • @Evgeniy's answer is the better answer after knowing the string sizes. – SargeATM Aug 18 '22 at 16:02