1

I'm trying to optimise the speed of the algorithm. It's a sliding window of fixed size m over the long string of length n. So far it works in (n - m)m time, where m is taken for the construction of sliding window tmp of length m.

for i in range(0, n - m + 1):
  tmp = string[i:i + m]
  # then doing smth with tmp string

I wonder if I could reduce the time to n - m by using join, because each time I do a one-character shift and so all I need is to cut off the first character and to append the next one

tmp = string[:m]
for i in range(1, n - m + 1):
  tmp = ''.join([tmp[1:], string[i + m - 1]])
  # then doing smth with tmp string

The problem is I can't figure out how to perform tmp[1:] in a constant time.

Do you have any ideas, please?

tkrishtop
  • 734
  • 1
  • 8
  • 18
  • 1
    This https://stackoverflow.com/questions/35180377/time-complexity-of-string-slice might be closely related – Finomnis Jun 13 '19 at 15:13
  • No, your second algorithm will be a LOT smaller. Cutting of the first element of an array is generally an O(n) operation – Finomnis Jun 13 '19 at 15:14
  • 2
    Theoretically doing slices *can* be an O(1) operation. It's just not like that in python, by default. And it never will be, for strings, because strings are a lot more complex under the hood than it looks like from the outside. But for other array types, you have [memoryview](https://docs.python.org/3/library/stdtypes.html#memoryview), which can do list slices in zero time. – Finomnis Jun 13 '19 at 15:22
  • 1
    @Finomnis thank you for the link, it seems I have to use a conversion to bytes to have O(1) time for sliding window construction. – tkrishtop Jun 13 '19 at 15:22
  • 2
    Also, in my experience, if you do performance heavy tasks at which you are already at this point of optimizing, it is absolutely worth it to outsource the performance critical parts to an external programming language, preferable C++ or Rust. I already got ~300 times speedup from that. I even wrote a very simple and copy-paste ready example to achieve that: https://github.com/Finomnis/PythonCModule – Finomnis Jun 13 '19 at 15:25
  • 1
    I did some benchmarking, looks like I was wrong about memoryview... – Finomnis Jun 14 '19 at 10:52
  • @Finomnis yep, I've tested memoryview performance yesterday as well and it's really quite slow, don't know why. It was the way faster to work with the array ```[ord(ch) for ch in string]``` when with the memoryview ```memoryview(string.encode())``` – tkrishtop Jun 14 '19 at 12:03
  • Yah, weird, right? Well, as I said, if you really need some oompf behind it, go for external rust/cpp. It's worth it. – Finomnis Jun 14 '19 at 15:44
  • I personally prefer rust for everything since a couple of months, it's really hard to write critical bugs in rust. Basically the compiler won't allow anything that is really bad, and it's super easy to set up – Finomnis Jun 14 '19 at 15:45

1 Answers1

2

So I really wasn't sure about the results, so I ran a couple of benchmarks.

My test was to have a list of length 10000 and have a window of size 1000 slide over it. Then in every iteration, just sum all elements of the window, and then sum all results of all iterations.

Those are the results:

run_sliced:       296.210 ms/it (Iterations: 7)
run_memoryview:   409.670 ms/it (Iterations: 5)
run_shift:        296.495 ms/it (Iterations: 7)
run_cpp:            3.066 ms/it (Iterations: 653)
run_rust:           1.132 ms/it (Iterations: 1768)

The weirdest part of this benchmark is that memoryview was by far the slowest. Not entirely sure what to make of that.


Python:

import time
import array

# Our external c++ and rust functions
import ext_cpp
import ext_rust

# List of size 10000.
# Our moving window is size 1000. That means we need to slice 9000 times.
# We simply take the sum of all numbers in the window,
# and then again the sum of all windows.
# Just to prevent any optimizations from happening.

listSize = 10000
windowSize = 1000

a = list(range(listSize))
numWindows = listSize - windowSize + 1

def time_it(name, func):
    t0 = time.perf_counter()
    iterations = 0

    while True:
        assert(func(a) == 45000499500)
        iterations += 1
        t1 = time.perf_counter()
        if t1 - t0 > 2:
            break

    per_iteration = 1000.0*(t1-t0)/iterations
    print("{:<15}".format(name+":")
          + "{:>10}".format(format(per_iteration, '.3f')) + " ms/it"
          + " (Iterations: " + str(iterations) + ")")

def run_sliced(a):
    r = 0
    for i in range(numWindows):
        window = a[i:i+windowSize]
        for el in window:
            r += el
    return r

def run_memoryview(a):
    r = 0
    a_raw = array.array('l', a)
    a = memoryview(a_raw)
    for i in range(numWindows):
        window = a[i:i+windowSize]
        for el in window:
            r += el
    return r

def run_shift(a):
    r = 0    
    window = a[:windowSize]
    for el in window:
        r += el
    for i in range(1, numWindows):
        window = window[1:]
        window.append(a[i + windowSize - 1])
        for el in window:
            r += el
    return r

def run_cpp(a):
    return ext_cpp.run(a, numWindows, windowSize)

def run_rust(a):
    return ext_rust.run(a, numWindows, windowSize)


time_it('run_sliced', run_sliced)
time_it('run_memoryview', run_memoryview)
time_it('run_shift', run_shift)
time_it('run_cpp', run_cpp)
time_it('run_rust', run_rust)


C++ (pybind11):

long run(std::vector<long> a, long windowSize, long numWindows){
    long r = 0;
    for(long i = 0; i < numWindows; i++){
        const long* window = &a[i];
        for(long j = 0; j < windowSize; j++){
            r += window[j];
        }
    }
    return r;
}


Rust (PyO3):

fn run(_py: Python, a: Vec<u64>, num_windows: usize, window_size: usize)
    -> PyResult<u64>
{
    let mut r = 0;
    for i in 0..num_windows {
        let window = &a[i..i+window_size];
        for el in window {
            r += el; 
        }
    }
    Ok(r)
}
Finomnis
  • 18,094
  • 1
  • 20
  • 27