3

I used function mmax to calculate moving max of a 10-million-length integer vector. I ran it 10 times to calculate the total execution time. The running time for window size 132 (15,025 milliseconds) is 6 times longer than for window size 22 (2,425 milliseconds). It seems the complexity of mmax is O(nw) rather than O(n), where w is the length of the sliding window).

To check if this is true for other similar products, I tried the same experiment on DolphinDB, a time series database with built-in analytics features (https://www.dolphindb.com/downloads.html ). In contrast, DolphinDB’s mmax has linear complexity O(n), regardless of the window size: 1,277 milliseconds (window size 132) and 1,233 milliseconds (window size 22).

The hardware being used for this experiment:

Server: Dell PowerEdge R630
Architecure: x86_64
CPU Model Name: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
Total logical CPU cores: 48
Total memory: 256G

Experiment setup

I used KDB+ 4.0 64 bit version and DolphinDB_Linux_V2.00.7(DolphinDB community version: 2 cores and 8GB memory). Both experiments are conducted using 2 cores of CPU.

KDB implementation

// Start the server
rlwrap -r taskset -c 0,1 ./l64/q -p 5002 -s 2

// The code
a:10000000?10000i
\t do[10; 22 mmax a]
2425
\t do[10; 132 mmax a]
15025

DolphinDB implementation

// Start the server
rlwrap -r ./dolphindb -localSite localhost:5002:local5002 -localExecutors 1

// The code
a=rand(10000,10000000)
timer(10) mmax(a,22);
1232.83 ms

timer(10) mmax(a,132);
1276.53 ms

Can any kdb expert confirm the complexity of function mmax? If the built-in function mmax does have the complexity of O(nw), any third-party plugin for kdb to improve the performance?

DanielSmith
  • 101
  • 4

1 Answers1

7

Yes, it would scale with the size of the window as well as the size of the list. If you look at the definition of mmax:

q)mmax
k){(x-1)|':/y}

it is "equivalent" to

q)a:8 1 9 5 4 6 6 1 8 5
q)w:3
q)mmax[w;a]~{{max x,y}':[x]}/[w-1;a]
1b

which can more clearly be understood as the last output of:

q){{max x,y}':[x]}\[w-1;a]
8 1 9 5 4 6 6 1 8 5
8 8 9 9 5 6 6 6 8 8
8 8 9 9 9 6 6 6 8 8

....take the max of each item with its previous item, {max x,y}':[x]

....then do that same operation again on the output, {}\

....do the same operation again on the output (w-1) times, \[w-1;a]

From that it's clear that the window size impacts the number of times the loop is performed. As for faster implementation, there might be a different but less "elegant" algorithm which does it quicker and could be written in k/q. Otherwise you could import an implementation written in C - see https://code.kx.com/q/ref/dynamic-load/

terrylynch
  • 11,844
  • 13
  • 21