An algorithm to find L th min/max from length K subarray

Question

Consider an array of values {3,2,8,4,7,6,1}. I need an algorithm to find the L-th max/min in each length K subarray.

For example, for the above array, let K = 5, L = 3. I need to find the 3rd minimum of each length K subarray.

Expected output:

Subarray:
[3,2,8,4,7] = Third minimun is 4.
[2,8,4,7,6] = Third minimum is 6.
[8,4,7,6,1] = Third mininum is 6.

Kindly help me find the algorithm.

I initially though of self balancing BST but it would be O(L) when it comes to finding Lth min/max in it's inorder traversal. So, this might be a combination of min heap and max heap for a window of size k where we keep L-1 elements in the min heap and L-K elements in the max heap. So, getting Lth min/max would be O(1). — nice_dev, Nov 24 '19 at 06:56
@vivek_23 Turn a BST into an order statistic tree by augmenting each node with the size of the subtree rooted at that node. Then, find order statistics in O(log k) — Dave, Nov 24 '19 at 08:56
@vivek_23 when using min heap and max heap, when to add a new element to the heap and when to extract the min and max from the heap. can you please share me the pseudocode? — Raj, Nov 24 '19 at 15:27
@Raj Lookup online on how to find running median of a stream of integers. Your problem looks similar to that. — nice_dev, Nov 24 '19 at 16:37

score 1 · Answer 1 · edited Nov 24 '19 at 09:01

You can solve this in O(N log K) time where N is size of array.

First notice that if we solve problem for min, the solution for max is same, just multiply original array elements by -1.

So to solve this we need a data structure that supports: additions, deletions, finding k-th item in collection, then we can just loop over all needed intervals using sliding window, adding and deleting items when needed.

Actually there are lots of structures that can support all those operations in O(log N) time, you can use segment tree, min/max heaps, order statistic tree (a type of B-tree) and some others.

An order statistic tree is a binary search tree that supports two additional operations beyond insertion, lookup and deletion:

Select(i) — find the i'th smallest element stored in the tree.

Rank(x) – find the rank of element x in the tree, i.e. its index in the sorted list of elements of the tree.

Both are O(log n) when a self-balancing tree is used as the base data structure.

Sample code (C++ with g++ compiler which luckily has order statistic tree in the library)

#include <bits/stdc++.h>
using namespace std;

#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
using namespace __gnu_pbds;
typedef tree<pair<int,int>,null_type,less<pair<int,int>>,rb_tree_tag,tree_order_statistics_node_update>ordered_set;

int main()
{
    int n,k,L;
    cin>>n>>k>>L;
    vector<int>A(n+1);
    for(int i=0; i<n; i++)cin>>A[i];
    ordered_set S;
    for(int i=0; i<k; i++)
        S.insert({A[i],i});

    for(int i=k; i<=n; i++)
    {
        cout<<S.find_by_order(L-1)->first<<" ";
        S.erase(S.find({A[i-k],i-k}));
        S.insert({A[i],i});
    }
}

score 1 · Answer 2 · answered Nov 25 '19 at 18:51

Here's a more portable solution using only C++ multisets. The idea is to maintain two multisets, the first will contain the smallest L-1 numbers while the other will contain the rest. The complexity is O(N log K). The first K insertions and the initial balancing work in O(K log K). Every subsequent balancing operation will take O(log K) time because the imbalance of the sets will be at most 1 by absolute value.

#include <iostream>
#include <vector>
#include <set>
using namespace std;

int main() {
    int n, k, l;
    cin >> n >> k >> l;
    vector<int> a(n);
    for (int& x : a)
        cin >> x;
    multiset<int> small, large;

    auto balance = [&]() {
        while ((int)small.size() >= l) {
            large.insert(*--small.end());
            small.erase(--small.end());
        }

        while ((int)small.size() < l-1) {
            small.insert(*large.begin());
            large.erase(large.begin());
        }
    };

    auto add = [&](int x) {
        small.insert(x);
    };

    auto rem = [&](int x) {
        auto it = small.find(x);
        if (it != small.end())
            small.erase(it);
        else {
            it = large.find(x);
            large.erase(it);
        }
    };

    for (int i=0; i<k; i++)
        small.insert(a[i]);
    balance();
    for (int i=k; i<=n; i++) {
        cout << *large.begin() << '\n';
        if (i < n) {
            add(a[i]);
            rem(a[i-k]);
            balance();
        }
    }
}

Bob Jarvis - Слава Україні · Answer 3 · 2019-11-26T13:01:59.790

One way way to do this is to:

Break the input collection into a sequence of sub-collections of the desired length.
Extract the unique elements from each sub-collection
Sort the collection of unique elements
Take the third element from the sorted collection

In Clojure (Lisp variant) we could do this as:

(defn nth-least-by-l [c n l]
  (map #(list (nth (sort (set %)) (dec n)) %) (partition l 1 c)))

Invoking this for your test collection:

(nth-least-by-l [3 2 8 4 7 6 1] 3 5)

returns

((4 (3 2 8 4 7)) (6 (2 8 4 7 6)) (6 (8 4 7 6 1)))

which is a bit hard to read. Using the following to pretty-print it

(do
  (println "Third minimums:")
  (doseq [[nval col] (nth-least-by-l [3 2 8 4 7 6 1] 3 5)]
    (println col " => " nval)))

produces the following output:

(3 2 8 4 7)  =>  4
(2 8 4 7 6)  =>  6
(8 4 7 6 1)  =>  6

which is correct.

An algorithm to find L th min/max from length K subarray

3 Answers3