1

Given an array and a window size, I need to find the second-largest element in every window. The brute-force solution is pretty simple, but I want to find an efficient solution using dynamic programming.

The brute-force solution times out when I try it on big arrays, so I need to find a better solution. My solution was to find the second greatest element in each sliding window by sorting the window and taking the second element. I understand that some data structures can sort faster, but I would like to know if there are better ways.
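For reference, here is a minimal sketch of the brute-force approach described above. The function name is made up for illustration, and it assumes that duplicates count separately (so the second largest of 5, 5, 3 is 5). It costs roughly O(n · w log w), which is why it struggles on big inputs.

```python
def second_largest_brute(values, w):
    """Second-largest value of every length-w window, by sorting each window."""
    if w < 2:
        raise ValueError("window size must be at least 2")
    result = []
    for start in range(len(values) - w + 1):
        # Sort the window in descending order and take the second entry.
        window = sorted(values[start:start + w], reverse=True)
        result.append(window[1])  # assumption: duplicates count separately
    return result


print(second_largest_brute([12, 8, 10, 11, 4, 5], 3))
```

A slow version like this is still handy as an oracle when testing a faster one.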

  • I don't think dynamic programming is the answer. A specialized [min-heap](https://en.wikipedia.org/wiki/Binary_heap) should do. – user3386109 Jun 23 '19 at 01:55
  • (@user3386109: suggest a heap to specialise supporting *remove by key*.) – greybeard Jun 23 '19 at 02:08
  • Don't know why I said min-heap. That should be max-heap. And the specializations are 1) remove by key (as mentioned by @greybeard) and 2) peek second largest. Note that the root is the largest element in the heap. It's either the left child of the root, or right child of the root, that's the second largest. – user3386109 Jun 23 '19 at 02:21
  • @user3386109 I thought dynamic programming could help as in a sliding window, the existing second greatest element could still be the second greatest if it or the first greatest wasn't removed. Will definitely check on using min and max heap too. Thank you – Sundar Ganapathy Jun 23 '19 at 03:18
  • @user3386109 I think there may be an O(n) solution here, no? – גלעד ברקן Jun 23 '19 at 04:27
  • @גלעדברקן I don't think an O(n) solution exists. I did look at your answer, but did not understand it. You seem to be implying that the problem can be solved while only keeping two items in a queue. While that may be true for a window size of 3, I don't see how that works for a larger window size, e.g. 100. – user3386109 Jun 23 '19 at 05:16
  • (@user3386109: suggesting *support for remove by key* was meddlesome to an extent, as you can simply "keep a tab on the elements" instead of looking them up. Then again, the most simple solution would seem to use an ordered set as [suggested by mahbubcseju](https://stackoverflow.com/a/56720590/3789665).) – greybeard Jun 23 '19 at 05:44
  • @greybeard Agree that an ordered set is the simple solution for languages that provide it. – user3386109 Jun 23 '19 at 06:05
  • @user3386109 no, the queue is only limited by window size. The "pop" and "remove from front" are `while queue_back <= A[i]` and `while queue_front is outside next window` respectively (the complication of only one smaller element left in queue notwithstanding). Care to offer a counter example of window size 4? – גלעד ברקן Jun 23 '19 at 07:01
  • @גלעדברקן But in that case, don't "pop" and "remove from front" have complexity O(W) where W is the window size? – user3386109 Jun 23 '19 at 07:04
  • @user3386109 a double ended queue has complexity O(1) to remove from either front or back. We insert in the back only. – גלעד ברקן Jun 23 '19 at 07:05
  • @גלעדברקן Yes, but I see the word `while`. Not sure what that means, if not a loop. Which is to say that I still don't have a complete picture of how your algorithm is intended to work. I'm especially confused by the statement, *"The front of the queue will have larger (**AND** earlier seen) elements:"* As far as I'm concerned that **AND** needs to be an **OR**. The queue is either ordered by the element arrival, or by the element value, not both. The larger elements are not necessarily the earliest, and conversely, the earliest are not necessarily the largest. – user3386109 Jun 23 '19 at 07:31
  • @user3386109 the order of the queue is uniquely determined only by our removal from front or back, and the insertion. Elements inserted are necessarily earlier than the next element inserted. But they are inserted after all smaller or equal elements in the back are removed, and all elements outside this window have been removed from the front. – גלעד ברקן Jun 23 '19 at 07:35
  • @user3386109 it may be easier for both of us to understand each other via a counter example with window size 4. – גלעד ברקן Jun 23 '19 at 07:38
  • @גלעדברקן I am unable to provide a counter-example, because I haven't fully understood the details of your algorithm. I need to see the exact set of rules that govern the queue. But I'm afraid it's getting late here, so I have to postpone further discussion until later. Cheers! – user3386109 Jun 23 '19 at 07:46
  • @user3386109 sounds good. I think the rules are pretty clear but I'll be happy to clarify anything further. – גלעד ברקן Jun 23 '19 at 07:48

4 Answers

3

There are many ways that you can solve this problem. Here are a couple of options. In what follows, I'm going to let n denote the number of elements in the input array and w be the window size.

Option 1: A simple, O(n log w)-time algorithm

One option would be to maintain a balanced binary search tree containing all the elements in the current window, including duplicates. Inserting something into this BST would take time O(log w) because there are only w total elements in the window, and removing an element would also take time O(log w) for the same reason. This means that sliding the window over by one position takes time O(log w).

To find the second-largest element in the window, you'd just need to apply a standard algorithm for finding the second-largest element in a BST, which takes time O(log w) in a BST with w elements.

The advantage of this approach is that in most programming languages, it'll be fairly simple to code this one up. It also leverages a bunch of well-known standard techniques. The disadvantage is that the runtime isn't optimal, and we can improve upon it.
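Here's a rough sketch of the idea in Python. One loud caveat: Python's standard library has no balanced BST, so this stand-in keeps the window in a sorted list via `bisect`. The lookups are O(log w), but inserting into and deleting from a list shift elements and cost O(w), so this only illustrates the structure of Option 1 rather than its O(n log w) bound. The function name is hypothetical.

```python
from bisect import bisect_left, insort


def second_largest_sorted_window(values, w):
    """Slide a sorted copy of the window along the array.

    Assumption: a sorted list stands in for the balanced BST, so each
    update is O(w) here instead of O(log w).
    """
    if w < 2:
        raise ValueError("window size must be at least 2")
    if len(values) < w:
        return []
    window = sorted(values[:w])          # duplicates are kept
    result = [window[-2]]                # second-largest is next to last
    for i in range(w, len(values)):
        # Remove one copy of the element leaving the window ...
        del window[bisect_left(window, values[i - w])]
        # ... and insert the element entering it, keeping the list sorted.
        insort(window, values[i])
        result.append(window[-2])
    return result


print(second_largest_sorted_window([12, 8, 10, 11, 4, 5], 3))
```

In a language with a built-in ordered multiset, the same loop carries over directly and gets the O(log w) per-step cost.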

Option 2: An O(n) prefix/suffix algorithm

Here's a linear-time solution that's relatively straightforward to implement. At a high level, the solution works by splitting the array into a series of blocks, each of which has size w. For example, consider the following array:

31  41  59  26  53  58  97  93  23  84  62  64  33  83  27  95  02  88  41  97

Imagine that w = 5. We'll split the array into blocks of size 5, as shown here:

31  41  59  26  53 | 58  97  93  23  84 | 62  64  33  83  27 | 95  02  88  41  97

Now, imagine placing a window of length 5 somewhere in this array, as shown here:

31  41  59  26  53 | 58  97  93  23  84 | 62  64  33  83  27 | 95  02  88  41  97
                             |-----------------|

Notice that this window will always consist of a suffix of one block followed by a prefix of the next (or of exactly one whole block, when the window happens to line up with a block boundary). This is nice, because it allows us to solve a slightly simpler problem. Imagine that, somehow, we can efficiently determine the two largest values in any prefix or suffix of any block. Then we could find the second-max value in any window as follows:

  • Figure out which blocks' prefix and suffix the window corresponds to.
  • Get the top two elements from each of those prefixes and suffixes (or just the top one element, if that prefix or suffix contains only a single element).
  • Of those (up to) four values, determine which is the second-largest and return it.

With a little bit of preprocessing, we can indeed set up our blocks to answer queries of the form "what are the two largest elements in each suffix?" and "what are the two largest elements in each prefix?" You can kinda sorta think of this as a dynamic programming problem, set up as follows:

  • For any prefix/suffix of length one, store the single value in that prefix/suffix.
  • For any prefix/suffix of length two, the top two values are the two elements themselves.
  • For any longer prefix or suffix, that prefix or suffix can be formed by extending a smaller prefix or suffix by a single element. To determine the top two elements of that longer prefix/suffix, compare the element used to extend the range to the top two elements and select the top two out of that range.

Notice that filling in each prefix/suffix's top two values takes time O(1). This means that we can fill in any block in time O(w), since there are O(w) entries to fill in per block. Moreover, since there are O(n / w) total blocks, the total time required to fill in these entries is O(n), so our overall algorithm runs in time O(n).

As for space usage: if you eagerly compute all prefix/suffix values throughout the entire array, you'll need to use space O(n) to hold everything. However, since at any point in time we only care about two blocks, you could alternatively only compute the prefixes/suffixes when you need them. That will require only space O(w), which is really, really good!
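A minimal Python sketch of this prefix/suffix idea, using the eager O(n)-space variant. All names are made up for illustration, negative infinity stands in for "no second value yet", and duplicates are assumed to count separately.

```python
NEG_INF = float("-inf")


def add_value(pair, x):
    """Return the top two of (pair's values plus x), largest first."""
    first, second = pair
    if x >= first:
        return (x, first)
    if x > second:
        return (first, x)
    return pair


def merge(p, q):
    """Top two values among the (up to) four values in p and q."""
    return add_value(add_value(p, q[0]), q[1])


def second_largest_blocks(values, w):
    n = len(values)
    if w < 2:
        raise ValueError("window size must be at least 2")
    if n < w:
        return []
    prefix = [None] * n  # prefix[i]: top two of values[block start .. i]
    suffix = [None] * n  # suffix[i]: top two of values[i .. block end]
    for start in range(0, n, w):
        end = min(start + w, n)
        acc = (NEG_INF, NEG_INF)
        for i in range(start, end):
            acc = add_value(acc, values[i])
            prefix[i] = acc
        acc = (NEG_INF, NEG_INF)
        for i in range(end - 1, start - 1, -1):
            acc = add_value(acc, values[i])
            suffix[i] = acc
    result = []
    for left in range(n - w + 1):
        right = left + w - 1
        if left % w == 0:
            best = prefix[right]  # window is exactly one whole block
        else:
            best = merge(suffix[left], prefix[right])
        result.append(best[1])
    return result


print(second_largest_blocks([12, 8, 10, 11, 4, 5], 3))
```

Each table entry is filled in O(1) and each window is answered with one `merge`, so the whole thing does O(n) work.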

Option 3: An O(n)-time solution using clever data structures

This last approach turns out to be totally equivalent to the above approach, but frames it differently.

It's possible to build a queue that allows for constant-time querying of its maximum element. The idea behind this queue - beginning with a stack that supports efficient find-max and then using it in the two-stack queue construction - can easily be generalized to build a queue that gives constant-time access to the second-largest element. To do so, you'd just adapt the stack construction to store the top two elements at each point in time, not just the largest element.

If you have a queue like this, the algorithm for finding the second-max value in any window is pretty quick: load the queue up with the first w elements, then repeatedly dequeue an element (shift something out of the window) and enqueue the next element (shift something into the window). Each of these operations takes amortized O(1) time to complete, so this takes time O(n) overall.

Fun fact - if you look at what this queue implementation actually does in this particular use case, you'll find that it's completely equivalent to the above strategy. One stack corresponds to suffixes of the previous block and the other to prefixes of the next block.
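To make the queue idea concrete, here's a rough Python sketch. The class and method names are invented for this example, and negative infinity is used as a placeholder when fewer than two values are present. Each stack entry remembers the top two values at or below it, and the queue is the usual two-stack construction.

```python
NEG_INF = float("-inf")


def add_value(pair, x):
    """Return the top two of (pair's values plus x), largest first."""
    first, second = pair
    if x >= first:
        return (x, first)
    if x > second:
        return (first, x)
    return pair


class TopTwoStack:
    def __init__(self):
        self._items = []  # entries: (value, top two at or below this entry)

    def __len__(self):
        return len(self._items)

    def top_two(self):
        return self._items[-1][1] if self._items else (NEG_INF, NEG_INF)

    def push(self, x):
        self._items.append((x, add_value(self.top_two(), x)))

    def pop(self):
        return self._items.pop()[0]


class TopTwoQueue:
    def __init__(self):
        self._inbox = TopTwoStack()   # receives enqueued values
        self._outbox = TopTwoStack()  # serves dequeues in FIFO order

    def enqueue(self, x):
        self._inbox.push(x)

    def dequeue(self):
        if not self._outbox:
            # Reverse the inbox into the outbox; amortized O(1) per element.
            while self._inbox:
                self._outbox.push(self._inbox.pop())
        return self._outbox.pop()

    def second_largest(self):
        a, b = self._inbox.top_two(), self._outbox.top_two()
        return add_value(add_value(a, b[0]), b[1])[1]


def second_largest_queue(values, w):
    if w < 2:
        raise ValueError("window size must be at least 2")
    queue, result = TopTwoQueue(), []
    for i, x in enumerate(values):
        if i >= w:
            queue.dequeue()       # shift the oldest element out
        queue.enqueue(x)          # shift the next element in
        if i >= w - 1:
            result.append(queue.second_largest())
    return result


print(second_largest_queue([12, 8, 10, 11, 4, 5], 3))
```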

This last strategy is my personal favorite, but admittedly that's just my own data structures bias.

Hope this helps!

templatetypedef
  • I don't see why we need a special get_max operation (described in the links you shared) when we can just pop smaller elements from a simple queue as I described (since those are useless anyway). – גלעד ברקן Jun 23 '19 at 19:12
  • @גלעדברקן The question is how you would represent the queue such that you can pop out smaller elements. I read over your solution and I unfortunately couldn't understand the rationale about how you decide where you choose to insert or remove elements from the queue. Could you elaborate on that? – templatetypedef Jun 23 '19 at 19:15
  • Every element is inserted into the queue, but only after all smaller or equal elements are removed from the back, and all elements outside this window have been removed from the front. (Since we're looking for the second largest, we need to maintain an occasionally filled temporary variable that was once a smaller single element in the queue, which is compared against the current largest two in the queue and discarded appropriately.) (Since the queue stores indexes, it's easy to tell if the front is outside the window.) – גלעד ברקן Jun 23 '19 at 19:22
1

Use a data structure such as an ordered set, which keeps its elements in sorted order. For example, if you store 4, 2, 6 in the set, it will hold them as 2, 4, 6.

So what will be the algorithm:

Let,

Array = [12,8,10,11,4,5] window size =4

first window= [12,8,10,11] set =[8,10,11,12]

How to get the second highest:
- Remove the last (largest) element from the set and store it in a container: set = [8,10,11], container = 12.
- After the removal, the current last element of the set is the second largest of the current window.
- Put the removed element stored in the container back into the set: set = [8,10,11,12].

Now shift your window:
- Delete 12 from the set and add 4.
- Now you have the new window and set.
- Repeat the same process.

The complexity of removing or adding an element in a set is about O(log n), where n is the number of elements in the set (here, the window size).

One trick:

If you always want to store the data in decreasing order, you can store each value multiplied by -1, and multiply it by -1 again when you read it back out.

mahbubcseju
  • My brute force solution was almost the same, except I sorted the numbers instead of using a set. The advantage of set being lower complexity. I would like to know if there are other ways to tackle this problem. Thank you – Sundar Ganapathy Jun 23 '19 at 03:16
  • There are also some other ways ! Like segment tree data structure Complexity O(logn). And please upvote my answer if it is helpful. Thanks – mahbubcseju Jun 23 '19 at 03:36
1

We can use a double ended queue for an O(n) solution. The front of the queue will have larger (and earlier seen) elements:

  0  1  2  3  4  5
{12, 8,10,11, 4, 5}
window size: 3

i   queue (stores indexes)
-   -----
0   0
1   1,0
2   2,0 (pop 1, then insert 2)
output 10
remove 0 (remove indexes not in
   the next window from the front of
   the queue.)
3   3 (special case: there's only one
   smaller element in queue, which we
   need so keep 2 as a temporary variable.)
output 10
4   4,3
output 10
remove 2 from temporary storage
5   5,3 (pop 4, insert 5)
output 5

The "pop" and "remove from front" steps are `while A[queue_back] <= A[i]` and `while queue_front is outside next window` respectively (the complication of only one smaller element left represented in the queue notwithstanding). We output the array element indexed by the second element from the front of the queue. (Our front may also have a special temporary friend that was once in the front, too; the special friend is dumped as soon as it represents an element that's either outside of the window or smaller than the element indexed by the second queue element from the front.) A double ended queue has complexity O(1) to remove from either front or back. We insert in the back only.

Per templatetypedef's request in the comments: "how you determine which queue operations to use?" At every iteration, with index i, before inserting it into the queue, we (1) pop every element from the back of the queue that represents an element in the array smaller than or equal to A[i], and (2) remove every element from the front of the queue that is an index outside the current window. (If during (1), we are left with only one smaller or equal element, we save it as a temporary variable since it is the current second largest.)

גלעד ברקן
  • I second [user3386109](https://stackoverflow.com/questions/56720446/finding-second-largest-element-in-sliding-window#comment100003197_56720446): I don't understand your answer. Calling the size of window&queue *k*, is n*k* a tight upper bound for the time taken by your solution? If not: how, after removing the element that has been output for a while, do you pick the next element to output? – greybeard Jun 23 '19 at 05:36
  • @greybeard we output the second element from the front (the right side). – גלעד ברקן Jun 23 '19 at 06:57
  • From your exchange with user3386109: please put up a test frame and test cases with a window size >4, code your approach and report results. A sketch of the procedure or program code would be a boon, too. – greybeard Jun 23 '19 at 07:43
  • @greybeard I would prefer to just explain any simple counter example. Why greater than and not just equal to 4? – גלעד ברקן Jun 23 '19 at 07:47
  • Can you elaborate on how you decide which value to remove from the queue at each point? For example, let's imagine that my window has size 3 and I have the array `[10, 12, 9, 5]`. How do I kick the 10 out of the queue when it gets shifted out of the window? – templatetypedef Jun 23 '19 at 19:28
  • @templatetypedef `input: [10, 12, 9, 5]; (i,queue,temp): (0,[0],null) > (1,[1],0) > (2,[2,1],0) > output 10 > (3, [3,2,1],null) > output 9` – גלעד ברקן Jun 23 '19 at 19:35
  • Can you edit this answer to include a description of how you determine which queue operations to use? I don't quite get how you arrived at these queue operations. (My apologies - it's pretty likely that I'm missing something fairly obvious here.) – templatetypedef Jun 23 '19 at 19:37
  • @templatetypedef done. It may need tweaking but I hope it moves us forward :) – גלעד ברקן Jun 23 '19 at 19:46
  • Ohhhhh, that makes a lot of sense. Sorry - I didn’t pick up on what you were doing here. That idea seems to work (or be pretty close to working modulo handling the temporary value). Thanks for clarifying! – templatetypedef Jun 23 '19 at 19:56
0

There is a relatively simple dynamic programming O(n^2) solution: build the classic pyramid structure for an aggregate value over a subset (the one where you combine the values from the pairs below to make each step above), tracking the largest 2 values (and their positions) at each node. At each step, simply keep the largest 2 of the 4 combined values (fewer in practice due to overlap; use the positions to ensure they are actually different elements). You then just read off the second-largest value from the layer with the correct sliding window size.
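One way to read the pyramid description, sketched in Python under my own assumptions: layer k holds the top two (value, position) entries for every window of size k, and layer k+1 is built by combining neighbouring entries of layer k. The function name is hypothetical. Stopping at layer w costs about O(n · w), which reaches O(n^2) when w is close to n.

```python
def second_largest_pyramid(values, w):
    """Build layers 1..w; layer k holds the top two of each size-k window."""
    if w < 2:
        raise ValueError("window size must be at least 2")
    if len(values) < w:
        return []
    # Layer 1: every window is a single element, so there is no second entry.
    layer = [((v, i), None) for i, v in enumerate(values)]
    for _ in range(2, w + 1):
        next_layer = []
        for i in range(len(layer) - 1):
            # Key candidates by position so an element shared by the two
            # overlapping windows is only counted once.
            candidates = {}
            for pair in (layer[i], layer[i + 1]):
                for entry in pair:
                    if entry is not None:
                        candidates[entry[1]] = entry
            top = sorted(candidates.values(), reverse=True)[:2]
            next_layer.append((top[0], top[1]))
        layer = next_layer
    # Each entry is ((largest, pos), (second, pos)); report the second value.
    return [second[0] for _, second in layer]


print(second_largest_pyramid([12, 8, 10, 11, 4, 5], 3))
```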

Ninetails