2

Consider the following javascript data structure:

let sensors = { 
  sensor1: {
    min: 1.00,
    max: 9.00,
    data: [
      {
        timestamp: 1517760374400,
        value: 1.00
      },
      {
        timestamp: 1517760374500,
        value: 2.00
      },
      {
        timestamp: 1517760374600,
        value: 9.00
      },
      {
        timestamp: 1517760374700,
        value: 1.00
      },
      {
        timestamp: 1517760374800,
        value: 3.00
      },
      {
        timestamp: 1517760374900,
        value: 1.00
      },
      {
        timestamp: 1517760375000,
        value: 9.00
      },
      {
        timestamp: 1517760375100,
        value: 8.00
      },
    ]
  },
  // sensor2, sensor3, etc...
}

Imagine there could be thousands of timestamped data for each sensor.

Initially you can easily set a min / max value, every time an object is added by checking if it is bigger or smaller than the current max

But the tricky part and my question is this:

What is the size of the array is limited - in this case we would set it to a length of 8.

Whenever a new item after item 8 is added (the limit is reached), the 1st item will be removed, and the nth item will be pushed into the end of the array.

The problem is that there can be more items with the same value, and even if there isn't we have no way of knowing which min / max is next without iterating the entire array once again

This is supposed to be scalable to thousands of array items, and is to be run roughly every second with ideally - as low cpu utilization as possible - I don't really think looping over thousands of items every second will be effecient enough.

Do you see another other ways of keeping track of min / max values of an array which is changing like this ever second?

Dac0d3r
  • 2,176
  • 6
  • 40
  • 76
  • Perhaps you need something like this: https://stackoverflow.com/questions/12190184/can-min-max-of-moving-window-achieve-in-on/12195098#12195098 – MBo Feb 04 '18 at 17:13
  • the min or max have to be re-calculated only if the removed item contains the min or max, and only if its needed (meaning it can be kept as `undefined` until the value is needed) – Slai Feb 04 '18 at 17:22

2 Answers2

2

Data structure:

  • Queue size of N to store N item.

  • Min / Max Heap to track the min / max item.

  • A hash map to track the frequency of each item.

When you there is a coming data, update the frequency of the new item, if not there in the heap, add it.

When you need to pop an item, decrease the frequency, while frequency of head == 0, remove from the heap.

Head of the heap is the solution.

Pseudo code:

const swap = (data, i, j) => {
  let temp = data[i];
  data[i] = data[j];
  data[j] = temp;
}

class Heap {
  constructor() {
    this.data = [];
    this.inHeap = {};
    this.size = 0;
  }
  
  head() {
    return this.data[0];
  }
  // add item O(logN);
  add(number) {
  
    if (!this.inHeap[number]) {
      this.data[this.size++] = number;
      let current = this.size - 1;

      while (current > 0) {
        if (this.data[current >> 1] < this.data[current]) {
          swap(this.data, current >> 1, current);
          current >>= 1;
        } else {
          break;
        }
      }
      this.inHeap[number] = true;
    }
    
  }
  // remove head O(logN);
  remove() {
    this.size--;
    delete this.inHeap[this.data[0]];
    this.data[0] = this.data[this.size];

    let current = 0;
    while (current * 2 + 1 < this.size) {
      let next = current * 2 + 1;
      if (current * 2 + 2 < this.size && this.data[current * 2 + 2] > this.data[current * 2 + 1]) {
        next = current * 2 + 2;
      } 
      
      if (this.data[current] < this.data[next]) {
        swap(this.data, current, next);
        current = next;
      } else {
        break;
      }
    }
    
  }
}

class Queue {
  constructor(maxSize) {
    this.maxSize = maxSize;
    this.size = 0;
    this.data = [];
    this.head = -1;
  }
  
  // add a number and return the removed item if any
  add(number) {
    let next = (this.head + 1) % this.maxSize;
    let removedItem = this.data[next];
    this.data[next] = number;
    this.head = next;
    
    if (removedItem === undefined) {
      this.size++;
    }
    
    return removedItem;
  }
  
  get(i) {
    return this.data[(this.head - this.size + 1 + i + this.maxSize ) % this.maxSize];
  }
}

class Solution {
  constructor(n) {
    this.n = n;
    this.queue = new Queue(this.n);
    this.heap = new Heap();
    this.frequency = {};
  }
  add(number) {
    let removedItem = this.queue.add(number);
    
    if (!this.frequency[number]) {
      this.frequency[number] = 1;
      this.heap.add(number);
    } else {
      this.frequency[number]++;
    }
    
    if (removedItem !== undefined) {
      this.frequency[removedItem]--;
      
      if (!this.frequency[removedItem]) {
        delete this.frequency[removedItem];
      }
      
      // remove head if frequency is zero
      while (!this.frequency[this.heap.head()]) {
        this.heap.remove();
      }
    }
  }
  
  size() {
    return this.queue.size;
  }
  
  get(i) {
    return this.queue.get(i);
  }
  
  max() {
    return this.heap.head();
  }
}

/*** use of solution here!! **/
let solution = new Solution(3);
let numberInput = document.getElementById("number");
let data = document.getElementById("data");
let maxResult = document.getElementById("max");
let heapData = document.getElementById("heap");
let queueData = document.getElementById("queue");
let frequencyData = document.getElementById("frequency");

function addNumber() {
  let value = parseInt(numberInput.value);
  
  if (isNaN(value)) {
    alert("Please input a number!");
  } else {
    solution.add(value);
  }
  
  maxResult.innerHTML = "Max: " + solution.max();
  
  // gather data
  let dataString = "";
  for (let i = 0; i < solution.size(); i++) {
    dataString += " " + solution.get(i);
  }
  
  data.innerHTML = "Data: " + dataString;
  heapData.innerHTML = "Heap: " + JSON.stringify(solution.heap.data.slice(0, solution.heap.size));
  queueData.innerHTML = "Queue: " + JSON.stringify(solution.queue);
  frequencyData.innerHTML = "Frequency: " + JSON.stringify(solution.frequency);
  
  numberInput.value = parseInt(Math.random() * 1000);
}
.input {
  display: flex;
}

.input input {
  width: 200px;
  padding: 5px 10px;
  outline: none;
}

.input button {
  padding: 5px 10px;
  border: 1px solid light gray;
}

div {
  padding: 5px 10px;
}
<div class="input">
  <input type="text" id="number" />
  <button onClick="addNumber()">Add</button>
</div>
<div class="result">
  <div class="data" id="data">
    Data: 
  </div>
  <div class="max" id="max">
    Max: undefined!
  </div>
</div>
<div class="debug">
  <div>
    <code class="data" id="heap">
      Heap:
    </code>
  </div>
  <div>
    <code class="max" id="queue">
    Queue:
    </code>
  </div>
  <div>
    <code class="max" id="frequency">
      Frequency:
    </code>
  </div>
</div>
Daniel Tran
  • 6,083
  • 12
  • 25
  • Hi Daniel, I think you're right, but is there any chance you could give me a little more help - I'm not quite able to connect the dots and see how it applies to my dataset. "Queue" = the "data" array? "Min / Max Heap" what do you mean by that - there could me multiple items (different timestamps) with the same min / max value. Really want to understand your solution, since I get the feeling this is the way to go. Any chance of you writing some quick pseudo code of how your proposed structure would look like? – Dac0d3r Feb 04 '18 at 16:50
  • I think the head of the heap is only the solution when the heap is kept sorted, right? Or am I missing something – c-wilson Feb 04 '18 at 17:04
  • I dunno. Very curious to see this solution in action though. But the items are kinda sorted in the sense that they're pushed at the end of the array each second, so the timestamps should be in order. – Dac0d3r Feb 04 '18 at 17:09
  • Daniel, please see the code I added to the original question at bottom. Any further help you can give would be greatly appreciated :) – Dac0d3r Feb 04 '18 at 18:54
  • Hello, I've updated a simple solution in the answer so that you can figure it out. – Daniel Tran Feb 05 '18 at 04:02
  • Cool, I got it working last night using collections.js Heap, but this is very helpful and thanks alot for all your help! – Dac0d3r Feb 05 '18 at 09:04
  • Once an item count reaches 0 it's safe to remove it from both the Frequency and Heap, right? Also what would it take to implement a min value in addition to the max value? If we cleaned up the heap and Frequency, we would just take the max value from the heap like this (in the min function under "Solution": `Math.min(...this.heap.data)` - or might there a better way? :) – Dac0d3r Feb 05 '18 at 09:46
  • For a Min/Max Heap, removing the item from the head cause O(logN), if we find and remove an item, then it would be O(N), again, very big. So we wait until that number become the head, then we remove. If there is no repeated item, Math.min(...this.heap.data) will be the same as Math.min(this.queue.data), again, it's O(N) every second that we want to avoid from the beginning. To have a min value, add a Min Heap and do the same way. – Daniel Tran Feb 05 '18 at 09:50
  • Okay Daniel, appreciate it, but as your code is right now, it looks as if the heap and Frequency will keep growing even when count is 0. I guess this could become a problem regarding performance, when running for a long time?. Last question - I seem to have problems with the tab crashing when adding like 20 or 30 items. It crashes both here in the snippet and in my local html file which I extracted it to. Have you got any idea, what goes wrong here :D I'm trying to find the error, but maybe you could beat me to it, since you got a better understanding of the code :) – Dac0d3r Feb 05 '18 at 09:59
  • 1
    There is a bug in my heap implementation, I forget to add `else { break; }` – Daniel Tran Feb 05 '18 at 10:08
  • Works much better now :D When you say "add a Min Heap" do you mean I have to create a class MinHeap, like the class Heap (will become MaxHeap), with a different implementation of add / remove? – Dac0d3r Feb 05 '18 at 10:17
  • 1
    No need, you can customize the Heap with a compare function as a parameter. – Daniel Tran Feb 05 '18 at 10:19
  • Awesome, got it working now with a min/max and compare function as parameter to Solution --> Heap. What a great solution! – Dac0d3r Feb 05 '18 at 11:34
  • This solution works great so far, however I'm a bit worried about the growing heap, since the app i'm using it for is updating every second and supposed to do so 24/7. In the Heap's remove function it deletes from the inheap set, but it doesn't look to me as if the heap data array is every deleted from. Is that intentional and unavoidable? – Dac0d3r Feb 15 '18 at 15:09
0

Sounds fun. I think that you're going to run into a problem where you just don't know a priori whether a value is going to be an extreme (max/min) value in the future.

My thought would be to add an expiration counter to your current max & min values. This counter decrements each time you don't replace your rolling min/max. It is reset to 8 when it is updated with a new, fresh value. Min and max obviously have separate counters.

Now you only have to loop through your values if a counter decrements to 0 and your min/max value becomes stale (it is no longer in your list). If, for example, your min counter expires, you now have to identify the remaining minimum that is currently in your list of 8. In this case, you'll reset the expiration counter to match the number of iterations until that new min value gets removed from the list (8 - it's index).

That might save you some CPU cycles!

c-wilson
  • 395
  • 3
  • 11