0

I am working a app to monitor CNC machines. I use Nest.js + TypeScript + MikroORM + PostgreSQL.

A customer needs to have real-time timeline graph with some state data (6 Boolean values, only one can be true) from start of the first workshift (6am) until the end of the last workshift (10pm) in a particular day.

I get the current state (6 boolean values) every second, then compute intervals for consecutive records with the same state (all 6 boolean values have same values in each state record during a single interval).

Example data:

// State record
{
  created: Date,
  v1: boolean,
  v2: boolean,
  v3: boolean,
  v4: boolean,
  v5: boolean,
  v6: boolean
}

// Interval
{
  startTime: Date,
  endTime: Date,
  length: number
  v1: boolean,
  v2: boolean,
  v3: boolean,
  v4: boolean,
  v5: boolean,
  v6: boolean
}

One interval has usually a length of 10-100 minutes, but it can be several days long (e.g. on holiday company shutdown).

As we monitor many machines (60+ machines), when there are many intervals between 6am and 10pm, the data is too much to hold in memory (it causes JS heap to go out of memory).

I was thinking about publishing the data to the frontend using MQTT with all data from 6am to now in a machine/$id/data/current topic and then only the changes in machine/$id/data topic, however:

  1. the last interval (latest) can change (endTime and length will be incremented by a second every second and when the state changes, the previous interval still is incremented by a second and a new interval is inserted with 0 length), therefore I cannot simply publish the latest interval, but occasionally I would need to publish latest two interval;
  2. bigger problem is that initial (current) data: it will be huge e.g. at 9pm (15 hours in total; of course, it depends on the number of intervals).

With the first problem, the frontend could deal with it comparing the IDs of the intervals, but I have no idea how to solve the issues with the initial (current) data: I store the intervals in the DB, therefore I need to select * from mx.intervals where start_time < now() and end_time > '2022-01-25T06:00:00.00+01' order by start_time asc on every machine on app initialisation, which fails (fatal error: reached heap limit allocation failed - JS heap out of memory).

Would anyone help me how could I implement this? I need an algorithm, not necessarily a working code.


At one other customer with 35 machines, yesterday (24 January 2022) from 6am to 10pm, there are 103 intervals per machine on average. Maximum number of intervals was 567 on a machine, however, it could be higher, as some operations (i.e. producing a piece/product) on some CNC machines take about 30 seconds.

tukusejssirs
  • 564
  • 1
  • 7
  • 29
  • 1
    Try a simple optimization. Firstly, you don't need to return all boolean variables (if only one can be true, it's enough to return its index using a one-byte field), and you also don't need to return the end of the interval - it's enough to know the length of the interval. And secondly, your query returns data not for a particular day, but all historical data (or maybe it is just an mistake). – stdob-- Jan 25 '22 at 15:28
  • Thanks, @stdob--, for trying to help me. The thing is that the app crashes (out of heap fatal error) when I try to read the required data (intervals since 6am) for _all_ machines from the DB. Even if I wanted to optimise the variables, the data is too much for the memory and I can’t _read_ (select) the data in the first place. – tukusejssirs Jan 25 '22 at 18:45
  • As for the _one-byte field_, what do you actually mean? Do you mean to store the data as `decimal(6, 0) and use zeros and ones and predefined number spots/positions (each spot [ones, tens, hundreds, …] is translated to a specific boolean variable)? – tukusejssirs Jan 25 '22 at 18:48
  • _you also don't need to return the end of the interval - it's enough to know the length of the interval_: Do I understand it right that you suggest this because a timestamp requires much more space (in memory and DB) than an int? I store `length` as `INT` in milliseconds with `start_time`, as it potentially/occasionally can represent a period of several days. Now that I think of it, I use `timestamptz(3) as type of `start_time` and `end_time` and it could be wise to use UNIX epoch (in milliseconds as `INT`). Do you agree it would be better? – tukusejssirs Jan 25 '22 at 18:56
  • _And secondly, your query returns data not for a particular day, but all historical data (or maybe it is just an mistake)._ Yeah, that is a mistake. – tukusejssirs Jan 25 '22 at 19:35
  • The original interval object (from the OP) measured using [this function](https://stackoverflow.com/a/11900218/3408342) has size of 32 bytes, and the new interval object (based on the suggestions of @stdob--) measures 24 bytes, therefore the improved object is smaller by 25 %. However, when measured using [`object-sizeof`](https://www.npmjs.com/package/object-sizeof), the reported sizes are 166 bytes (original object) vs 66 (new object), therefore the new object is smaller by 60 %. That said, in timestamp in ms uses more digits than `INT`, therefore I’ll use date in DB and number in JS. – tukusejssirs Jan 26 '22 at 13:22
  • Another (potential) problem I see with using a number (`INT`) for status values (each digit representing a state, e.g. `1000` means the 4th value is `true`, rest is `false`): How can I get the individual values (e.g. if `v1` is `true` or `false`)? I could (1) convert it to string and use `!!string[0]` (but `string` consumes more memory than `number`); or (2) compare individual numbers (e.g. if `number` equals to `100`, then the third value is `true`, rest is `false`). – tukusejssirs Jan 26 '22 at 13:43
  • Even after optimalisation, this does not help much, as there are 60 machines and (based on my measurements) there could be 300-600 intervals per machine between 6am and 10pm, therefore just before 10pm, I would need to use `600 * 60 * 66 = 2376000 bytes + 2.266 GiB` of RAM, because I need to send all data of all machines the frontend would request at time of its loading (which could happen any time of day). – tukusejssirs Jan 28 '22 at 18:29

0 Answers0