2

What's a memory-efficient way to compute 10 requests per second based on timestamps of the incoming data for the last 60 seconds?

I have the following 10-digit, Unix timestamps:

1458573970
1458573970
1458573970
1458573971
1458573972
1458573971
1458573973
1458573975
1458573980

We have about 9 requests in a timespan of 10 seconds. I have to keep lag in consideration, as some of the incoming timestamps can be out of order by plus/minus a second.

Eventually there will be a cutoff of 60 seconds, so I want to keep track of the 60 second cutoff for every 10 requests per seconds. (So I need to determine if I continuously get 10 requests per second on average for the last 60 seconds.)

I saw the answer to this question Calculating number of messages per second in a rolling window? but the answer seems to be based off immediate data, and most of the answers don't take a historical timestamp into account.

I thought about doing something like this, but I don't quite have a solution formed.

AAA
  • 1,962
  • 4
  • 30
  • 55
  • Can you clarify what you mean by average number of requests per second? You have 9 requests in 95 seconds, so the average number of requests per second would be about `0.1`. – Nico Schertler Nov 27 '19 at 17:58
  • Sorry, there were some typos. I've updated my question. – AAA Nov 27 '19 at 18:05
  • So you want to have a window of 60 seconds and calculate the request rate for this window, sliding through your entire data set? Or do you know when the 60 seconds cutoff happens and want discrete windows instead? – Nico Schertler Nov 27 '19 at 18:13
  • Yes, there is a 60 second cutoff, but the dataset I provided is just an example. We could have thousands or millions of timestamps in mostly chronological order. – AAA Nov 27 '19 at 18:20
  • My question was if you know at what time the cutoff happens. Or if you rather want a sliding window. – Nico Schertler Nov 27 '19 at 18:27
  • I need a sliding window. I don't know when the cutoff happens as I iterate through the data. – AAA Nov 27 '19 at 18:28

2 Answers2

1

You can use the EMA filter. With this approach, you need just use 2 cells of memory, independent on data frequency and window size - rate_accumulator, and last_time_event.

See the following demo/test:

#!/usr/bin/perl

my @data = qw(1558573970 1558573970 1558573970 1558573971 1558573972 1558573971 1558573973 1558573975 1558573980);
my $tlast = 0;
my $rate  = 0;

for(my $t = 1; $t < 100000; $t += 6) { # sim mode
#foreach my $t(@data) { # real data
  my $dt = $t - $tlast;
  if($dt > 0) {
    $rate *= exp(-$dt / 60.0);
  }
  $rate++;
  $tlast = $t;
}

$rate -= 0.5; # Maybe, is not need
print "rate=$rate\n";

Such system, where is exp() computing is replaced to binary shifts (for performance sake), is used in the DNS Amplifying Protection subsystem of Emercoin node.

olegarch
  • 3,670
  • 1
  • 20
  • 19
  • Can you explain why we iterate from `1` to `100000`? – AAA Nov 27 '19 at 20:36
  • This is just generate a fictive time series, for simulation purpose. You can modify thase params, to control length of time series, or change value in "t+=6". with current params, it sends 1 tick per 6 virtual seconds, i.e. frequency 10 ticks per minute. You can play with these params to test algorithm on other data series. Or, comment "for", uncomment "foreach", and test on your real data. – olegarch Nov 27 '19 at 20:53
  • Thanks. How would I check the requests per second for the last 60 seconds? Is it the return value? – AAA Nov 29 '19 at 00:48
  • If I input the same timestamp for this algorithm and run over a loop 70 times, I get the average rate of `70`. But this doesn't take into account the last 60 seconds of the average requests per second if I wanted a limit of 10 requests per second... – AAA Nov 29 '19 at 03:03
  • Algorithm computes rate per sliding window. Window size = 60s, see the line "-$dt / 60.0". If you would like change window size - change "60" to another value. Also, if you would translate rate per minute into rate per second, youcan divide it to 60, or another actual window size. – olegarch Nov 29 '19 at 18:03
  • Divide "it" to 60 -- divide what by 60? It's already divided by 60 in `exp(-$dt / 60.0)`. Also is `$dt` the difference between the input timestamp and the last timestamp? It's hard to read the code. I am not a Perl user. – AAA Nov 29 '19 at 18:22
  • Yes, $dt (delta times) is difference between the input timestamp and the last timestamp. Regaring "no perl user" - you can read math article in Wikipedia about EMA, link provided in the 1st paragraph of the answer. Or, search other math explanations or code example. Also, in the original answer, I provided you info about high-performance application, available as open source. There is uses same algorithm, computes threshold ration for block abusers with high per-EMA-window ratio. This s C, not a Perl: https://github.com/emercoin/emercoin/blob/master/src/emcdns.cpp#L1026 – olegarch Nov 29 '19 at 18:40
0

A simple sliding window approach could look as follows:

const windowWidth = 60
int upper = 0 // index of the upper window boundary (exclusive)
for lower from 0 to length of data - 1
    // push the upper bound to include 60 seconds
    while timestamps[upper] - timestamps[lower] < windowWidth
        ++upper
    if upper < length of data - 1
        // Add special handling for the case timestamps[upper - 1] - timestamps[lower] == 0 if needed
        requestRate = (upper - lower) / (timestamps[upper - 1] - timestamps[lower])
        report avg or check for validity      
Nico Schertler
  • 32,049
  • 4
  • 39
  • 70