
I have a long-running (several hours) script that periodically sends queries to a server. The server is very sensitive to load, so the queries are sparse (not more than 1 every 3 minutes).

The server will always take exactly 10 minutes to process a query, so I can check the result of query 1 any time from 10 minutes after sending it.

So there are two types of operations, "sending query" and "checking result of query". I want all operations to happen at random intervals (subject to the constraint that there are at least 3 minutes between adjacent operations).

Following the advice in this answer (https://stackoverflow.com/a/51918697/10690958), I can generate a time series of integers such that consecutive values are at least 3 apart. Let's call this series 1.
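A minimal sketch of one common way to generate such a series (not necessarily the exact code from the linked answer): draw k distinct integers from a range shortened by the mandatory spacing, sort them, then add i*(min_gap - 1) back to the i-th value. The helper name `spaced_times` and its parameters are hypothetical.

```python
import random

def spaced_times(k, horizon, min_gap, rng=random):
    """Return k sorted integer times in [0, horizon) with consecutive gaps >= min_gap.

    Hypothetical helper: samples from a range shrunk by the total mandatory
    spacing, then stretches the sorted sample back out.
    """
    slack = horizon - (k - 1) * (min_gap - 1)
    if slack < k:
        raise ValueError("horizon is too short for k points with this minimum gap")
    base = sorted(rng.sample(range(slack), k))
    # distinct sorted values differ by >= 1; adding i*(min_gap-1) turns that into >= min_gap
    return [b + i * (min_gap - 1) for i, b in enumerate(base)]

# e.g. 20 query times (minutes) over a 5-hour run, at least 3 minutes apart
queries = spaced_times(20, 300, 3)
```

Every spacing of at least `min_gap` is equally likely under this construction, which is what makes the series "randomly spaced" rather than just jittered.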

I can also generate a similar time series of status-check queries (at least 3 minutes between them). Let's call this series 2.

Now, series 1 is randomly spaced, and series 2 is also randomly spaced. But there is a correlation between series 1 and 2, i.e. "response time" = "query time" + 10 minutes.

Thus the union of series 1 and 2 won't be random. Furthermore, there is a (very small) possibility of collision: for example, query 2 might be sent at exactly the moment one is checking the result of query 1.

Is there a way to make the union of the two sequences also perfectly random, while avoiding the possibility of collisions? Ideally, all traffic to the server (whether a query or a status check) should be at perfectly random intervals.

I realize that the title is not very descriptive, but I could not figure out a better way to describe the situation. Please edit if you have a better description.

For example:

query_sequence=set([3,8,12,21,37])
check_result_sequence=set([13,18,22,31,47])
server_traffic=query_sequence.union(check_result_sequence)

But their union (server_traffic) is not random, since

check_result_sequence = {t + 10 for t in query_sequence}
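Making the example concrete (times in minutes, as above): merging queries with their checks and looking at the gaps between consecutive events shows the fixed +10 offset and also that the merged traffic breaks the 3-minute spacing. The check at 13 comes only 1 minute after the query at 12. The helper name `merged_gaps` is illustrative, not from the thread.

```python
def merged_gaps(query_times, delay=10):
    """Merge query times with their result checks (query + delay) and return
    the sorted event list plus the gaps between consecutive events."""
    checks = [t + delay for t in query_times]
    events = sorted([*query_times, *checks])
    gaps = [b - a for a, b in zip(events, events[1:])]
    return events, gaps

query_sequence = [3, 8, 12, 21, 37]
events, gaps = merged_gaps(query_sequence)
print(events)     # [3, 8, 12, 13, 18, 21, 22, 31, 37, 47]
print(gaps)       # [5, 4, 1, 5, 3, 1, 9, 6, 10]
print(min(gaps))  # 1, below the 3-minute minimum
```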

P.S.: Generating time points with finer granularity might help reduce the probability of collisions (as mentioned in the comments). As for the randomness of the union of the two sequences, I don't see any satisfactory solution. What I finally decided to do was:

import random
check_result_sequence = [t + 10 + 5 * random.random() for t in query_sequence]

This adds a random "jitter" to the result-check sequence, so it should help reduce the correlation between the two sequences.
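A runnable sketch of that jitter idea: each check is scheduled a uniform random 0 to 5 minutes after the 10-minute processing time, so it can never happen before the result is ready. The jitter only weakens the fixed offset; it does not by itself enforce the 3-minute spacing between a check and a neighbouring query, so the sketch also includes a spacing test. The names `jittered_checks` and `spacing_ok` are hypothetical.

```python
import random

def jittered_checks(query_times, delay=10.0, max_jitter=5.0, rng=random):
    """Schedule each result check at query + delay + U[0, max_jitter) minutes."""
    return [t + delay + max_jitter * rng.random() for t in query_times]

def spacing_ok(events, min_gap=3.0):
    """True if every pair of consecutive events is at least min_gap apart."""
    ordered = sorted(events)
    return all(b - a >= min_gap for a, b in zip(ordered, ordered[1:]))

query_sequence = [3, 8, 12, 21, 37]
check_result_sequence = jittered_checks(query_sequence)
# jitter does not guarantee this; re-draw the checks if it comes out False
print(spacing_ok(query_sequence + check_result_sequence))
```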

grill05
  • Can you give an example of what result you expect? It is not yet very clear to me. – Romain Reboulleau Dec 07 '18 at 12:41
  • The part which is not clear to me: if I have sent query request 1 and will only get its output after 10 minutes, can I send query request 2 after 3 minutes? – Sach Dec 07 '18 at 13:34
  • Yes. The queries can be sent at any time (subject to a minimum distance of 3 minutes between them). The responses will be available after 10 minutes (i.e. the server takes exactly 10 minutes to process each query). The minimum distance is needed to reduce server load: querying the server contributes to load (the server uses CPU/GPU/memory to process queries), whereas checking the status does not (since the processing is already done). Note that these are "sparse" sequences, i.e. most of the time the script is sleeping between sending traffic to the server. – grill05 Dec 07 '18 at 13:46
  • Your example uses a granularity of minutes. You could instead generate intervals of seconds, milliseconds, or even finer units to reduce the risk of collisions. – Peter O. Dec 08 '18 at 06:45

1 Answer


1) I hardly see the need to randomize the interval between the requests.

2) You could use a single list: one that represents the available moments to submit a request:

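# send_query / send_result_request are placeholders for the real calls;
# each one uses up the next free slot in server_traffic
server_traffic = [3, 8, 12, 15, 19, 23, 26, 30, 34, 40]  # sorted random slots (minutes)
for x in range(4):
    send_query(server_traffic)
while server_traffic:  # stop when the slots run out instead of looping forever
    send_result_request(server_traffic)
    send_query(server_traffic)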

Then, at each moment, you decide with your own policy whether to send a query or to check a result. This should make everything easier.
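A self-contained sketch of that fixed policy which builds a plan of (time, action, query_id) tuples instead of calling the server (`schedule_fixed_policy` is a hypothetical name): the first four slots carry queries, then slots alternate between checking the oldest pending query and sending a new one. Since consecutive slots are at least 3 minutes apart and every check lands at least 4 slots after its query, each check comes at least 12 minutes after the query, which clears the 10-minute processing time.

```python
from collections import deque

def schedule_fixed_policy(slots, warmup=4):
    """Assign each time slot to a query or a result check.

    slots: sorted times (minutes), consecutive entries at least 3 apart.
    Returns a list of (time, action, query_id) tuples.
    """
    plan, pending = [], deque()  # pending: query ids in send order (FIFO)
    next_id = 0
    for i, t in enumerate(slots):
        # after the warm-up, even offsets check results and odd offsets send queries;
        # `and pending` is a defensive guard so we never pop from an empty queue
        if i >= warmup and (i - warmup) % 2 == 0 and pending:
            plan.append((t, "check", pending.popleft()))
        else:
            pending.append(next_id)
            plan.append((t, "query", next_id))
            next_id += 1
    return plan

server_traffic = [3, 8, 12, 15, 19, 23, 26, 30, 34, 40]
for event in schedule_fixed_policy(server_traffic):
    print(event)
```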

Federico Dorato
  • This would work if I did not want the traffic to the server to be random. I've only mentioned the relevant details here, so the necessity of randomness might not be apparent. But as the title says, (pseudo-)randomness is required. – grill05 Dec 07 '18 at 13:09
  • @grill05 Why do you say this would not work? The set is created randomly (I used the same writing style you used), and afterwards you decide whether to send a query or a result_request. – Federico Dorato Dec 07 '18 at 13:11
  • Perhaps I misunderstood, but note that if query 1 is sent at minute 3, response 1 can only be checked at minute 13 or later. So, suppose one were to decide whether to send_query/check_result at a particular point in the sequence by a coin toss (say at point 8 or 12); it might not work. To clarify, can you write some code/pseudocode showing how "you decide if to send a query or a result_request"? Thanks. – grill05 Dec 07 '18 at 13:21
  • @grill05 Check the new edit ;) If you send 4 queries and after that you ask for one result and send one query each time, you can never go below 10 minutes! If you are wondering how to preserve the order of the queries and of the result requests, I suggest using a queue. – Federico Dorato Dec 07 '18 at 13:28
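The coin-toss variant raised in the comments can be made safe with a queue: at each random slot, a check is allowed only if the oldest pending query was sent at least 10 minutes earlier; otherwise the slot is used for a new query. This is a hypothetical sketch (`schedule_coin_toss` and `p_check` are assumptions, not code from the thread). It assumes each query's result needs checking only once, and queries left pending when the slots run out stay unchecked.

```python
import random
from collections import deque

def schedule_coin_toss(slots, delay=10, p_check=0.5, rng=random):
    """Assign each slot to 'query' or 'check' by a biased coin toss,
    never checking a query before `delay` minutes have passed."""
    plan, pending, next_id = [], deque(), 0  # pending: (query_id, sent_at), oldest first
    for t in slots:
        # FIFO order: if the oldest pending query is not ready, none of them is
        ready = bool(pending) and t - pending[0][1] >= delay
        if ready and rng.random() < p_check:
            qid, _ = pending.popleft()
            plan.append((t, "check", qid))
        else:
            pending.append((next_id, t))
            plan.append((t, "query", next_id))
            next_id += 1
    return plan

slots = [3, 8, 12, 15, 19, 23, 26, 30, 34, 40]  # e.g. from a min-gap generator
for event in schedule_coin_toss(slots, rng=random.Random(42)):
    print(event)
```

Because the action for each slot is chosen at random, an observer of the traffic no longer sees a check at a fixed +10 minutes after each query, while the slot times themselves keep the 3-minute spacing.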