How do I get a sequence of datetime values out of a sorted list such that consecutive values in the sequence are at least x seconds apart? The sequence should always start at the first value of the list. Each following value should be the next list element that lies more than x seconds after the value most recently added to the sequence.

Example with x = 1 second:

Input_Arr = np.array([datetime(1981,1,1,0,0,0,40),
                      datetime(1981,1,1,0,0,0,80),
                      datetime(1981,1,1,0,0,1,20),
                      datetime(1981,1,1,0,0,1,50),
                      datetime(1981,1,1,0,0,2,00),
                      datetime(1981,1,1,0,0,2,70),
                      datetime(1981,1,1,0,0,3,10),
                      datetime(1981,1,1,0,0,3,40),
                      datetime(1981,1,1,0,0,4,20),
                      datetime(1981,1,1,0,0,5,00)])

Output_Arr = np.array([datetime(1981,1,1,0,0,0,40),
                       datetime(1981,1,1,0,0,1,50),
                       datetime(1981,1,1,0,0,2,70),
                       datetime(1981,1,1,0,0,4,20)])

I think a combination of two existing approaches (here and here) will be the solution, but I can't get the code to work.

The following works, but is not very clean and also not fast for bigger arrays.

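from datetime import timedelta

Output_Arr = list()
time_delta = 1
for idx, value in enumerate(Input_Arr):
    if idx == 0:
        # Always keep the first value
        Output_Arr.append(value)
    elif Output_Arr[-1] + timedelta(seconds=time_delta) < value:
        # Keep values more than time_delta seconds after the last kept one
        Output_Arr.append(value)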
  • Oh wait... your last edit really changes the scope of the question. It went from "how do I do it" to "how do I do it faster", which is a big difference. I think in this case the problem is, as I state in my answer, that you increase the size of your list on every iteration. Start by allocating a big list (you know the largest possible size of the list), then fill it in your loop instead of extending it. Once your loop has run, you can just truncate the list. – LNiederha Nov 03 '22 at 10:35
  • Yeah, that was my mistake, sorry. The solution came to mind as soon as I had posted the original question. Nevertheless, thank you for your solution and the hints for optimization. – Sascha Lüthi Nov 03 '22 at 10:59
  • You're welcome. I added the optimization to my answer. Hope this helps. If you still need faster code, I suggest you pinpoint the bottleneck in whichever approach you choose and ask a new question on how to speed up or bypass that specific part. – LNiederha Nov 03 '22 at 11:11

1 Answer

The most straightforward way is to use a loop and compare times using timedelta.

from datetime import datetime, timedelta
from math import ceil
import numpy as np

input_arr = np.array([datetime(1981,1,1,0,0,0,40),
                      datetime(1981,1,1,0,0,0,80),
                      datetime(1981,1,1,0,0,1,20),
                      datetime(1981,1,1,0,0,1,50),
                      datetime(1981,1,1,0,0,2,00),
                      datetime(1981,1,1,0,0,2,70),
                      datetime(1981,1,1,0,0,3,10),
                      datetime(1981,1,1,0,0,3,40),
                      datetime(1981,1,1,0,0,4,20),
                      datetime(1981,1,1,0,0,5,00)])

# Pre-allocate the largest output array that could be needed
x = timedelta(seconds=1)  # Minimal time delta between kept values
# Kept values are at least x apart, so ceil(span / x) + 1 is an upper bound on their count
nb_max_output = ceil((input_arr[-1] - input_arr[0])/x) + 1
output_arr = np.zeros(nb_max_output, dtype=object)  # Pre-allocation

# Initialize variables
output_arr[0] = input_arr[0]
next_output = 1
   
for i in range(1, len(input_arr)):

    # Check time difference with the last time in output
    if input_arr[i] - output_arr[next_output-1] >= x:
        # Add to the output array
        output_arr[next_output] = input_arr[i]
        next_output += 1

# Truncate output array to the part that was actually filled
output_arr = output_arr[:next_output]

# Show result
print(output_arr)

The output is:

>> [datetime.datetime(1981, 1, 1, 0, 0, 0, 40)
    datetime.datetime(1981, 1, 1, 0, 0, 1, 50)
    datetime.datetime(1981, 1, 1, 0, 0, 2, 70)
    datetime.datetime(1981, 1, 1, 0, 0, 4, 20)]

The trick is to use timedelta to compare your dates and to pre-allocate your array, because growing an array dynamically is usually slow. There are probably shorter ways to write this in Python, and further optimizations are possible, but this is the biggest one that comes to mind.
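
As one possible further optimization (a different technique from the loop above, not something measured here): since the input is sorted, you could jump directly to the next qualifying value with a binary search instead of visiting every element. The sketch below assumes the values can be converted to NumPy's datetime64 type; the names thin_by_min_gap and min_gap_seconds are made up for illustration. It uses the same ">= x" rule as the loop above.

```python
from datetime import datetime

import numpy as np


def thin_by_min_gap(times, min_gap_seconds):
    """Keep the first value, then repeatedly keep the next value that is at
    least `min_gap_seconds` after the last kept one. `times` must be sorted."""
    # datetime64 values support fast comparisons and binary search
    t = np.asarray(times, dtype="datetime64[us]")
    gap = np.timedelta64(int(min_gap_seconds * 1_000_000), "us")

    kept = []  # indices of the values that are kept
    i = 0
    while i < len(t):
        kept.append(i)
        # First index whose value is >= last kept value + gap
        nxt = int(np.searchsorted(t, t[i] + gap, side="left"))
        # Always move forward, even if the gap is zero
        i = max(i + 1, nxt)
    return t[kept]


input_list = [datetime(1981, 1, 1, 0, 0, 0, 40),
              datetime(1981, 1, 1, 0, 0, 0, 80),
              datetime(1981, 1, 1, 0, 0, 1, 20),
              datetime(1981, 1, 1, 0, 0, 1, 50),
              datetime(1981, 1, 1, 0, 0, 2, 0),
              datetime(1981, 1, 1, 0, 0, 2, 70),
              datetime(1981, 1, 1, 0, 0, 3, 10),
              datetime(1981, 1, 1, 0, 0, 3, 40),
              datetime(1981, 1, 1, 0, 0, 4, 20),
              datetime(1981, 1, 1, 0, 0, 5, 0)]

result = thin_by_min_gap(input_list, 1)
print(result)
```

Each step costs one binary search, so this should mainly pay off when many input values fall between two kept values. If nearly every value is kept, the plain loop is likely just as good.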

LNiederha