
I have a DataFrame containing a time series like below:

time             data                               
00:00:02.338000  1
00:00:02.377000  12
00:00:02.534000  43
00:00:02.628000  23
00:00:02.650000  9.8
00:00:02.654000  11
00:00:02.719000  6
00:00:02.726000  7
00:00:02.737000  123
00:00:02.746000  231
00:00:02.801000  412
00:00:03.010000  123

Given a time interval, I want to return a time series that, for each row, contains the last available timestamp no later than that row's time plus the interval. For example, for a time interval of 100 ms, it should return:

time                                            
00:00:02.377000  
00:00:02.377000  
00:00:02.628000  
00:00:02.726000  
00:00:02.746000  
00:00:02.746000  
...

For a large dataset, using a for loop is not viable. Is there an efficient way to achieve this?

Nathan

1 Answer


If the dataset is an ordered list, use a binary search to find the first entry of the window, then a second binary search on the rest of the data to find the last entry. If the exact value you are looking for is not in the list, the search will not land on it; instead it gives you the position closest to what you need. The element you want is then either the returned element or the one just before or after it.
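A minimal sketch of that idea using Python's standard `bisect` module, assuming the timestamps are already sorted and converted to integer milliseconds (`times_ms` and `last_within` are hypothetical names for illustration). Because each window starts at the row itself, the first search is trivial here, and the second search is restricted to the rest of the data through the `lo` argument:

```python
from bisect import bisect_right

# Hypothetical sample: the question's timestamps as integer milliseconds, sorted ascending.
times_ms = [2338, 2377, 2534, 2628, 2650, 2654, 2719, 2726, 2737, 2746, 2801, 3010]


def last_within(times, i, interval):
    """Return the index of the last entry in sorted `times` that is <= times[i] + interval."""
    # Search only from position i onward (lo=i). bisect_right returns the position
    # just past any entries equal to the target, so the entry before it is the last
    # one not later than the target. The result is never below i, since times[i] qualifies.
    return bisect_right(times, times[i] + interval, lo=i) - 1


result = [times_ms[last_within(times_ms, i, 100)] for i in range(len(times_ms))]
print(result)  # [2377, 2377, 2628, 2726, 2746, 2746, 2801, 2801, 2801, 2801, 2801, 3010]
```

Each lookup is O(log n), so a full pass is O(n log n), although the list comprehension itself still runs in Python.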

An example of a binary search can be found here: Binary search in a Python list

rbf
  • `numpy` also has binary search: [`numpy.searchsorted`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.searchsorted.html). – Graipher Feb 02 '18 at 09:56
  • @Graipher Good to know. Haven't worked with numpy yet. Thank you for the hint. – rbf Feb 02 '18 at 10:34
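Following up on the `numpy.searchsorted` suggestion, here is a sketch that performs every search in a single vectorized call, with no Python-level loop. It assumes a sorted `timedelta64` array; the sample values are the question's timestamps in milliseconds:

```python
import numpy as np

# Hypothetical sample: the question's timestamps as a sorted timedelta64[ms] array.
times = np.array(
    [2338, 2377, 2534, 2628, 2650, 2654, 2719, 2726, 2737, 2746, 2801, 3010],
    dtype="timedelta64[ms]",
)
interval = np.timedelta64(100, "ms")

# One binary search per row, done in C: side="right" places each target after any
# equal timestamps, so subtracting 1 gives the last timestamp <= time + interval.
idx = np.searchsorted(times, times + interval, side="right") - 1
last_times = times[idx]
print(idx)  # [ 1  1  3  7  9  9 10 10 10 10 10 11]
```

With the DataFrame from the question, assuming its `time` column is sorted and has `timedelta64` dtype, `df["time"].to_numpy()` can stand in for `times`, and `df["time"].iloc[idx]` selects the matching timestamps. Each row's own timestamp always lies inside its window, so `idx` is never smaller than the row's position and no `-1` index can appear.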