Let's say I have a dataframe df with a column called time that holds a timestamp in seconds (plus some other columns). It represents a time series with irregular spacing between samples. I'd like to extract rows such that each selected row is at least 5 seconds after the previously selected one, as I did in my example below. But I was wondering whether there is a more vectorized way to do this.

Is there a more elegant way that works without resorting to this rather verbose loop?

It doesn't matter if there is an offset at the start, and the 5 seconds are just an arbitrary number.

import pandas as pd
import numpy as np
N = 100
time = np.arange(0, N, 2)
time = time + np.random.random(len(time))

df = pd.DataFrame(time, columns=('time',))  # assume df has more than one column
print(df)


last = 0
mask = []
for i in range(len(df)):
    if df['time'][i] > last + 5:  # find first entry after at least 5 seconds
        last = df['time'][i]
        mask.append(True)
    else:
        mask.append(False)
print(df.loc[mask])
flawr

1 Answer

numba may give you better performance, even though it is not a vectorized approach.

See this answer from @jpp for a somewhat similar problem.
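
As a rough sketch of what that could look like here (not taken from the linked answer): each kept row depends on the previously kept row, which is what makes the loop hard to vectorize, so the idea is to keep the loop and compile it. The function name `min_spacing_mask` and the parameter `min_gap` are made up for illustration, and the fallback decorator is only there so the snippet still runs, uncompiled, if numba is not installed.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python when numba is unavailable.
    def njit(func):
        return func


@njit
def min_spacing_mask(times, min_gap):
    """Boolean mask keeping rows more than `min_gap` after the last kept row."""
    mask = np.zeros(times.shape[0], dtype=np.bool_)
    last = 0.0  # same starting point as the loop in the question
    for i in range(times.shape[0]):
        if times[i] > last + min_gap:
            last = times[i]
            mask[i] = True
    return mask


# Same kind of data as in the question.
time = np.arange(0, 100, 2) + np.random.random(50)
df = pd.DataFrame({'time': time})

mask = min_spacing_mask(df['time'].to_numpy(), 5.0)
print(df.loc[mask])
```

The function works on the underlying NumPy array rather than on the Series, since numba compiles NumPy code and does not understand pandas objects. The first call includes compilation time, so any speed-up only shows on later calls or on large inputs.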

Ilya Berdichevsky