
I have a pandas DataFrame in which every row is timestamped (Unix time; each row represents one day).

Ex:

Index  Timestamp  Value
1      1544400000  2598
2      1544572800  2649
3      1544659200  2234
4      1544745600  2204
5      1544832000  1293

Is there a method that subtracts the previous row's value from each row in the Timestamp column? The goal is to check that the interval between rows is constant, to make sure the dataset isn't skipping a day. In the example above, the first row skips a day before the second, giving a 48-hour interval, while all the other intervals are 24 hours.

I think I could do it using iterrows(), but that seems very costly for large datasets.

--

Not sure I was clear enough so, in the example above:

Column Timestamp:

Row 2 - row 1 = 172800 (48hrs)

Row 3 - row 2 = 86400 (24hrs)

Row 4 - row 3 = 86400 (24hrs) ...

Justcurious

1 Answer


Pandas DataFrames have a diff method that does what you want. Note that the first row of the returned diff will contain NaNs, so you'll want to ignore that in any comparison.

An example would be

import pandas as pd

df = pd.DataFrame({'timestamps': [100, 200, 300, 500]})

# differences between consecutive rows; drop the leading NaN and convert to a list
X = df['timestamps'].diff()[1:].tolist()
# check whether all intervals are equal, see https://stackoverflow.com/a/3844948/1862861
all_equal = X.count(X[0]) == len(X)
print(all_equal)  # False here, because the last interval is 200 rather than 100
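
Applied to the data from the question, the same idea can also show which rows follow a gap, not just whether one exists. The following is only a sketch: the column names come from the question's example, while the `ONE_DAY` constant and the `gaps` and `irregular` names are made up here for illustration, and it assumes the timestamps are sorted and always fall on the same time of day.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Timestamp": [1544400000, 1544572800, 1544659200, 1544745600, 1544832000],
    "Value": [2598, 2649, 2234, 2204, 1293],
})

ONE_DAY = 86400  # seconds in 24 hours (illustrative constant)

# Vectorised difference between each row and the previous one;
# the first entry is NaN because it has no predecessor.
gaps = df["Timestamp"].diff()

# Keep rows whose distance to the previous row is not exactly one day,
# excluding the first row's NaN.
irregular = df[gaps.notna() & (gaps != ONE_DAY)]
print(irregular)
```

Because `diff` operates on the whole column at once, this avoids the row-by-row loop that `iterrows()` would require.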
Matt Pitkin