
I'm trying to check whether the difference between two Timestamp columns in Pandas is greater than n seconds. I don't actually care about the difference itself. I just want to know if it's greater than n seconds, and I could also limit n to a range of, let's say, 1 to 60.

Sounds easy, right?

This question has many valuable answers outlining how to do that.

The problem: For reasons outside of my control, the difference between the two timestamps may be quite large, and that's why I'm running into an integer overflow.

Here's an MCVE:

import pandas as pd
import pandas.testing


dataframe = pd.DataFrame(
    {
        "historic": [pd.Timestamp("1900-01-01T00:00:00+00:00")],
        "futuristic": [pd.Timestamp("2200-01-01T00:00:00+00:00")],
    }
)

# Goal: Figure out if the difference between
#       futuristic and historic is > n seconds, i.e.:
#       futuristic - historic > n

number_of_seconds = 1

dataframe["diff_greater_n"] = (
    dataframe["futuristic"] - dataframe["historic"]
) / pd.Timedelta(seconds=1) > number_of_seconds

expected_dataframe = pd.DataFrame(
    {
        "historic": [pd.Timestamp("1900-01-01T00:00:00+00:00")],
        "futuristic": [pd.Timestamp("2200-01-01T00:00:00+00:00")],
        "diff_greater_n": [True],
    }
)

pandas.testing.assert_frame_equal(dataframe, expected_dataframe)

Error:

OverflowError: Overflow in int64 addition
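To illustrate where the overflow comes from, here is a small sketch. It assumes pandas' nanosecond-resolution `Timedelta`, which is stored as an int64 count of nanoseconds; the variable names are only for illustration.

```python
import pandas as pd

# Largest span a nanosecond-resolution Timedelta can hold (an int64 count of ns).
max_span = pd.Timedelta.max
print(max_span)

# Convert to approximate years for a rough comparison.
max_years = max_span.days / 365.25

# The MCVE spans 1900 to 2200, i.e. about 300 years.
span_years = 2200 - 1900

# True: the span is wider than what a single Timedelta can represent.
print(span_years > max_years)
```

Both timestamps fit into a nanosecond-resolution `Timestamp` on their own; it's only their difference that doesn't fit.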

A bit more context:

  • The timestamps need to have second precision, i.e. I don't care about any milliseconds
  • This is one of multiple or-combined checks on the dataframe
  • The dataframe may have a few million rows
  • I'm quite happy that I finally get to ask about an overflow error on Stack Overflow
Maurice

1 Answer


One option may be to use Python's built-in datetime objects, whose timedelta is not limited to an int64 count of nanoseconds:

import datetime as dt

...

dataframe["diff_greater_n"] = (
    dataframe["futuristic"].dt.to_pydatetime() 
    - dataframe["historic"].dt.to_pydatetime()
) / dt.timedelta(seconds=1) > number_of_seconds
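For reference, a self-contained sketch of the same idea applied to the MCVE from the question. It's a variation rather than the exact snippet above: it converts each value with `Timestamp.to_pydatetime()` in a plain Python loop and compares against a `datetime.timedelta` threshold (the `threshold` name is just for illustration), so all arithmetic happens on Python objects.

```python
import datetime as dt

import pandas as pd

dataframe = pd.DataFrame(
    {
        "historic": [pd.Timestamp("1900-01-01T00:00:00+00:00")],
        "futuristic": [pd.Timestamp("2200-01-01T00:00:00+00:00")],
    }
)

number_of_seconds = 1

# Python's timedelta covers a far wider range than an int64 nanosecond count,
# so the 300-year difference is representable.
threshold = dt.timedelta(seconds=number_of_seconds)

# Convert each pandas Timestamp to datetime.datetime before subtracting,
# then compare the resulting timedelta with the threshold.
dataframe["diff_greater_n"] = [
    (future.to_pydatetime() - past.to_pydatetime()) > threshold
    for future, past in zip(dataframe["futuristic"], dataframe["historic"])
]

print(dataframe["diff_greater_n"].tolist())
```

Since this runs per element in Python, it will likely be slower than a native pandas operation on a few million rows, so it's worth timing on real data.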
BigBen
  • I had played around with `datetime` but somehow never tried the equivalent of the original implementation - thanks! – Maurice Mar 07 '23 at 06:42