1

I have two data that looks like below

saved_info = [datetime.datetime(2023, 7, 4, 23, 18, 22, 113476), 't55643', 'ab$ff$55'),
              datetime.datetime(2023, 7, 4, 23, 26, 22, 113476), 't55643', '5b$ff$15'),
              datetime.datetime(2023, 7, 4, 23, 27, 22, 133470), 't55643', 'ab$ff$55')
           ]

new_info = [('t55643', 'ab$ff$55', 44),
            ('t55643', 'be$qf$34', 33)
           ]

I load them into pandas as follows

df1 = pd.DataFrame(new_info)
df1.columns = ["tid", "cid", "val"]

df2 = pd.DataFrame(saved_info)
df2.columns = ["modified_at", "tid", "cid"]

So the data frames look like below

df1

      tid       cid  val
0  t55643  ab$ff$55   44
1  t55643  be$qf$34   33

df2

                                         modified_at     tid       cid
0  datetime.datetime(2023, 7, 4, 23, 18, 22, 113476)  t55643  ab$ff$55
1  datetime.datetime(2023, 7, 4, 23, 26, 22, 112471)  t55643  5b$ff$15
2  datetime.datetime(2023, 7, 4, 23, 27, 22, 133470)  t55643  ab$ff$55

Now I want to get rows from df1 that have common cid value with df2 and modified_at value of df2 should be greater than 15mins

So lets say datetime right now is 2023-07-04 23:36:38 So accordingly the final result of df1 should be

df1 (final)

      tid       cid  val
0  t55643  ab$ff$55   44                               

As you can see the cid value of first row of df1 matches with the first row of df2 and also the time diff of modified_at value of first row of df2 with current time is greater than 15 mins.

Now I can get rows of df1 that share common value on cid column with df2 by doing something like below

common = df1.merge(df2, on=['cid'])
df1_final = df1[(df1.cid.isin(common.cid))]

For comparing time diff between rows of two data frames, I found a stackoverflow answer https://stackoverflow.com/a/46966942/5550284

But in my case I need to check a column value against the current UTC time and furthermore I don't know how do I chain these two conditions together.

Can someone help me?

Souvik Ray
  • 2,899
  • 5
  • 38
  • 70

1 Answers1

2

You don't need merge here, just keep rows where diff is greater than 15 minutes:

current_time = datetime.datetime(2023, 7, 4, 23, 36, 38)
cond = current_time - df2['modified_at'] > '15m'

out = df1[df1['cid'].isin(df2.loc[cond, 'cid'])]

Output:

>>> out
      tid       cid  val
0  t55643  ab$ff$55   44

>>> cond
0     True
1    False
2    False
Name: modified_at, dtype: bool
Corralien
  • 109,409
  • 8
  • 28
  • 52