How to compare and match beginning and end of two ranges of timestamps in two different dataframes, when the frequency of timestamps varies, and it is not known which range starts earlies and finishes later. Then discard unmatched beginning and end, so the two ranges are the same. Easy to do it manually in a txt file, how to do it in python and pandas dataframes?
Sample first dataframe:
0 1
0 2022-10-30 14:11:57
1 2022-10-30 14:11:57
2 2022-10-30 14:11:57
3 2022-10-30 14:11:58
4 2022-10-30 14:11:59
... ...
149801 2022-10-30 15:22:11
149802 2022-10-30 15:22:11
149803 2022-10-30 15:22:11
149804 2022-10-30 15:22:11
149805 2022-10-30 15:22:11
\[149806 rows x 2 columns\]
Sample second dataframe:
0 1
0 2022-10-30 14:11:59
1 2022-10-30 14:11:59
2 2022-10-30 14:12:00
3 2022-10-30 14:12:00
4 2022-10-30 14:12:00
... ...
21065 2022-10-30 15:22:11
21066 2022-10-30 15:22:11
21067 2022-10-30 15:22:12
21068 2022-10-30 15:22:13
21069 2022-10-30 15:22:13
Column 1 filled with data
Comparing two timestamps in a specific row would look like:
if first_df[0].iloc[0] == second_df[0].iloc[0]:
print('hit')
else:
print('miss')
How to do it over full range, so it would be possible to discard unmatched beginning and end while preserving what's inside?
Sample match of those two ranges: First dataframe:
0 1
4 2022-10-30 14:11:59
... ...
149801 2022-10-30 15:22:11
149802 2022-10-30 15:22:11
149803 2022-10-30 15:22:11
149804 2022-10-30 15:22:11
149805 2022-10-30 15:22:11
Second dataframe:
0 1
0 2022-10-30 14:11:59
1 2022-10-30 14:11:59
2 2022-10-30 14:12:00
3 2022-10-30 14:12:00
4 2022-10-30 14:12:00
... ...
21065 2022-10-30 15:22:11
21066 2022-10-30 15:22:11
Edit:
Consider this code (note that frequency of timestamps in each dataframe is different):
import pandas as pd
from datetime import datetime
df1 = pd.DataFrame({'val_1' : [10,11,12,13,14,15]},
index = [pd.DatetimeIndex([datetime.strptime(s, '%Y-%m-%d %H:%M:%S')])[0]
for s in ['2022-11-12 09:03:59',
'2022-11-12 09:03:59',
'2022-11-12 09:03:59',
'2022-11-12 09:04:00',
'2022-11-12 09:04:01',
'2022-11-12 09:04:02'
] ])
df2 = pd.DataFrame({'val_2': [11,22,33,44]},
index = [pd.DatetimeIndex([datetime.strptime(s, '%Y-%m-%d %H:%M:%S')])[0]
for s in ['2022-11-12 09:03:58',
'2022-11-12 09:03:59',
'2022-11-12 09:03:59',
'2022-11-12 09:04:00',
] ])
What I would like as result is this:
val_1 val_2
2022-11-12 09:03:59 10 NaN
2022-11-12 09:03:59 11 22
2022-11-12 09:03:59 12 33
2022-11-12 09:04:00 13 44
or: df1:
2022-11-12 09:03:59 10
2022-11-12 09:03:59 11
2022-11-12 09:03:59 12
2022-11-12 09:04:00 13
and df2:
2022-11-12 09:03:59 22
2022-11-12 09:03:59 33
2022-11-12 09:04:00 44
Tried both join and merge with probably every combination of options and can't do that.