I have a dataframe, df_A with two columns 'amin' and 'amax', which is a set of time range.
My objective is to find whether a column in df_B lies between any of the rows of range in df_A 'amin' and 'amax' columns.
df_A[['amin','amax'] ]
amin amax
0 2016-07-16 19:37:03 2016-07-17 11:16:32
1 2016-07-04 21:15:54 2016-07-05 10:57:46
2 2016-07-24 23:30:41 2016-07-25 15:38:02
3 2016-07-12 03:02:38 2016-07-12 22:11:01
df_B['created_date']
created_date
2016-07-17 01:16:32
2016-07-05 10:15:54
2016-07-12 12:11:01
df_A['amin'] = pd.to_datetime(df_A['amin'], errors='coerce')
df_A['amax'] = pd.to_datetime(df_A['amax'], errors='coerce')
df_B['created_date'] = pd.to_datetime(df_B['created_date'],errors='coerce')
def dt2epoch(value):
epoch = (value - pd.to_datetime(datetime(2015,12,31).strftime('%Y-%m-%d %H:%M:%S.%f'))).total_seconds()
return epoch
df_A['amax_epoch']=df_A['amax'].apply(dt2epoch)
df_A['amin_epoch']=df_A['amin'].apply(dt2epoch)
df_B['created_date_epoch']=df_B['created_date'].apply(dt2epoch)
def make_tuple(row):
n= len(row)
row = [(x,row[n - 1]) for x in row]
return row
minMaxTuple = minMax.apply(make_tuple, axis =1)
Above is part of my code, I've tried below(not sure if it's necessary):
- convert them to epoch values
- convert df_A into a tuple.
However, df_A and df_B has a different number of rows. Also, I dont have any id column to merge them together.
label = []
for l in df_B['created_date_epoch']:
if (m[0] for m in minMaxTuple) <= l <= (m[1] for m in minMaxTuple):
label.append('1')
else:
label.append('0')
However, when I run this, the result I get for 'label' is an empty list.
Also, the label should be a column that has the same number of rows as df_A.
Ultimately, I would like to add a new 'label' column in df_A:
minMaxTuple label
(2016-07-16 19:37:03, 2016-07-17 11:16:32) 1
(2016-07-04 21:15:54, 2016-07-05 10:57:46) 1
(2016-07-24 23:30:41, 2016-07-25 15:38:02) 0
(2016-07-12 03:02:38, 2016-07-12 22:11:01) 1