i have two dataframe like this.
df1
MainId,Time,info1,info2
100,2018-07-12 08:05:00,a,b
100,2018-07-12 08:07:00,x,y
101,2018-07-14 16:00,c,d
100,2018-07-14 19:30:00,d,e
104,2018-07-14 03:30:00,g,h
and
df2
Id,MainId,startTime,endTime,value
1,100,2018-07-12 08:00:00,2018-07-12 08:10:00,1001
2,150,2018-07-14 10:05:00,2018-07-14 17:05:00,1002
3,101,2018-07-12 0:05:00,2018-07-12 19:05:00,1003
4,100,2018-07-12 08:05:00,2018-07-12 08:15:00,1004
df2
is main dataframe and df1
is subdataframe. I would like to check starttime
and endtime
of df2
with the time in df1
with respective to MainId
. If df1.Time isin df2(start and endtime)
with respective to MainId
, then i want to include info1
and info2
column of df1 to df2. If there are no values, then I would like to enter just nan.
I want my output like this
Id,MainId,info1,info2,value
1,100,a,b,1001
1,100,x,y,1001
2,150,nan,nan,1002
3,101,nan,nan,1003
4,100,a,b,1004
4,100,x,y,1004
Here I have two same Id(In Id1) and MainId in output because they have different info1 and info2 and I want to include that one too.
This is what I am doing in pandas
df2['info1'] = np.where((df2['MainId'] == df1['MainId'])& (df1['Time'].isin([df2['startTime'], df2['endTime']])),df1['info1'], np.nan)
but it is throwing an error
ValueError: Can only compare identically-labeled Series objects
How Can i Fix this error ? Is there a better way ?