1

i have two dataframe like this.

df1
MainId,Time,info1,info2
100,2018-07-12 08:05:00,a,b
100,2018-07-12 08:07:00,x,y
101,2018-07-14 16:00,c,d
100,2018-07-14 19:30:00,d,e
104,2018-07-14 03:30:00,g,h

and

df2
Id,MainId,startTime,endTime,value
1,100,2018-07-12 08:00:00,2018-07-12 08:10:00,1001
2,150,2018-07-14 10:05:00,2018-07-14 17:05:00,1002
3,101,2018-07-12 0:05:00,2018-07-12 19:05:00,1003
4,100,2018-07-12 08:05:00,2018-07-12 08:15:00,1004

df2 is main dataframe and df1 is subdataframe. I would like to check starttime and endtime of df2 with the time in df1 with respective to MainId. If df1.Time isin df2(start and endtime) with respective to MainId, then i want to include info1 and info2 column of df1 to df2. If there are no values, then I would like to enter just nan.

I want my output like this

Id,MainId,info1,info2,value
1,100,a,b,1001
1,100,x,y,1001
2,150,nan,nan,1002
3,101,nan,nan,1003
4,100,a,b,1004
4,100,x,y,1004

Here I have two same Id(In Id1) and MainId in output because they have different info1 and info2 and I want to include that one too.

This is what I am doing in pandas

df2['info1'] = np.where((df2['MainId'] == df1['MainId'])& (df1['Time'].isin([df2['startTime'], df2['endTime']])),df1['info1'], np.nan)

but it is throwing an error

ValueError: Can only compare identically-labeled Series objects

How Can i Fix this error ? Is there a better way ?

user96564
  • 1,578
  • 5
  • 24
  • 42
  • 1
    You should merge `on=MainId` and then use a boolean mask afterwards to find where the time is between. It's very similar to [this answer](https://stackoverflow.com/questions/51755268/pandas-merge-on-datetime-or-datetime-in-datetimeindex), though in this case you just merge based on `MainId` – ALollz Nov 12 '18 at 14:43
  • @ALollz I did trying first with `merge` but problem I found is that, after merge, while selecting the data when the Time isin (start and end), it is possible that some of the original data from `df2` were missing. I need to return all the data from `df2` ( expected output from question) – user96564 Nov 12 '18 at 18:22

1 Answers1

0

df1 and df2 have diferente Index (you can check this by inspecting df1.index and df2.index. Hence, when you do df2['MainId'] == df1['MainId'], you have 2 series objects that are not comparable.

Try using a left join, something like:

df3 = df2.join(df1.set_index('MainId'), on='MainId'))

should give you the dataframe you want. You can then use it to execute your comparisons.

Daniel Severo
  • 1,768
  • 2
  • 15
  • 22