I have two Pandas dataframes, one quite large (30000+ rows) and one a lot smaller (100+ rows).
The dfA looks something like:
X Y ONSET_TIME COLOUR
0 104 78 1083 6
1 172 78 1083 16
2 240 78 1083 15
3 308 78 1083 8
4 376 78 1083 8
5 444 78 1083 14
6 512 78 1083 14
... ... ... ... ...
The dfB looks something like:
TIME X Y
0 7 512 350
1 1722 512 214
2 1906 376 214
3 2095 376 146
4 2234 308 78
5 2406 172 146
... ... ... ...
What I want to do is for every row in dfB to find the row in dfA where the values of the X AND Y columns are equal AND that is the first row where the value of dfB['TIME'] is greater than dfA['ONSET_TIME'] and return the value of dfA['COLOUR'] for this row.
dfA represents refreshes of a display, where X and Y are coordinates of items on the display and so repeat themselves for every different ONSET_TIME (there are 108 pairs of coodinates for each value of ONSET_TIME).
There will be multiple rows where the X and Y in the two dataframes are equal, but I need the one that matches the time too.
I have done this using for loops and if statements just to see that it could be done, but obviously given the size of the dataframes this takes a very long time.
for s in range(0, len(dfA)):
for r in range(0, len(dfB)):
if (dfB.iloc[r,1] == dfA.iloc[s,0]) and (dfB.iloc[r,2] == dfA.iloc[s,1]) and (dfA.iloc[s,2] <= dfB.iloc[r,0] < dfA.iloc[s+108,2]):
return dfA.iloc[s,3]