0

I follow this thread which doesn't fully answer my solution : Best way to join / merge by range in pandas

I precise that my min/max in my case are DateTime but this works without any problem.

The accepted answer works fine for me, however I think this could be optimized for my use case.

Indeed, it will create a mega datframe which concatenates both A and B df, whereas I would need only the rows where A_id and B_id match exactly.

My original df has 79k rows. After processing this

C = pd.DataFrame(
    np.column_stack([A.values[i], B.values[j]]),
    columns=A.columns.append(B.columns)
)

I have a df of 2.3 m rows, which when I keep only C[C['A_id'] == C['B_id']] goes back to 74k rows which is what I expect.

How can I do it directly in the first operation, improving at the same time the time to process ?

Thanks

yeye
  • 503
  • 4
  • 19
  • Checkout [Merge two data frames based on common column values in Pandas](https://stackoverflow.com/questions/43297589/merge-two-data-frames-based-on-common-column-values-in-pandas) @yeye – Vishnudev Krishnadas Jun 15 '19 at 07:33
  • Yes but the thing is that with the need of comparing the min/max value between the 2 df, the best solution in terms of performance seems to be using this method. I just need to tweak it so that it only retrieves me the matching rows only, instead of all. If I do if after I think this is not the most optimized way of doing it – yeye Jun 15 '19 at 07:41
  • Merge two dataframe using `left_on='A_id'` and `right_on='B_id'` and then check for min/max of datetime . – Vishnudev Krishnadas Jun 15 '19 at 07:43
  • Doing like this is too long to process – yeye Jun 15 '19 at 08:50

1 Answers1

0

I found my way.

I added Series, and added the condition in the i, j = np.where((a[:, None] >= bl) & (a[:, None] <= bh)) of the related topic

Now I have:

a = A.A_value.values
aId = A.A_id.values
bId = B.B_id.values
bh = B.B_high.values
bl = B.B_low.values

i, j = np.where((a[:, None] >= bl) & (a[:, None] <= bh) & (aId[:, None] == bId)

This is almost instantaneous for my 80k lines whereas before it took 3 seconds

yeye
  • 503
  • 4
  • 19