0

new here

I have the following issue. I have 4 df-s.

  1. main_df - has 5 columns, interested in first 3 (Lease, Opportunity, Archetype) and 300000 rows
  2. lease_df - has Lease as first column, then has 10 columns corresponding to hours and 15 rows
  3. opportunity_df - has Opportunity as first column, then has 10 columns corresponding to hours and 160 rows
  4. archetype_df - has Archetype as first column, then has 10 columns corresponding to hours and 20 rows

For each row in main_df, I want to find the most granular available hourly data from the 3 other tables and merge to main_df. It's important to note that for any given row in tables 2-4 all 10 columns can either all be null or all non-null. archetype_df will not have any nulls, so would be last resort to be merged.

So I want first merge the lease_df on Lease, then iterate over all the rows and see if the 10 columns are null or not. If null, then for that row merge opportunity_df on Opportunity and look if the 10 columns are null, if yes then merge archetype_df on Archetype.

The code below seems to be working but takes >3-4 hours to run since I have 300,000 rows which might increase in the future. Can you help me find an optimized version that would run much faster?

lease_df

Lease 1 2 3 4 5 6 7 8 9 10
lease1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
...

opportunity_df

Opportunity 1 2 3 4 5 6 7 8 9 10
opp1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
...

archetype_df

Archetype 1 2 3 4 5 6 7 8 9 10
arch1 1 1.4 2 2 2 2 2 2 1.4 1
...

main_df_3 = pd.DataFrame()

main_df_1 = pd.merge(main_df, lease_df , on='Lease', how='left')
for i in range(len(main_df_1 )):
    if main_df_1 .iloc[i, 5:].isnull().all():
        main_df_2 = main_df_1 .iloc[i:i+1, :5]
    main_df_2 = pd.merge(main_df_2 , opportunity_df, on='Opportunity', how='left')
    if main_df_2 .iloc[0, 5:].isnull().all():
        main_df_2 = main_df_2 .iloc[0:1, :5]
    main_df_2 = pd.merge(main_df_2 , archetype_df, on='Archetype', how='left')
    main_df_3 = pd.concat([main_df_3 , main_df_2 ])

taslan01
  • 1
  • 2
  • 1
    Please **always** add a [MRE](https://stackoverflow.com/help/minimal-reproducible-example) (also look [here](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples)). Since you seem to have a lot of columns just provide the code to produce a sample or provide a sample with less columns. – Timus Apr 12 '23 at 11:52

0 Answers0