I have the following dataframe (df_inventory) of unique cars that has around 3000 rows:
stock_id car_brand car_model age_group mileage_group
AS12354 Mazda CX-7 6-10 50-100k
DX22234 Toyota Verso 11> 150k>
KL32423 Volkswagen Touran 11> 150k>
SU12121 Renault Twingo 6-10 50-100k
... ... ... ... ...
and then I have the following dataframe (df_main) of (unique) cars that has around 100000 rows:
clientID car_brand car_model age_group mileage_group
23132 Volkswagen Golf 6-10 50-100k
24234 Renault Twingo 11> 150k>
19055 Volkswagen Polo 11> 150k>
23245 Renault Clio 1-2 0-50k
... ... ... ... ...
What I would like to get at the end is a dataframe with two columns. One will be the clientID and the second one will be the Stock_id IF there is a hard match on all other columns (car_brand, car_model, age_group, mileage_group).
This can be achieved using two for loops, iterating every combination between stock_id and clientID. But that leads to 3000*100000 iterations and it takes so much time:
clientID=[]
stock_id=[]
for i in range(df_main.shape[0]):
for j in range(df_inventory.shape[0]):
if (df_main.iloc[i,1]==df_inventory.iloc[j,1]
and df_main.iloc[i,2]==df_inventory.iloc[j,2]
and df_main.iloc[i,3]==df_inventory.iloc[j,3]
and df_main.iloc[i,4]==df_inventory.iloc[j,4] ):
clientID.append(df_main.iloc[i,0])
stock_id.append(df_inventory.iloc[j,0])
Is there any pythonic way (list comprehension) for example that could make this process faster?