country = []
# Loop over every customer name and look it up in gui_broker,
# falling back to a default when there is no match.
for i in df_temp['Customer Name'].iloc[:]:
    if i in gui_broker['EXACT_DDI_CUSTOMER_NAME'].tolist():
        country.append(gui_broker["Book"].values[
            gui_broker['EXACT_DDI_CUSTOMER_NAME'].tolist().index(i)])
    else:
        country.append("No Book Defined")
df_temp["Country"] = country

I currently have a large DataFrame (df_temp) with one column ('Customer Name') and am trying to match it against a small DataFrame (gui_broker) which has 3 columns, one of which ('EXACT_DDI_CUSTOMER_NAME') contains all the unique values of the large DataFrame's column.

After matching each row of df_temp I want to create a new column in df_temp containing the 'Book' value from my small DataFrame (gui_broker), based on that match. I have tried every apply/lambda approach I can think of, but I am out of ideas. The code above gives me a solution, but it is slow and not very Pandas-like...

How exactly could I proceed?


3 Answers


You can use pandas merge to do that, like this:

df_temp = df_temp.merge(gui_broker[['EXACT_DDI_CUSTOMER_NAME', 'Book']],
                        left_on='Customer Name',
                        right_on='EXACT_DDI_CUSTOMER_NAME',
                        how='left')
df_temp['Book'] = df_temp['Book'].fillna('No Book Defined')
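
For example, with some made-up toy frames (the customer names, books, and the extra 'Region' column below are purely illustrative, not taken from the question):

import pandas as pd

# hypothetical example data, just to show the merge behaviour
df_temp = pd.DataFrame({'Customer Name': ['Acme', 'Globex', 'Acme', 'Initech']})
gui_broker = pd.DataFrame({'EXACT_DDI_CUSTOMER_NAME': ['Acme', 'Globex'],
                           'Book': ['Book A', 'Book B'],
                           'Region': ['EU', 'US']})

# a left merge keeps every row of df_temp; unmatched names get NaN in 'Book'
df_temp = df_temp.merge(gui_broker[['EXACT_DDI_CUSTOMER_NAME', 'Book']],
                        left_on='Customer Name',
                        right_on='EXACT_DDI_CUSTOMER_NAME',
                        how='left')
df_temp['Book'] = df_temp['Book'].fillna('No Book Defined')
# 'Acme' -> 'Book A', 'Globex' -> 'Book B', 'Initech' -> 'No Book Defined'

Note that the merge also brings along the EXACT_DDI_CUSTOMER_NAME column, which you can drop afterwards if you don't need it.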
Pradeep Pathak

I believe this should do it, using map to map the Book column of gui_broker, indexed by EXACT_DDI_CUSTOMER_NAME, onto the Customer Name column of df_tmp:

df_tmp['Country'] = (df_tmp['Customer Name']
                     .map(gui_broker.set_index('EXACT_DDI_CUSTOMER_NAME').Book)
                     .fillna('No Book Defined'))

Though I would need some example data to test it with!
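
For what it's worth, here is a quick sketch with made-up data (the names and books below are purely illustrative):

import pandas as pd

# hypothetical example data
df_tmp = pd.DataFrame({'Customer Name': ['Acme', 'Globex', 'Acme', 'Initech']})
gui_broker = pd.DataFrame({'EXACT_DDI_CUSTOMER_NAME': ['Acme', 'Globex'],
                           'Book': ['Book A', 'Book B']})

# map looks each customer name up in the Series indexed by EXACT_DDI_CUSTOMER_NAME;
# names with no match become NaN and are then filled with the default
df_tmp['Country'] = (df_tmp['Customer Name']
                     .map(gui_broker.set_index('EXACT_DDI_CUSTOMER_NAME').Book)
                     .fillna('No Book Defined'))
# 'Acme' -> 'Book A', 'Globex' -> 'Book B', 'Initech' -> 'No Book Defined'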

sacuL
  • Thanks. I've tried to do that but I get InvalidIndexError: Reindexing only valid with uniquely valued Index objects. df_tmp['Customer Name'] has many non-unique rows; gui_broker['EXACT_DDI_CUSTOMER_NAME'], however, has only one of each. – Bython Dec 05 '18 at 16:21
  • If you're getting that error, I think it would be that `gui_broker` actually *does* have non-unique rows, in which case you could drop the duplicates of that one first. It should not matter that `df_tmp` has duplicates; only duplicates in the mapping series matter. If that's the case, use `gui_broker.drop_duplicates(subset='EXACT_DDI_CUSTOMER_NAME', inplace=True)` beforehand. – sacuL Dec 05 '18 at 16:23
  • Trying it with a dummy dataframe, I get that error message when there is a duplicate in `gui_broker`, but not otherwise. – sacuL Dec 05 '18 at 16:26

Looks like you are looking for join (docs are here). It joins one DataFrame with another by matching the selected column(s) in the first with the index of the second.

So

df_temp.join(gui_broker
             .set_index("EXACT_DDI_CUSTOMER_NAME")
             .loc[:, ["Book"]],
             on="Customer Name")
Leonid Mednikov