I have a dataframe (df01) organized as:
Index | colA | colB |
-------------------------------
--- | aaa_01 | bbb_01 |
--- | aaa_02 | bbb_02 |
--- | aaa_01 | bbb_01 |
--- | aaa_01 | bbb_01 |
--- | aaa_02 | bbb_02 |
--- | aaa_03 | bbb_03 |
--- | aaa_04 | bbb_04 |
--- | aaa_02 | bbb_02 |
--- | aaa_02 | bbb_02 |
--- | aaa_01 | bbb_01 |
and another dataframe (df02), with two columns corresponding to colA and colB from the first dataframe, plus an additional column (colC):
Index | colA | colB | colC
---------------------------------------
--- | aaa_01 | bbb_01 | ccc_01
--- | aaa_02 | bbb_02 | ccc_02
--- | aaa_03 | bbb_03 | ccc_03
--- | aaa_04 | bbb_04 | ccc_04
For each row of the first dataframe, I need to find the matching row in the second dataframe (based on the values of colA and colB) and add the corresponding value of colC to the first dataframe. Basically, I want to end up with:
Index | colA | colB | colC
---------------------------------------
--- | aaa_01 | bbb_01 | ccc_01
--- | aaa_02 | bbb_02 | ccc_02
--- | aaa_01 | bbb_01 | ccc_01
--- | aaa_01 | bbb_01 | ccc_01
--- | aaa_02 | bbb_02 | ccc_02
--- | aaa_03 | bbb_03 | ccc_03
--- | aaa_04 | bbb_04 | ccc_04
--- | aaa_02 | bbb_02 | ccc_02
--- | aaa_02 | bbb_02 | ccc_02
--- | aaa_01 | bbb_01 | ccc_01
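In case it helps, here is a minimal reproducible construction of the two frames (I'm assuming df02 has exactly one row per (colA, colB) pair, including an aaa_04 row, since my expected output contains ccc_04):

```python
import pandas as pd

# df01: many rows, keys repeat
df01 = pd.DataFrame({
    'colA': ['aaa_01', 'aaa_02', 'aaa_01', 'aaa_01', 'aaa_02',
             'aaa_03', 'aaa_04', 'aaa_02', 'aaa_02', 'aaa_01'],
    'colB': ['bbb_01', 'bbb_02', 'bbb_01', 'bbb_01', 'bbb_02',
             'bbb_03', 'bbb_04', 'bbb_02', 'bbb_02', 'bbb_01'],
})

# df02: lookup table, one row per (colA, colB) key
df02 = pd.DataFrame({
    'colA': ['aaa_01', 'aaa_02', 'aaa_03', 'aaa_04'],
    'colB': ['bbb_01', 'bbb_02', 'bbb_03', 'bbb_04'],
    'colC': ['ccc_01', 'ccc_02', 'ccc_03', 'ccc_04'],
})
```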
So far, I tried something like:

C_list = []
for row in range(df01.shape[0]):
    a = df01.iloc[row, 0]  # value in colA
    b = df01.iloc[row, 1]  # value in colB
    # select the matching row(s) of df02
    c = df02[(df02['colA'] == a) & (df02['colB'] == b)]
    C_list.append(c)
df01['colC'] = np.array(C_list)
but I got some errors (each `c` is a whole DataFrame slice rather than a single value), and it's really not very pythonic. I'm sure there's an easier way to do this?
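Is a join what I'm after here? A minimal sketch of what I have in mind, with toy versions of the frames (I'm assuming df02 has one row per (colA, colB) key, so a left merge keeps df01's rows and order):

```python
import pandas as pd

# Toy versions of the frames above
df01 = pd.DataFrame({'colA': ['aaa_01', 'aaa_02', 'aaa_01'],
                     'colB': ['bbb_01', 'bbb_02', 'bbb_01']})
df02 = pd.DataFrame({'colA': ['aaa_01', 'aaa_02'],
                     'colB': ['bbb_01', 'bbb_02'],
                     'colC': ['ccc_01', 'ccc_02']})

# Left merge: keep every row of df01, pull in colC from df02 where
# both key columns match; unmatched rows would get NaN in colC.
result = df01.merge(df02, on=['colA', 'colB'], how='left')
print(result['colC'].tolist())  # ['ccc_01', 'ccc_02', 'ccc_01']
```

Would this be the recommended approach, or is there a pitfall I'm missing (e.g. if df02 had duplicate key pairs)?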