I am running into the following issue. I want to merge two dataframes on multiple columns(11 to be exact). Surprisingly the usual methods do not work. Example dataframes are as follows: df1:
c1 c2 c3 c4 c5 c6 event_count
1 2 a ff 0 1 5
1 2 b fg 1 3 6
1 2 c hg 2 4 20
1 2 d gf 0 1 7
1 2 e fg 1 4 1
df2:
c1 c2 c3 c4 c5 c6 event_type event_price
1 2 a ff 0 1 a 20
1 2 b fg 1 3 a 20
1 2 c hg 2 4 b 30
1 2 d gf 0 1 b 40
1 2 e fg 1 4 b 50
I tried
result = pd.merge(df1, df2, how='outer', on = ['c1','c2','c3','c4','c5','c6'])
and also this should technically work:
result = pd.merge(df1, df2, how='outer')
But I get this:
c1 c2 c3 c4 c5 c6 event_count event_type event_price
1 2 a ff 0 1 5
1 2 b fg 1 3 6
1 2 c hg 2 4 20
1 2 d gf 0 1 7
1 2 e fg 1 4 1
1 2 a ff 0 1 a 20
1 2 b fg 1 3 a 20
1 2 c hg 2 4 b 30
1 2 d gf 0 1 b 40
1 2 e fg 1 4 b 50
When I want to get this:
c1 c2 c3 c4 c5 c6 event_count event_type event_price
1 2 a ff 0 1 5 a 20
1 2 b fg 1 3 6 a 20
1 2 c hg 2 4 20 b 30
1 2 d gf 0 1 7 b 40
1 2 e fg 1 4 1 b 50
I also tried with indexes:
df1_index = df1.set_index(['c1', 'c2','c3'....]); df2_index =df2.set_index(['c1', 'c2','c3'....]); result = pd.concat([df1_index, df2_index], axis=1);
But that gave the same wrong result. I made sure the dtypes of the two tables are the same as well. I am not sure what else to try. Any advice? Thanks :)